Article Title Page
Benchmarking of Marine Bunker Fuel Suppliers: The Good, The Bad, The Ugly
Author Details
Author 1 Name: Ole Jørgen Anfindsen
University/Institution: DNV Research & Innovation
Town/City: Høvik
Country: Norway
Author 2 Name: Grunde Løvoll
University/Institution: DNV Research & Innovation
Town/City: Høvik
Country: Norway
Author 3 Name: Thomas Mestl
University/Institution: DNV Research & Innovation
Town/City: Høvik
Country: Norway
Corresponding author: Ole Jørgen Anfindsen
Corresponding Author’s Email: [email protected]
Acknowledgments (if applicable): n/a
Biographical Details (if applicable):
Ole Anfindsen holds a dr. scient. degree (PhD) in computer science and a bachelor's degree in electronics engineering. For more than 25 years he has worked with databases and related technologies. He has been senior research scientist in Telenor R&D, visiting researcher at GTE Laboratories (Massachusetts) and Sun Microsystems Laboratories (California), as well as adjunct associate professor at the Institute of Informatics at the University of Oslo. He currently works as a researcher in the Research & Innovation department of DNV, where his main activity is directed towards data analysis, especially in the maritime area.
Grunde Løvoll has a dr. scient. degree (PhD) in physics. He has worked for 6 years as a postdoc and researcher at the Department of Physics at the University of Oslo, doing experimental studies on multiphase flow in porous materials, water diffusion in dry clay, and optical tweezers. Dr. Løvoll currently works as a researcher in DNV Research & Innovation, where his main focus is on data analysis in the maritime area.
Thomas Mestl has a dr. scient. degree (PhD) in mathematics and a degree in precision engineering. He has worked in DNV's Research Department for the last 13 years within the field of information technology. A large part of his work has been on identifying emerging technology trends, evaluating new ICT technologies (especially with respect to mobile work and information management), and identifying promising business opportunities offered by new technologies or by combinations of existing ones. Currently, his main activity is directed towards data analysis, especially in the maritime area.
Structured Abstract: Purpose - This paper has two main focus areas: the construction of a realistic best practice benchmark, and the development of a methodology for comparing individual suppliers of marine bunker fuel. As is well known in this trade, unfair business behaviors in the bunker fuel market are not uncommon, resulting in financial losses for the buyers.
Design/methodology/approach - Establishing a best practice will naturally involve some degree of subjectivity, as there is no a priori correct answer to this problem. Using the concept of membership functions from fuzzy set theory, a score can be derived from a best practice benchmark histogram. The main advantages of this method are its relative independence of both sample size and the underlying distribution, as well as its computational efficiency.
Findings - Our methodology turns out to be more powerful than standard descriptive statistics, as it is less sensitive to outliers and is well suited for small datasets and even single numbers. When applied to data for all suppliers worldwide it turns out that the number of good suppliers is actually much lower than might be expected.
Practical implications - Bunker fuel is a major expense for ship owners, and can easily reach $30 million/year for a single container ship. There is therefore a considerable interest in the market for benchmarking of individual fuel suppliers. Our methodology is also applicable to other quality related fuel parameters.
Originality/value - To the best of our knowledge this is the first attempt to benchmark actors in the marine bunker fuel industry and to quantify their behaviors.
Keywords: benchmarking, membership functions, scoring, fuzzy clustering, supplier quality, best practice
Article Classification: Technical paper
Benchmarking of Marine Bunker Fuel Suppliers:
The Good, The Bad, The Ugly
1. Introduction
The density of marine bunker fuel can be regarded as one of its most basic parameters. It is used for fuel quantity estimation and for calculating the specific energy content of the fuel. It is also the basis for the so-called Calculated Carbon Aromaticity Index (CCAI), an important indicator of ignition quality and of deposit formation in the engine. Density is also an important factor in the process of separating water or solids from bunker fuel.
For the typical ship operator the primary importance of density comes from the fact that bunker fuel is delivered by volume but paid for by the ton. The conversion is done by means of the fuel density reported by the supplier. A small difference between stated and actual fuel density can quickly lead to large financial losses for the ship operator. For instance, if a density of 977 kg/m3 is stated when the actual value happens to be 960 kg/m3, this gives rise to a difference of nearly 35 tons when bunkering 2000 m3, the value of which, in the current market, is close to US$ 20,000, just for a single bunkering.
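The arithmetic behind this example can be checked with a short sketch. The function name and the choice of Python are ours and are not part of the original analysis; only the two densities and the volume are taken from the text.

```python
def billed_minus_delivered_tons(stated_density, actual_density, volume_m3):
    """Mass invoiced minus mass actually delivered, in metric tons.

    Densities are in kg/m3; the volume is in m3.
    """
    return (stated_density - actual_density) * volume_m3 / 1000.0


# Values from the example in the text.
diff_tons = billed_minus_delivered_tons(977.0, 960.0, 2000.0)
print(diff_tons)  # 34.0
```

A deviation of 17 kg/m3 on 2000 m3 thus corresponds to 34 tons, in line with the "nearly 35 tons" quoted above.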
Although this example belongs in the high end of the spectrum, it is not at all hard to find even more extreme examples in real life. Many fuel suppliers exploit this opportunity, since it is usually their stated density that is used to calculate the quantity of delivered fuel. Over-reporting of density, i.e. claiming that the fuel density is higher than it actually is, is called short-lifting, while the opposite could be termed long-lifting. Short-lifting implies that the ship operator loses money, since he pays for more fuel than he receives. Long-lifting implies that the fuel supplier loses money, and that the ship operator gets more than he pays for.
The global market for marine bunker fuel is more than 300 million tons annually (IEA 2010, p. 618; Eyring et al 2010; IMO 2009; EPA 2008). We estimate that more than 300,000 tons of bunker fuel, i.e. about 1‰ of the global consumption, is short-lifted every year. We further estimate that the amount of long-lifting exceeds 150,000 tons. That is, on the order of half a million tons are long- or short-lifted annually. Thus, bunker fuel worth more than US$200 million appears not to be properly accounted for every year.
Both short- and long-lifting may be indications of fraudulent behavior by individual employees within the ship operator’s or bunker fuel supplier’s organization. Such behavior is, however, sufficiently widespread that a systematic and commonly accepted short-lifting practice in parts of the bunker fuel trade may be suspected. Some fuel suppliers use this tactic to consistently over-state the delivered amount in order to improve the company’s profit margin. Many ship operators and suppliers would welcome a benchmarking of suppliers, ports, or geo-regions against some best practice.
The rest of the paper is organized as follows: In Section 2 we take a closer look at concrete examples of different density reporting strategies and discuss the difficulties associated with single number characteristics. In Section 3 we use this to characterize good suppliers and derive criteria for defining a best practice. In Section 4, a Best Practice Classifier is constructed that will assign a Best Practice Score to an individual bunkering or a supplier. We also present a series of benchmarking comparisons between regions together with an overview of how they developed over a 10-year period. The paper ends with a discussion and some promising leads for further work.
2. Investigating density reporting behavior
Table 1 gives some statistics for density deviations on a global and local basis (e.g. Canada & US West Coast, South Asia, Middle East, and South America West) and for 4 selected suppliers (S1, S2, S3, S4) in 4 different bunker ports. The density difference, dd, is the difference between the density claimed by the supplier and the actual density measured by a fuel testing agency (e.g. DNVPS). The average density difference, $\overline{dd}$, could in principle be used to characterize the behavior of a fuel supplier (or a port or a region) as good, medium or bad.
Unfortunately, most such single number quality measures have shortcomings, as they compress a wealth of information into a single number. They often wipe out, quite effectively, much of the information about the interesting behavior of a supplier. In addition, the arithmetic mean or median may be less suited for distributions that are non-normal, skewed or heavy-tailed. Also, the mean and standard deviation are very sensitive to outliers (a few unusually large or small observations) (Bhattacharyya & Johnson 1977). As an example, the mean value of ten bad bunkerings could easily be balanced by one exceptionally good one (or a typing error), while the median is less sensitive to outliers. Another problem with the mean and median is that they reveal nothing about the shape of the underlying distribution. For instance, if we only look at the mean, the geo-region South America West seems to be better than e.g. Canada & US West Coast from a short-lifting perspective, see Table 1. If we take the standard deviation into account, it becomes apparent that there is a higher risk of being short-lifted in South America West than in the other geo-regions, simply because the distribution is wider. The standard deviation, however, only refers to the width of the underlying distribution and not to its actual shape. As can be seen in Figure 2, the distributions are non-normal: a highly skewed central spike is combined with a very long one-sided tail.
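The outlier sensitivity described above can be illustrated with a minimal sketch. The dd values below are made up for illustration and are not taken from the DNVPS data.

```python
import statistics

# Hypothetical dd values in kg/m3: ten short-lifted bunkerings and one
# extreme negative value (e.g. a typing error).
dd_values = [2.0] * 10 + [-20.0]

mean_dd = statistics.mean(dd_values)
median_dd = statistics.median(dd_values)
print(mean_dd, median_dd)  # 0.0 2.0
```

Here the single outlier pulls the mean to zero, suggesting fair reporting, while the median still reflects the typical short-lifting of 2 kg/m3. Neither number says anything about the shape of the distribution.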
Table 1: Standard descriptive measures of density differences for some selected geo-regions and suppliers (n = number of samples, $\overline{dd}$ = mean density difference, $\sigma_{dd}$ = standard deviation of dd). Histograms for the geo-regions and suppliers are shown in Figures 1 and 2 respectively, whereas their scatter plots are shown in Figures 3 and 4. Data in this table and in the following examples is, unless otherwise stated, based on DNVPS bunkering samples of RMG380 fuel collected in 2008 (confer DNV 2010).

| Geo-region / supplier | n | $\overline{dd}$ (kg/m3) | median(dd) (kg/m3) | $\sigma_{dd}$ (kg/m3) |
|---|---|---|---|---|
| Global | 43343 | 0.39 | 0.10 | 3.92 |
| Canada & US West Coast | 1919 | 0.03 | -0.10 | 2.43 |
| South Asia | 6806 | 1.22 | 0.90 | 3.35 |
| Middle East | 2990 | 1.83 | 0.70 | 4.76 |
| South America West | 565 | -0.48 | -0.90 | 6.00 |
| Supplier 1 (S1) | 129 | -0.12 | -0.10 | 0.95 |
| Supplier 2 (S2) | 239 | 2.31 | 0.90 | 4.84 |
| Supplier 3 (S3) | 71 | 2.40 | 2.60 | 1.83 |
| Supplier 4 (S4) | 145 | 2.07 | 1.50 | 2.81 |
Histograms
For a more detailed understanding of the properties of the data in Table 1, please refer to the density difference histograms of Figures 1 and 2. For comparison we have plotted a smoothed version of the global histogram (dashed line) and a smoothed version of the actual histogram (solid line). These histograms represent estimates of the underlying probability density distribution and can thus tell us something about the risk and the possible extent of short-lifting. A comparison with a reference histogram, like the global histogram, would provide the desired benchmark.
From Figure 1 it can be seen that none of the histograms seems to come from a normal distribution (the implications of this observation will not be further discussed in this paper). This can be confirmed by means of a probability plot. The different geo-regions also show significant differences in their density reporting practice. Canada & US West Coast appears better than the global average: the peak of its histogram is centered at 0 and the tails are shorter. For South Asia, the width of the histogram is similar to the global one, but its center is shifted towards short-lifting, whereas the Middle East shows a fairly heavy short-lifting tail. The histogram for South America West is especially remarkable, as the chance of actually getting the fuel density stated by the supplier appears to be slim. The rule is rather that the buyer is either short- or long-lifted, something which could not be deduced from the standard descriptive statistics.
Figure 1: Probability distribution of density reporting deviations (i.e. the difference between claimed and
measured density) for 4 selected geo-regions. The histograms are (clockwise from top left): Canada & US
West Coast, South Asia, Middle East, and South America West Coast. The solid lines represent the
smoothed histogram while the dashed lines are the smoothed global histogram. The underlying number of
samples, averages, medians, and standard deviations are given in Table 1. The histograms reveal
considerable variation in density reporting.
Histograms for the individual suppliers listed in Table 1 are shown in Figure 2 below. A visual comparison indicates that Supplier 1 is much better than the global average, with a narrow symmetric distribution centered at 0. The three other suppliers are all heavily short-lifting, with varying degrees of right-shifted and/or right-heavy distributions. Based on these histograms the suppliers might be characterized as rather bad, but any fine-grained information about their underlying reporting strategy is lost in the histogram. A main disadvantage of using histograms for characterizing suppliers is that they require a considerable amount of data, which can be a challenge when considering short time periods or suppliers with few data samples.
Figure 2: Probability distribution of density reporting deviations (i.e. the difference between claimed and
measured density) for 4 selected suppliers in 4 different bunker ports (for more details see Table 1). The
histograms reveal different reporting behavior, but histograms become noisy when the number of samples
becomes too low.
Scatter plots
Scatter plots of measured vs. claimed density allow a much more fine-grained view of the underlying data. These plots may be used to unravel the various reporting strategies of the suppliers, see Figure 3 and Figure 4. Scatter plots quite effectively visualize the density reporting behavior of suppliers or groups of suppliers. Note that each dot in a scatter plot represents at least one bunkering sample. The diagonal solid line represents correct density reporting (i.e. stated = measured, in the following called the no-cheat line). The horizontal and vertical dashed lines mark the upper density limit given by the ISO8217 standard.
These scatter plots reveal some interesting features. Note that the range of densities of the available fuel varies between geo-regions; e.g. the fuel density range is much wider in the Middle East than in North America or South Asia. This phenomenon may be traced back to the proximity to crude oil production in the regions.
Observe also that in many bunkerings the fuel density was above the limit (dots to the right of vertical dashed line) but almost none of them were reported to lie above the limit (above horizontal dashed line). This is true for all suppliers.
From Figure 4 we may deduce that Supplier 1 could be considered rather good, since most of his samples are on or close to the no-cheat line. This behavior seems to be dominant for most of the suppliers in the Canada & US West Coast geo-region (note: good suppliers are found in all geo-regions). In contrast, Supplier 2 may be regarded as bad, since his stated densities cover the whole range from the no-cheat line all the way up to maximum cheating, i.e. the upper density limit given by the standard. This type of behavior is also visible in both the South Asia and the Middle East scatter plots.
It seems that Supplier 3 has a strategy of simply adding an offset to the real density, which is reflected in a mean density difference different from zero and a relatively low standard deviation. A fourth reporting scheme appears in Supplier 4, who has a tendency always to state a density near the limit, independently of the actual density. This could be termed the worst behavior, since such suppliers short-lift as much as possible. This behavior is not uncommon in South Asia and the Middle East. Variations of this scheme, i.e. stating a fixed fuel density that is lower than the limit, are seen in Asia, the Middle East and South America West. They appear as horizontal lines in the scatter plot.
Figure 3: Scatter plot of measured vs. claimed density for the same geo-regions as in Table 1 and Figure 1.
Each black dot represents (at least) one bunkering. The solid line represents the no-cheat line, i.e.
bunkerings where the supplier states the density correctly (claimed = measured), whereas the dashed lines
indicate the upper density limit in the ISO standard for bunker fuel (ISO8217), viz. 991 kg/m3, implicitly
giving the maximum possible amount of cheating. Many dots along the upper dashed line indicate a high
degree of cheating in many bunkerings. Note that in many bunkerings the fuel density was above the limit
(dots to the right of vertical dashed line) but almost none of them were reported to lie above the limit
(above horizontal dashed line).
Figure 4: Scatter plot of measured versus claimed density for the same suppliers as in Table 1 and Figure
2. Supplier 1 reports quite honestly as his dots are scattered close along the no-cheat line. In contrast,
Supplier 2 and 3 have many reportings away from this no-cheat line but they are not as dishonest as
Supplier 4, who basically reports only one density close to 991 irrespective of the actual fuel density.
3. The Good: Best practice benchmark
The above discussion has emphasized the need for a good benchmark for measuring the goodness in density reporting, and for distinguishing between various short-lifting and long-lifting strategies.
The scatter plots of Canada & US West Coast and Supplier 1 are examples of good density reporting behaviors that could be used as best practice references. Our interpretation of good or best practice is indicated by the grey diagonal area around the no-cheat line in Figure 5. Fair reporting and good control of the delivered density should result in a small symmetric scatter around the no-cheat line, and thus a narrow density difference (dd) histogram centered at dd = 0 (like the one for Supplier 1 in Figure 2).
The goal is to establish a best practice, and then use it as a predefined reference to which bunkerings may be compared. This best practice benchmark is given by the dd-histogram for a group of selected good suppliers.
Figure 5: Scatter plot of bunkering data from South Asia. Data points around the diagonal line (no-cheat
line) indicate good or best practice behavior, i.e. fair reporting, with little or no cheating. In the area
above the no-cheat line, customers get short-lifted (pay too much) whereas below the line the supplier loses
money. The more dots there are above the fair line, and the further away from it they are, the less
accurate the density reporting. Bunkerings far below the fair area should be considered suspicious and
may indicate a bribing situation. Reportings in the grey horizontal area (reporting densities close to the
upper density limit) indicate that some suppliers consciously choose a strategy of maximum density
cheating. A close up of the scatter plot near the density limit = 991 kg/m3 reveals that hardly any suppliers
are willing to state that their fuel exceeds the limit even when this is clearly the case.
This best practice histogram shall represent good suppliers and should be based on many data points. Any outliers, intentional cheating, or other indications of dishonesty should be eliminated to obtain an unbiased and fair benchmark. The following criteria for deriving the best practice benchmark should therefore be chosen (there will always be a certain element of subjective judgment in this process, but the method for deriving the benchmark should as far as possible be transparent, sound, and unbiased):
1) Select some geo-regions where the scatter plots show that data are predominantly found along the no-cheat line.
2) For each selected dataset we:
a. Eliminate extreme outliers, max cheating and near limit lying; only data inside a predefined area around the no-cheat line is selected (see Figure 6 for details).
b. Eliminate any bias by centering the dd data around dd = 0.
3) The adjusted and selected dd data for all the selected sets are then merged into one large dataset.
4) Calculate the dd histogram for the dataset.
Figure 7 shows the best practice reference histogram derived from the geo-regions Biscay, Canada & US East Coast, Canada & US West Coast, US Gulf Coast, and Oceania.
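The four steps above can be sketched in code. This is only an illustration under stated assumptions, not the authors' implementation: we assume the upper selection boundary bisects the angle between the no-cheat and max-cheat lines in a plot with equal axis scales, that centering means subtracting the mean, and that a bin width of 0.5 kg/m3 is used. All function names and the sample data are hypothetical.

```python
import math
from collections import Counter

DENSITY_LIMIT = 991.0  # upper density limit (kg/m3) quoted in the text

# Assumption: the upper boundary bisects the 45-degree angle between the
# no-cheat line (slope 1) and the max-cheat line (slope 0) in a plot with
# equal axis scales, so it has slope tan(22.5 deg) through (991, 991).
BISECTOR_SLOPE = math.tan(math.radians(22.5))


def band_half_width(measured):
    """Largest |dd| accepted at a given measured density (kg/m3)."""
    return max(0.0, (DENSITY_LIMIT - measured) * (1.0 - BISECTOR_SLOPE))


def select_and_center(samples):
    """Steps 2a and 2b for one dataset of (measured, claimed) pairs."""
    kept = [c - m for m, c in samples if abs(c - m) <= band_half_width(m)]
    if not kept:
        return []
    bias = sum(kept) / len(kept)  # assumption: centering subtracts the mean
    return [dd - bias for dd in kept]


def best_practice_histogram(datasets, bin_width=0.5):
    """Steps 3 and 4: merge the datasets, return {bin center: relative frequency}."""
    merged = [dd for samples in datasets for dd in select_and_center(samples)]
    if not merged:
        return {}
    counts = Counter(round(dd / bin_width) for dd in merged)
    return {k * bin_width: v / len(merged) for k, v in sorted(counts.items())}


# Made-up samples: the third one (claimed at the limit) falls outside the band.
region_a = [(960.0, 960.5), (970.0, 969.5), (980.0, 991.0), (950.0, 951.0)]
centered = select_and_center(region_a)
histogram = best_practice_histogram([region_a])
```

In this toy dataset the sample claimed at 991 kg/m3 is discarded, the three remaining dd values are shifted so that their mean is zero, and the relative frequencies in the resulting histogram sum to one.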
Figure 6: Only bunkering samples between the 2 blue solid lines will be used as basis for deriving the best
practice benchmark histogram. This effectively eliminates max cheating, outliers, and ‘near limit effects’,
i.e. less than complete honesty when selling too heavy fuel. The upper solid line divides the angle between
no-cheat and max-cheat lines. The lower solid line is simply mirrored around the no-cheat line such that
the density deviations are the same above and below, i.e. |+dd| = |-dd|.
Figure 7: Best practice dd histogram based on samples from selected geo-regions (Biscay, Canada
& US East and West Coast, US Gulf Coast and Oceania) where max cheating, outliers and near
limit dishonesty have been eliminated. The dashed line is the histogram function H, i.e. a
smoothed version of the histogram indicating the global best practice.
Classification by membership function
Once the best practice histogram is generated, the challenge is to benchmark a supplier, a port, or a region against it. In principle, this histogram must be compared with the dd histograms for the suppliers in question, and the degree of conformance would then give the desired benchmark. Unfortunately this is a non-trivial task, and for many of the suppliers only relatively few samples are available, resulting in noisy histograms. We therefore propose a more elegant approach that is insensitive to the number of data points and to outliers, and that can even be used for a single bunkering.
The concept of a membership function (Turksen 1991; Terano et al 1987, p. 21), which is widely applied in fuzzy set theory (Lowen 1996; Self 1990), is used to achieve this benchmarking. A single number (score) is computed, denoting the goodness of a specific bunkering or supplier.
An example will hopefully make this clear. Consider the task of benchmarking people into fast and slow runners, respectively. One way to do this is to set a threshold T on how fast a person should be able to run 100 m, and then categorize the people who run slower than the threshold as slow (=0) and those who run faster than the threshold as fast (=1). This sorting is achieved by a Boolean membership function B with threshold T for the measured time t on 100 m, i.e. B(T,t). However, it is quite obvious that this benchmarking will result in a crude oversimplification as there is a continuous transition from extremely fast runners to the really slow ones, and a small change in the chosen threshold could seriously alter the number of members in each category. A better approach would be to replace the Boolean function with a continuous function, assigning a continuous membership value between 0 and 1 depending on how fast they run. This is an example of a so-called membership function, and will in the following simply be denoted m.
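The difference between the two kinds of membership function in the runner example can be sketched as follows. The linear ramp between 10 s and 20 s is an arbitrary choice of ours for illustration; the text does not prescribe any particular shape.

```python
def boolean_membership(threshold_s, time_s):
    """B(T, t): 1 (fast) if the 100 m time beats the threshold, else 0 (slow)."""
    return 1 if time_s < threshold_s else 0


def continuous_membership(time_s, fast_s=10.0, slow_s=20.0):
    """Degree of membership in 'fast': 1 at or below fast_s, 0 at or above
    slow_s, and linear in between (hypothetical shape and limits)."""
    if time_s <= fast_s:
        return 1.0
    if time_s >= slow_s:
        return 0.0
    return (slow_s - time_s) / (slow_s - fast_s)


# Two runners who differ by only 0.2 s around a threshold of T = 15 s.
crisp = [boolean_membership(15.0, t) for t in (14.9, 15.1)]
fuzzy = [continuous_membership(t) for t in (14.9, 15.1)]
```

With the Boolean function the two runners end up in opposite categories (1 and 0), whereas the continuous function gives them nearly identical scores (about 0.51 and 0.49).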
The situation is analogous to our best practice density benchmark where suppliers (or bunkerings) are not grouped into crisp sets of good and bad but rather get a score indicating how close to or far away from the best practice they are. This, by the way, is also the reason why e.g. discriminant analysis (Hastie et al 2009) is unsuitable for the task at hand.
The challenge is to find a membership function for the good group that faithfully reflects what we consider to be good. Fuzzy set theory does not provide help in determining the membership function, as all kinds of functions are used, e.g. triangular, trapezoid, Gaussian, etc. The discussion of good behavior above gives us some hints about the properties of the desired membership function. It should not be too wide, as a bad bunkering could then be regarded as good. Likewise, if it is too narrow, a good bunkering would get too low a goodness score. It is important that the membership function represents the best practice set as well as possible. The obvious choice is to derive the membership function directly from the dd histogram itself.
The membership function for good bunkerings, mG, must have a maximum value of 1 at dd = 0, i.e. mG(dd=0) = 1, and must decrease continuously in both directions, i.e. a rescaling and shift of the H histogram has to be done. We therefore propose the following definition of the membership function:

$$m_G(dd) = \frac{H(dd)}{\max(H)} = \frac{H(dd)}{H(0)}$$

where the subscript G indicates that this gives a goodness scoring, and H is the smoothed (and adjusted) best practice histogram (i.e. H is the histogram function). Note that mG is a function of the distance of dd to 0, as well as the frequency of dd in the best practice. This membership function can now be applied e.g. to all n supplier samples to obtain the overall goodness benchmark,

$$b_G = \frac{1}{n} \sum_{i=1}^{n} m_G(dd_i)$$

where the summation is done over all n bunkerings for a specific supplier, port, or geo-region.
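A minimal sketch of how mG and the goodness benchmark bG might be computed is given below. The histogram values are hypothetical, and linear interpolation between bin centers stands in for the smoothing of H, which the text does not specify.

```python
def make_goodness_membership(bin_centers, heights):
    """Return m_G(dd) = H(dd) / max(H), where H interpolates the histogram linearly."""
    h_max = max(heights)

    def m_g(dd):
        if dd < bin_centers[0] or dd > bin_centers[-1]:
            return 0.0  # outside the histogram support
        for i in range(len(bin_centers) - 1):
            x0, x1 = bin_centers[i], bin_centers[i + 1]
            if x0 <= dd <= x1:
                y0, y1 = heights[i], heights[i + 1]
                return (y0 + (y1 - y0) * (dd - x0) / (x1 - x0)) / h_max
        return 0.0

    return m_g


def goodness_score(m_g, dd_values):
    """b_G: the mean of m_G(dd_i) over all n bunkerings."""
    return sum(m_g(dd) for dd in dd_values) / len(dd_values)


# Hypothetical smoothed best practice histogram, peaked at dd = 0.
m_g = make_goodness_membership([-4.0, -2.0, 0.0, 2.0, 4.0],
                               [0.0, 0.1, 0.5, 0.1, 0.0])
b_g = goodness_score(m_g, [0.0, 1.0, -1.0, 3.0])
```

Because the score of each bunkering is computed independently, the same function works for a single sample, a supplier with a handful of samples, or an entire geo-region.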
An interesting observation is that the score from the membership function mG(dd) is not (a priori) a probabilistic measure; it is a measure between 0 and 1 of how far a variable is from some reference value, here dd = 0 (see Figure 8). However, this rescaling does preserve an interesting probabilistic feature: mG(dd) equals the probability of finding a value x in a small interval around dd, relative to that of finding a value y in an equally sized interval around 0, given that the samples are drawn from the best practice group.
Figure 8: The solid line gives the goodness membership function, mG, which is a scaling of the best practice
histogram. mB = 1-mG gives the membership function for the opposite (dashed line), i.e. bad which in turn
could be divided into a long- and short-lifting part, mLL and mSL respectively (corresponding to negative
and positive dd values). E.g. a bunkering with dd=2.3 would get a good score of mG=0.23 and a bad score of
mB=0.77 (with mLL=0 and mSL=0.77).
The Bad
Note that mG(dd) was derived based on what was chosen to be the best practice. It therefore gives a measure/score for how good a bunkering or supplier is with respect to this best practice. The complementary,
mB(dd) = 1 - mG(dd),
gives a badness score, but it does not tell whether the bad score comes from short- or long-lifting. Fortunately, mB can, depending on whether a sample falls into the short- or long-lifting domain, be further divided into mSL and mLL. That is, if the dd value of a sample is positive, its mSL will be greater than zero; if the dd value of a sample is negative, its mLL will be greater than zero.
This enables us to calculate short- and long-lifting scores similar to the goodness score:

$$b_{xL} = \frac{1}{n} \sum_{i=1}^{n} m_{xL}(dd_i),$$

where the subscript xL should be SL or LL, which stands for short- or long-lifting, respectively. These scores indicate the behavior of a supplier and give the risk of being short- or long-lifted. Note, by definition:
bG + bSL + bLL = 1
Remember that the scores correspond to the degree of membership, i.e. how close a bunkering is to the good or bad benchmark. They can therefore be understood as weights corresponding to the proportion of good or bad.
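The split of the badness score into short- and long-lifting parts, and the identity bG + bSL + bLL = 1, can be illustrated with a small sketch. The triangular mG with a half-width of 3 kg/m3 is a stand-in of ours; in the paper mG is derived from the best practice histogram. The sample dd values are made up.

```python
def m_good(dd, half_width=3.0):
    """Stand-in goodness membership: triangular, equal to 1 at dd = 0 (assumption)."""
    return max(0.0, 1.0 - abs(dd) / half_width)


def m_short(dd):
    """m_SL: the badness 1 - m_G for positive dd, otherwise 0."""
    return 1.0 - m_good(dd) if dd > 0 else 0.0


def m_long(dd):
    """m_LL: the badness 1 - m_G for negative dd, otherwise 0."""
    return 1.0 - m_good(dd) if dd < 0 else 0.0


def scores(dd_values):
    """Return (b_G, b_SL, b_LL) as means of the three membership values."""
    n = len(dd_values)
    b_g = sum(m_good(dd) for dd in dd_values) / n
    b_sl = sum(m_short(dd) for dd in dd_values) / n
    b_ll = sum(m_long(dd) for dd in dd_values) / n
    return b_g, b_sl, b_ll


b_g, b_sl, b_ll = scores([0.0, 1.5, -0.6, 4.0])
```

For every sample exactly one of mSL and mLL can be non-zero, and it equals 1 - mG, so the three scores always add up to one regardless of the shape chosen for mG.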
The Ugly
As pointed out above, profit maximization by reporting densities at or close to the upper limit may be considered as fairly ugly behavior. The same methodology can be applied to obtain a near limit score for this behavior by constructing a membership function
mNC(claimed density) = mG(claimed density - 991)
where the subscript NC denotes Near Ceiling.
This membership function assigns a score to a bunkering according to the distance of the claimed density from the density limit and the frequency of occurrence in the benchmark. To avoid categorizing a bunkering as ugly when the measured density is actually near the limit, we multiply mNC by mSL. In so doing we exclude all reportings that are near the limit but that are actually honest. We propose the following ugly, or near-limit, benchmark

$$b_{NC} = \frac{1}{n} \sum_{i=1}^{n} m_{SL}(dd_i) \cdot m_{NC}(\text{claimed density}_i)$$

giving the fraction of short-lifting that could be considered as near limit reporting.
Further characterization of Good and Bad
In order to further characterize bunkering samples within the good-, short-, or long-lifting region in the scatter plot, the average density deviations in each region could be computed by weighting each bunkering sample with the corresponding score from the membership function. For instance, the mean density difference ($\overline{dd}_{SL}$) in the short-lifting area is:

$$\overline{dd}_{SL} = \frac{\sum_i dd_i \cdot m_{SL}(dd_i)}{\sum_i m_{SL}(dd_i)}$$

in kg/m3, where the index i runs over all samples n.
This means that, for a given supplier, we can provide information about the risk of being short-lifted, bSL, and about the expected average density difference, $\overline{dd}_{SL}$. The method is easily extended to the other identified behaviors.
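The near-limit benchmark and the weighted mean density difference in the short-lifting area can be sketched together. As before, the triangular mG is only a stand-in of ours for the histogram-based membership function, and the (measured, claimed) pairs are made up.

```python
DENSITY_LIMIT = 991.0  # upper density limit (kg/m3) quoted in the text


def m_good(x, half_width=3.0):
    """Stand-in goodness membership: triangular, equal to 1 at x = 0 (assumption)."""
    return max(0.0, 1.0 - abs(x) / half_width)


def m_short(dd):
    """m_SL: the badness 1 - m_G for positive dd, otherwise 0."""
    return 1.0 - m_good(dd) if dd > 0 else 0.0


def m_near_ceiling(claimed):
    """m_NC(claimed density) = m_G(claimed density - 991)."""
    return m_good(claimed - DENSITY_LIMIT)


def near_ceiling_score(samples):
    """b_NC: mean of m_SL(dd_i) * m_NC(claimed_i) over (measured, claimed) pairs."""
    return sum(m_short(c - m) * m_near_ceiling(c) for m, c in samples) / len(samples)


def mean_short_lift(samples):
    """Mean dd in the short-lifting area, weighted by m_SL; None if no weight."""
    weights = [m_short(c - m) for m, c in samples]
    total = sum(weights)
    if total == 0:
        return None
    return sum((c - m) * w for (m, c), w in zip(samples, weights)) / total


samples = [(975.0, 991.0), (990.5, 991.0), (960.0, 960.0), (960.0, 975.0)]
b_nc = near_ceiling_score(samples)
dd_sl = mean_short_lift(samples)
```

In this toy data the first sample (light fuel claimed at the limit) contributes fully to bNC, while the second (fuel that really is close to the limit) contributes very little, which is the intended effect of multiplying by mSL.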
4. Application of the benchmarks
As discussed above, the power of the scatter plot lies in the visualization of the different density reporting schemes. Several patterns, like fixed value density reporting, systematic density reporting deviations, etc., are easily spotted. The benchmarks developed above are constructed to discriminate between some of these different reporting schemes, and to quantify the risk of being short-lifted as well as the amount of short-lifting that should be expected. The benchmarks for our examples from Table 1 are given in Table 2 below.
Table 2: Standard descriptive measures together with our benchmark(s) for the geo-regions and suppliers
from Table 1. The benchmarks for the data that were used to generate the best practice histogram are also
included for comparison. A row, e.g. Global, is read as follows: average density difference is 0.39, std=3.92.
Benchmarking against the best practice gives the following results: 43% of the samples can be regarded as
good (bG), 31% qualify as short-lifting (bSL), and 26% as long-lifting (bLL). For the short-lifting samples the
average density difference is 3.31, but only 7% of them were near the ceiling.
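| Region / supplier | $\overline{dd}$ (kg/m3) | σdd | bG | bSL | bLL | bNC | $\overline{dd}_{SL}$ (kg/m3) |
|---|---|---|---|---|---|---|---|
| Best Practice | 0.05 | 1.16 | 0.62 | 0.19 | 0.19 | 0.01 | 1.50 |
| Global | 0.39 | 3.92 | 0.43 | 0.31 | 0.26 | 0.07 | 3.31 |
| Canada & US West Coast | 0.03 | 2.43 | 0.55 | 0.22 | 0.24 | 0.02 | 2.09 |
| South Asia | 1.22 | 3.35 | 0.41 | 0.52 | 0.07 | 0.26 | 2.44 |
| Middle East | 1.83 | 4.76 | 0.32 | 0.49 | 0.19 | 0.02 | 4.61 |
| South America West | 0.48 | 6.00 | 0.08 | 0.42 | 0.50 | 0.00 | 3.73 |
| Supplier 1 | 0.12 | 0.95 | 0.71 | 0.09 | 0.20 | 0.02 | 1.70 |
| Supplier 2 | 2.31 | 4.84 | 0.36 | 0.53 | 0.11 | 0.13 | 4.65 |
| Supplier 3 | 2.40 | 1.83 | 0.09 | 0.87 | 0.03 | 0.00 | 2.81 |
| Supplier 4 | 2.07 | 2.81 | 0.27 | 0.72 | 0.01 | 0.46 | 2.64 |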
The samples used to generate the best practice histogram are included in the table for easy comparison. Note that the good score can only be 1 when all samples are at dd=0, which explains why even the good score of the best practice is ‘only’ 0.62. The table shows that, among the selected geo-regions, the highest risk of being short-lifted is found in South Asia (bSL=0.52). The near-limit benchmark, bNC, confirms what is apparent from the scatter plot (Figure 3): for many suppliers it is common practice to maximize profit by simply reporting a fuel density at or near the limit.
South America West nicely illustrates the ability of the benchmark to identify the underlying behavior. Recall that for this area the mean was near zero, while the high standard deviation suggested large fluctuations in the reporting. From these two statistics alone, however, nothing can be deduced about the underlying reporting schemes or about the risk of being short- or long-lifted. In contrast, our benchmark reveals that the likelihood of actually getting what you paid for is rather slim, viz. around 8%. In the vast majority of cases either short- or long-lifting takes place.
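As a quick check of this reading against the South America West row of Table 2, and assuming that the good, short-lifting and long-lifting scores together account for all samples (which the rows of Table 2 suggest, up to rounding), the numbers work out as:

$$b_{SL} + b_{LL} = 0.42 + 0.50 = 0.92, \qquad b_G + b_{SL} + b_{LL} = 0.08 + 0.92 = 1.00$$

So roughly 92% of the bunkerings in this region fall into either the short-lifting or the long-lifting category, even though the two effects nearly cancel in the mean.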
Observe also that Supplier 1 can indeed be regarded as honest, with a good score higher than best practice. Suppliers 2 and 3 have comparable average density differences, but their good and near-limit benchmarks clearly separate them. A comparison of the benchmarks with the corresponding scatter plots confirms that the benchmarks do indeed give a more accurate description of the honesty of suppliers than standard descriptive statistics.
Figure 9: Comparison of different benchmarking methods: suppliers ranked by their mean density difference, $\overline{dd}$ (top), and their corresponding good score, bG (bottom). Observe that ranking with respect to the mean would result in about 1057 good suppliers (|$\overline{dd}$| ≤ 0.7). Our scoring with respect to best practice (0.62) reveals, however, that about 150 are definitely bad (left-hatched area), even below global average (0.43). 539 are really good (equal to or better than best practice, right-hatched area), whereas the rest are located between global average and best practice. Observe also that simply relying on the mean to characterize suppliers would label several of them as bad even though their good score is above best practice.
Supplier ranking
In Figure 9 (top) all suppliers of RMG380 fuel worldwide are ranked with respect to their mean density difference, $\overline{dd}$. Using |$\overline{dd}$| ≤ 0.7 as a criterion for goodness, the mean would imply that there are about 1057 good suppliers. Applying this mean $\overline{dd}$ to our benchmarking method results in the continuous bell-shaped curve (blue). If $\overline{dd}$ were indeed an unbiased measure of the goodness of suppliers, their scores would be closely scattered around this curve. This is, however, not at all the case. The discrepancy stems from the unreliability of the mean (or standard deviation) as a measure whenever the underlying distributions are non-normal or outliers have a large effect. The figure shows clearly that 150 of the apparently good suppliers are actually quite bad, i.e. even below global average (left-hatched area), whereas only about half (539) can be considered equal to or better than best practice (right-hatched area). Observe also that many of the apparently bad suppliers (those with |$\overline{dd}$| > 0.7) are actually better than their reputation, as most of them lie above the bell-shaped curve and some are even above best practice. This further emphasizes the need for an unbiased score like bG.
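The contrast between the two criteria can be illustrated with the rows of Table 2. The sketch below labels each row once by the mean criterion (|mean dd| ≤ 0.7) and once by its good score relative to the global average (0.43) and best practice (0.62). The function names and band labels are hypothetical, and applying the supplier criterion to geo-regions is done here for illustration only.

```python
# (mean dd in kg/m3, good score bG), copied from Table 2.
TABLE_2 = {
    "Canada & US West Coast": (0.03, 0.55),
    "South Asia": (1.22, 0.41),
    "Middle East": (1.83, 0.32),
    "South America West": (0.48, 0.08),
    "Supplier 1": (0.12, 0.71),
    "Supplier 2": (2.31, 0.36),
    "Supplier 3": (2.40, 0.09),
    "Supplier 4": (2.07, 0.27),
}

MEAN_LIMIT = 0.7       # goodness criterion on |mean dd|
GLOBAL_AVERAGE = 0.43  # bG of the global data set
BEST_PRACTICE = 0.62   # bG of the best practice data set


def label_by_mean(mean_dd: float) -> str:
    """Label based on the mean density difference alone."""
    return "good" if abs(mean_dd) <= MEAN_LIMIT else "bad"


def label_by_score(b_g: float) -> str:
    """Band of the good score relative to the two reference scores."""
    if b_g >= BEST_PRACTICE:
        return "at or above best practice"
    if b_g < GLOBAL_AVERAGE:
        return "below global average"
    return "between global average and best practice"


def classify(name: str) -> tuple[str, str]:
    mean_dd, b_g = TABLE_2[name]
    return label_by_mean(mean_dd), label_by_score(b_g)


for name in TABLE_2:
    by_mean, by_score = classify(name)
    print(f"{name:24s} mean: {by_mean:4s}  score: {by_score}")
```

South America West is the instructive row: its mean passes the 0.7 criterion, yet its good score of 0.08 places it far below the global average.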
Development over time
Following the development of the score of a supplier, port, or region over time may give valuable indications about what may be expected in the near future. For instance, Figure 10 shows the development of the bG score for two major ports, Singapore and Rotterdam, over the past 25 years.
Figure 10: Time series of goodness scores bG for two large ports in different geo-regions. Data from all
available suppliers are included. Dots are quarterly time intervals while the stippled lines are year
averages. Each dot is based on a varying number of ‘raw data points’, i.e. the number of bunkerings
during the corresponding time interval.
Observe that from the beginning of the 1980s and up to the mid 1990s the quality of the density reporting was increasing. It then leveled off until 2008, when a change in behavior occurred – perhaps triggered by the onset of the global recession?
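A time series of this kind could be computed roughly as sketched below. The sketch makes three assumptions that are not stated in the text: the score for an interval is the mean good-membership of the bunkerings in that interval, the yearly value is the unweighted mean of that year's quarterly values, and `toy_m_g` is a hypothetical stand-in for the real membership function.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Callable, Iterable


def quarterly_scores(
    samples: Iterable[tuple[date, float]], m_g: Callable[[float], float]
) -> dict[tuple[int, int], tuple[float, int]]:
    """Map (year, quarter) to (mean good score, number of bunkerings)."""
    buckets: dict[tuple[int, int], list[float]] = defaultdict(list)
    for day, dd in samples:
        quarter = (day.month - 1) // 3 + 1
        buckets[(day.year, quarter)].append(m_g(dd))
    return {key: (mean(vals), len(vals)) for key, vals in sorted(buckets.items())}


def yearly_averages(
    quarterly: dict[tuple[int, int], tuple[float, int]]
) -> dict[int, float]:
    """Unweighted mean of the quarterly scores of each year (an assumption)."""
    per_year: dict[int, list[float]] = defaultdict(list)
    for (year, _quarter), (score, _count) in quarterly.items():
        per_year[year].append(score)
    return {year: mean(scores) for year, scores in per_year.items()}


# Hypothetical good-membership function (NOT the paper's): 1 at dd = 0,
# falling linearly to 0 at |dd| >= 5 kg/m3.
def toy_m_g(dd: float) -> float:
    return max(0.0, 1.0 - abs(dd) / 5.0)


samples = [
    (date(2008, 1, 15), 0.0),
    (date(2008, 2, 1), 2.5),
    (date(2008, 7, 1), 5.0),
    (date(2009, 12, 31), 1.0),
]
per_quarter = quarterly_scores(samples, toy_m_g)
per_year = yearly_averages(per_quarter)
print(per_quarter)
print(per_year)
```

Keeping the sample count next to each quarterly score matters because, as the caption of Figure 10 notes, each dot rests on a varying number of bunkerings. A quarter with a single sample deserves less weight when reading the trend.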
5. Discussion and concluding remarks
This paper has two main focus areas: the construction of a realistic benchmark and the development of a methodology that allows comparing one or more samples with the benchmark.
The examples given above demonstrate the capabilities of our approach. It is more powerful than standard descriptive statistics (e.g. $\overline{dd}$ and σdd), as it is less sensitive to outliers and is well suited for small datasets and even single samples. Recall that our benchmarks give better quantifications than $\overline{dd}$ and σdd together. Further, the approach makes no assumptions about the probability distribution of the underlying data: any distribution is allowed. Only some weak requirements apply to the membership function (e.g. that it is increasing or decreasing). The methodology is quite generic and could in principle be applied to any kind of comparison task, i.e. benchmarking.
The fact that the benchmark is based on a probability density function, and that a probabilistic interpretation of the scoring is possible, is an aid to the user’s intuition, making it easier to understand and interpret the results.
Once a best practice histogram has been generated, a membership function can be derived, after which benchmarking is easily done. Subjectivity is only involved in the definition of what can be regarded as best practice, as there is no a priori correct answer to this problem. Our approach has been to ask: what should be expected of a good supplier? And by answering this question we have picked suppliers that best match our expectations. Outliers and incorrect claims near the density limit are of course not wanted from a good supplier, hence their removal from the best practice data set.
From a user perspective the main strengths of the presented benchmark are:
• Intuitive and easy to understand.
• Applicable for few or even singleton samples.
• Able to pinpoint different density reporting schemes.
In closing let us return to the extent and amount of global short-lifting which is estimated to be around 1.7 ton per bunkering on average. Thanks to our benchmarking methodology we can now provide a more detailed picture of the situation. First, 43% of the bunkerings could be considered to be loss neutral (bG=0.43), since they are within best practice. Second, 26% are instances of long-lifting (bLL=0.26), where the buyer gains on average 1.8 ton. Third, 31% could be regarded as short-lifting (bSL=0.31), with an average buyer loss of 2.5 ton per bunkering. This highlights the importance of choosing the right supplier.
The presented benchmark methodology is easily extendable to other (quality and economical) bunkering parameters like viscosity, sulfur or water content, as well as a series of physical and chemical properties. The methodology will be the basis for a benchmarking web tool, scheduled for release by DNVPS later this year.
Figure 11: Bunker surveyor on board a ship. Photo by DNV Petroleum Services (used with
permission).
References
Bhattacharyya, G., Johnson, R. (1977), Statistical Concepts and Methods, Wiley, New York.
DNV (2010), Total fuel management, http://www.dnv.com/industry/maritime/servicessolutions/fueltesting (accessed 13 Oct. 2010).
EPA (2008), Global Trade and Fuels Assessment - Future Trends and Effects of Requiring Clean Fuels in the Marine Sector, Assessment and Standards Division, Office of Transportation and Air Quality, U.S. Environmental Protection Agency, EPA420-R-08-021, November 2008.
Eyring, V., Isaksen, I.S.A., Berntsen, T., Collins, W.J., Corbett, J.J., Endresen, O., Grainger, R.G., Moldanova, J., Schlager, H., Stevenson, D.S. (2010), “Transport impacts on atmosphere and climate: Shipping”, Atmospheric Environment, Vol. 44, No. 37, December 2010, pp. 4735-4771.
Hastie, T., Tibshirani, R., Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction (second edition), Springer, New York.
IEA (2010), World Energy Outlook 2010, International Energy Agency, OECD Publishing, Paris.
IMO (2009), Prevention of Air Pollution from Ships, International Maritime Organization, Marine Environment Protection Committee, MEPC 59/INF.10, 9 April 2009.
Lowen, R. (1996), Fuzzy Set Theory, Kluwer Academic Publishers, Dordrecht.
Self, K. (1990), “Designing with fuzzy logic”, IEEE Spectrum, Vol. 27, No. 11, November 1990, pp. 42-44, p. 105.
Terano, T., Asai, K., Sugeno, M. (1987), Fuzzy Systems Theory and its Applications, Academic Press, San Diego.
Turksen, I.B. (1991), “Measurement of membership functions and their acquisition”, Fuzzy Sets and Systems, Vol. 40, pp. 5-38.
[Figure 5 (image not recovered): scatter plot of measured density versus claimed density, axis ticks 961 to 991; labeled regions: Good, Bad, Suspicious, max. cheat area, Limit.]
[Figure 6 (image not recovered): annotations: Limit (= max. cheat line) and no-cheat line.]
[Figure 7 (image not recovered): axes: Probability versus density deviations.]
[Figure 8 (image not recovered): membership functions mG and mB = 1 - mG over the density difference, with short-lifting and long-lifting sides; example: dd = 2.3 gives Good: mG = 0.23 and Bad: mB = 1 - 0.23 = 0.77.]
[Figure 9 (image not recovered): axes: claimed - measured density and good score versus total number of suppliers; annotations: best practice score, global average score, band from -0.7 to 0.7 containing ca. 1057 suppliers, “Many ‘good suppliers’ are actually quite bad!” (150), 539, “Some ‘bad suppliers’ are actually very good!”, “Some ‘bad suppliers’ are actually slightly better!”.]