Electronic copy available at: http://ssrn.com/abstract=916605
A LATENT CLASS MODEL
OF EARNINGS ATTRIBUTES
June 12, 2008
Paul K. Chaney*
Bruce Cooil
Debra C. Jeter
401 21st Ave S Owen Graduate School Vanderbilt University
Nashville, TN 37203-2422 (615)322-2685
* Corresponding author ([email protected]). We thank Hui Chen, Bill Christie, Mara Faccio, Jennifer Francis, Craig Lewis, Ron Masulis, Hans Stoll, Bob Whaley and seminar participants at McMaster University and Vanderbilt University for helpful comments, and we acknowledge the financial support of the Dean’s Fund for Faculty Research at Vanderbilt University.
JEL classification: M41
Keywords: Earnings Attributes, Accruals, Conservatism, Income Smoothing, Earnings Persistence
Abstract
While the term earnings quality has been overused, rarely defined, and largely misunderstood, there is general agreement that users of financial statements react to their content, and that the accuracy and credibility of that content matter to them. To differentiate among financial reports, many papers rely on the association between one specific earnings attribute (such as accruals variability) and some observable outcome (auditor choice, for example). In a notable exception, Francis, LaFond, Olsson and Schipper (2004) examine the relation between the cost of capital and several earnings attributes, including both accounting-based characteristics and market-based attributes. Whether each of these attributes provides incremental information for users of financial data beyond the others has not yet been established.
We draw upon Dechow and Dichev (2002), Francis et al. (2004), and Khan and Watts (2007) to identify potentially informative reporting dimensions or earnings attributes. We use a latent class analysis to show empirically that our measures for Accrual Variability, Persistence, Smoothness, Predictability, and Conservatism all have significant incremental value in differentiating among firms. The best across-firm latent class model also allows us to identify and study six categories of firms that differ significantly in terms of these five distinct measures. We summarize how these groups of firms differ in terms of both the attributes themselves and various innate firm characteristics. We also show how this new classification of firms is related to a principal component analysis, and we seek the best within-group model for each attribute in terms of innate firm characteristics. We find that two primary principal components provide useful and complementary summaries of the primary dimensions of the attributes examined.
1. Introduction
The credibility and usefulness of reported earnings as a measure of firm
performance have long been of interest to researchers, and never more so than in the wake
of the accounting scandals near the beginning of the millennium. Most often, this elusive
quality is measured using a single dimension such as the magnitude or variance of
discretionary accounting accruals. In this paper, we use a latent class analysis to form
clusters of firms based on several earnings attributes identified in previous studies, as well
as a principal components analysis based on the same attributes. The latent class model
offers certain advantages over other techniques, including the principal component
analysis, because it allows us to examine the inter-dependencies among multiple earnings
attributes, as well as how they are related to firm characteristics, and to use all the measures
together in determining various classifications of reporting credibility. Our research
objective is to utilize the incremental content in multiple measures of reporting attributes to
improve our understanding of financial reporting by firms, to identify statistically distinct
groups of firms based on their reporting attributes, and to study how these groups differ in
terms of innate firm characteristics.
Assuming the perspective of the analyst, Dechow and Schrand (2004) suggest that
earnings of “high quality” should accurately reflect the current performance of a firm,
while also providing useful information about its future and summarizing its value. They
go on to state that an earnings amount that annuitizes expected future cash flows should be
both persistent and predictable, but that these attributes alone do not guarantee high quality
earnings. For example, adoption of certain accounting methods might increase the
persistence of earnings while reducing its usefulness (Chamberlain and Anctil, 2003).
Many researchers rely upon the magnitude or variability of abnormal or discretionary
accruals alone to capture, or proxy for, the quality of reported earnings (Teoh, Wong, and
Rao, 1998; Shivakumar, 2000; DeFond and Jiambalvo, 1994; DeFond and Park, 2001;
DeFond and Subramanyam, 1998; among others). Dechow and Schrand (2004) remark
that large accruals can indicate large underlying volatility of operations and thus low-
quality earnings even in the absence of opportunistic behavior. Proper use of accrual
accounting should serve to smooth the variability of cash flows to the extent that the
variability is not reflective of underlying firm performance.
While the term earnings quality has been overused, rarely defined, and largely
misunderstood, there is general agreement that users of financial statements react to their
content, and that the accuracy and credibility of that content matters to them. Francis et al.
(2004) examine the relation between the cost of capital and seven earnings attributes,
including four accounting-based attributes and three market-based attributes. Ecker et al.
(2006) present evidence that investor perceptions, as measured by the e-loading from
regressions of daily excess returns on a factor-mimicking portfolio designed to capture
earnings quality, are generally consistent with the measures used in Francis et al. (2004).
Ecker et al. (2006) point out that any single measure is unlikely to be best in all research
settings for capturing earnings quality. We draw upon Dechow and Dichev (2002), Francis
et al. (2004), and Khan and Watts (2007) to identify potentially informative reporting
dimensions or earnings attributes. We perform a latent class analysis first to identify those
attributes with the greatest explanatory power in classifying firms into reporting categories
and to find the best possible model based on those attributes that is also supported by the
residual diagnostics. We also apply a principal components analysis (PCA) and show how
our classifications relate to the PCA, and we describe the differences between the two
techniques.
Our latent class cluster (LCC) analysis enables us to classify firms into groups
based on several measures of earnings attributes (Vermunt 2003). This approach is, in some
respects, analogous to that used by Larcker and Richardson (2004) in their examination of
the relation between unexpected accruals and audit and non-audit fees. Among the various
forms of latent class (LC) analysis that have been widely used in social science research
(Lazarsfeld 1950, among others), the LCC model is a specific
extension that allows us to classify firms into clusters or groups based on a multivariate
dependent variable consisting of several measures of earnings attributes. The number and
statistical characteristics of the groups into which these firms are classified are assumed to
be “latent,” or unknown a priori (Vermunt and Magidson 2002, Everitt 1993). Those firms
belonging to the same cluster are similar with respect to the various measures of earnings
attributes in that their observed values come from the same probability distributions (with
unknown parameters).
In our context, a statistical model permits us to assess the likelihood that there are
groups of firms with distinctly different reporting attributes. We make no predictions a
priori as to the number of groups or whether, indeed, there are distinct groups at all. There
are several advantages to using a statistical model to allocate firms to clusters or categories.
First, the choice of the cluster criterion is less arbitrary than in non-model-based methods,
including typical distance-based clustering approaches. Second, LC clustering allows both
simple and complex distributional forms for the observed variables, and formal tests can be
applied to check the validity of the parameters. Third, this model-based cluster analysis
provides a way of studying the dependencies among the different measures of earnings
attributes and the relationship between each earnings attribute measure and each innate
firm characteristic. Finally, this analysis accommodates differences across industries.
We present evidence of the existence of six categories of firms with distinctly
different earnings attributes. We contribute to the literature by creating a classification of
firms based not on a single dimension but on multiple dimensions of reporting characteristics. We
compare this approach to a straightforward principal components analysis and demonstrate
that, while both have advantages over the use of a single earnings attribute, the LCC offers
additional advantages over the PCA. For instance, the LCC provides a richer description
than any single indicator or principal component measure of how the earnings attributes
differ across firms and industries. By using LCC, we draw maximum information from the
earnings attributes to identify the best possible model.
The remainder of the paper is organized as follows. In section 2, we describe the
measures for the various earnings attributes, their relation to information risk, and some
innate determinants of earnings attributes. In this section, we also discuss the advantage of
a multi-dimensional approach over the use of a proxy that captures only one reporting
aspect. Our preliminary analysis is presented next (section 3), in which we describe our
data set, present descriptive statistics, and explain the transformation of our measures to
their normalized forms. Section 4 describes our two-level latent class cluster analysis, and
Section 5 presents our results for the cluster analysis. Section 6 concludes.
2. Review of the measures, their relation to information risk, and innate determinants
We begin our analysis with proxies for various earnings attributes described in
Dechow and Dichev (2002), McNichols (2002), Francis et al. (2004), and Khan and Watts
(2007). We follow the definitions adopted in these papers, not because these measures are
necessarily optimal (though they are certainly reasonable), but for the sake of consistency
and because our contribution lies elsewhere. In particular, in this paper, we do not aim to
improve upon the measurement of the attributes individually, but rather to aggregate them
in such a way as to draw the optimal information from their correlations, their magnitude,
their variability, etc. Also, consistent with Francis et al. (2004), all of the measures below
are defined and interpreted in the exact reverse of their labels. For example, a high
Smoothness measure means that earnings are more variable than cash flows; thus, a high
value represents less smooth earnings; a high value for Persistence means less persistent
earnings; etc. All the measures are computed for each firm over a ten-year period.
2.1 Accrual variability
It is fairly widely accepted that reported earnings are at best a noisy proxy for
economic earnings, where economic earnings reflect the “true” performance of the firm
(Choi and Jeter, 1992). In assessing how closely reported earnings approximate economic
earnings, cash flow from operations is sometimes used as an alternative, albeit imperfect,
measure of true or economic earnings, since economic earnings are unobservable. Because
reported earnings and cash from operations (CFO) differ by the amount of reported
accruals (as well as depreciation, amortization, etc.), many researchers focus on the
magnitude or variability of accruals to assess their usefulness. More specifically, Accrual
Variability is measured as the standard deviation of a given firm’s (j) residuals from a
regression of total current accruals (TCA) on cash flow from operations (CFO) from prior
(t-1), current (t), and subsequent (t+1) periods, measured yearly and using ten-year
windows. McNichols (2002) provides evidence that the residuals from the Dechow and
Dichev (2002) model are correlated with the “change in sales” variable included in the
Jones model (Jones, 1991). Also, in the McNichols model, both property, plant and
equipment (GrossPPE) and the change in sales (ΔSales) are still significant after controlling
for cash flows. Further, Larcker and Richardson (2004) include a book-to-market ratio
(BVEquity/MVEquity) in their model of accruals. Thus, we estimate the following
regression by industry (industry subscripts omitted) and year:
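$$
\frac{\mathit{TCA}_{jt}}{\mathit{Assets}_{jt}} = \alpha_{0,j} + \alpha_{1,j}\frac{\mathit{CFO}_{j,t-1}}{\mathit{Assets}_{jt}} + \alpha_{2,j}\frac{\mathit{CFO}_{jt}}{\mathit{Assets}_{jt}} + \alpha_{3,j}\frac{\mathit{CFO}_{j,t+1}}{\mathit{Assets}_{jt}} + \alpha_{4,j}\frac{\Delta\mathit{Sales}_{jt}}{\mathit{Assets}_{jt}} + \alpha_{5,j}\frac{\mathit{GrossPPE}_{jt}}{\mathit{Assets}_{jt}} + \alpha_{6,j}\frac{\mathit{BVEquity}_{jt}}{\mathit{MVEquity}_{jt}} + \varepsilon_{jt}. \tag{1}
$$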
Accrual Variability, for each firm, is computed as the standard deviation of the firm’s
residuals from equation (1) over the ten years.
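As an illustration only, the computation can be sketched in Python on synthetic data. The sketch assumes a single industry, regressors that are already scaled as in equation (1), and NumPy's least-squares routine; the function names `ols_residuals` and `accrual_variability` and the simulated numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def ols_residuals(y, X):
    """Residuals from an OLS fit of y on X (X already contains an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def accrual_variability(tca, regressors):
    """tca: (firms, years); regressors: (firms, years, k), all pre-scaled.

    Fits one cross-sectional regression per year (a single industry is
    assumed), then returns each firm's standard deviation of residuals
    across the years.
    """
    n_firms, n_years = tca.shape
    resid = np.empty((n_firms, n_years))
    for t in range(n_years):
        X = np.column_stack([np.ones(n_firms), regressors[:, t, :]])
        resid[:, t] = ols_residuals(tca[:, t], X)
    return resid.std(axis=1, ddof=1)

# Synthetic illustration: 40 firms, 10 years, 6 regressors (made-up data).
rng = np.random.default_rng(0)
regs = rng.normal(size=(40, 10, 6))
tca = 0.5 * regs[:, :, 1] + rng.normal(scale=0.02, size=(40, 10))
av = accrual_variability(tca, regs)  # one Accrual Variability value per firm
```

Under the reverse labeling used in this paper, a larger value indicates more variable accruals.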
2.2 Persistence
In the Conceptual Framework of the Financial Accounting Standards Board
(FASB), the objectives of financial reporting are stated to include providing information
that is useful for decision making and for assessing future cash flows (Statement of
Financial Accounting Concepts No. 1). In July 2006, in a joint project with the
International Accounting Standards Board, the FASB reiterated these objectives.1 It is this
focus on the future, rather than the present, that creates a belief, upheld by research, that
1 Financial Accounting Series No. 1260-001, Preliminary Views, Paragraphs OB2-OB3 (Norwalk, Conn.: FASB, July 6, 2006).
earnings are a better predictor than current cash flows of future cash flows. Hence, the
extent to which the information in earnings persists into future periods is crucial to its
usefulness.
Persistence is the extent to which current period earnings are reflective of future
periods as well as the current period. In theory, a component of current period earnings
(for example, an upswing in sales revenue) is persistent if it is sustainable in future periods.
This attribute is sometimes measured (Lev 1983; Ali and Zarowin 1992, for example) as
the estimated autocorrelation at lag 1, based on a first-order autoregressive model (AR1) of
annual earnings before extraordinary items (in essence, a regression of current period
earnings on prior period earnings):
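$$
\frac{\mathit{Income\ before\ Extra}_{jt}}{\mathit{Common\ Shares}_{jt}} = \phi_{0,j} + \phi_{1,j}\left(\frac{\mathit{Income\ before\ Extra}_{j,t-1}}{\mathit{Common\ Shares}_{j,t-1}}\right) + \nu_{jt}. \tag{2}
$$
We use the negative of the AR1 parameter ($\phi_{1,j}$) as our measure of Persistence.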
2.3 Predictability
FASB identifies relevance as one of the primary characteristics of desirable
financial data (FASB Concept No. 2), and predictive value as one of the components of
relevant information.2 We measure Predictability using the standard deviation of the errors
in the autoregressive model (AR1) used to measure Persistence (equation (2); see Lipe
1990 and Lee 1999),
$$
\text{Predictability} = \sigma(\hat{\nu}_{jt}). \tag{3}
$$
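Both measures come from the same first-order autoregression, so they can be sketched together. The following minimal Python example fits the AR1 model by ordinary least squares on a per-share earnings series; the function name `ar1_attributes` and the use of n − 2 degrees of freedom for the residual standard deviation are assumptions of this sketch, not details given in the paper.

```python
import math

def ar1_attributes(eps):
    """Return (persistence, predictability) from an AR(1) fit to `eps`.

    eps: earnings per share in time order (at least 4 observations).
    Persistence is the negative of the AR(1) slope, as in the text;
    predictability is the residual standard deviation (the n - 2
    degrees-of-freedom correction is an assumption of this sketch).
    """
    x, y = eps[:-1], eps[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi1 = sxy / sxx                 # AR(1) slope
    phi0 = my - phi1 * mx            # intercept
    resid = [b - (phi0 + phi1 * a) for a, b in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return -phi1, sigma

# Toy series in which each year's earnings are exactly twice the prior year's.
persistence, predictability = ar1_attributes([1.0, 2.0, 4.0, 8.0, 16.0])
```

In the toy series the slope is 2 and the fit is exact, so the Persistence measure is −2 and the Predictability measure is zero.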
2 In the Preliminary Views expressed by FASB in its reconsideration of the conceptual framework, the Board noted that predictive value and predictability are not identical terms. The Board further indicated that the term is one that needs further attention or clarification.
2.4 Smoothness
Smoothness may be viewed as a desirable attribute if we believe that managers use
the discretion available to them to smooth out the nonrecurring fluctuations (Trueman and
Titman 1988) rather than to misrepresent current (and future) performance (and
expectations). Managers are assumed to have private information enabling them to
distinguish between permanent and transitory components of earnings (Chaney, Jeter, and
Lewis 1998), not easily visible to outsiders, and they exercise their discretion to convey
that information more accurately rather than for personal gain. To the extent that managers
are assumed not to use their discretion to affect the pattern of reported earnings, then
Smoothness of earnings will also capture the differences among firms in terms of the actual
variability of earnings relative to cash flows. In either case, firms with smoother earnings
should be easier to predict, should have a greater proportion of permanent rather than
transitory components, etc. Thus we see the connection among our various attributes.
Whether or not each of these attributes provides incremental information beyond the others
is assessed in subsequent sections of this paper.
Smoothness is measured as the firm’s standard deviation of scaled earnings (before
extraordinary items) divided by the standard deviation of scaled CFO (Leuz et al. 2003;
Hunt et al. 2000),
$$
\text{Smoothness} = \frac{\sigma(\mathit{Net\ Income\ before\ Extra}_{jt})}{\sigma(\mathit{CFO}_{jt})} \tag{4}
$$
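A minimal Python sketch of the ratio in equation (4) follows. It assumes both series are already scaled and uses the sample standard deviation; the function name `smoothness` and the toy numbers are hypothetical.

```python
import statistics

def smoothness(net_income, cfo):
    """Ratio of the standard deviation of scaled earnings to that of scaled CFO."""
    return statistics.stdev(net_income) / statistics.stdev(cfo)

# Toy firm whose earnings vary half as much as its cash flows.
ratio = smoothness([0.01, 0.02, 0.03], [0.02, 0.04, 0.06])
```

A ratio below one means earnings are less variable than cash flows; under the reverse labeling adopted here, a higher value represents less smooth earnings.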
2.5 Conservatism
Conservatism was identified in SFAC No. 5 as one of the constraints in recognition
and measurement. The Conservatism concept is explained as favoring a choice that avoids
overstating assets or profit when in doubt, but not as an endorsement for deliberate
understatement. To describe this attribute in an empirical setting, we turn to the change in
market value of equity as measured by accounting earnings. Conservatism can be argued to
capture the ability of earnings to reflect economic losses, as measured by negative stock
returns, relative to the ability of earnings to reflect positive news via stock returns. Thus a
reverse regression (of earnings on returns) is frequently analyzed in this context.
Looking to the models in the extant literature, we consider two individual firm-level
measures for conservatism. These variables measure the incremental timeliness for bad
news over good news. The first measure is the standard Basu (1997) measure estimated on
a firm-level basis over a ten-year period while the second measure is the C-Score
developed by Khan and Watts (2007). The Basu measure is based on a regression of
earnings on returns with an interactive dummy variable on returns to indicate whether the
return was negative:
$$
\mathit{Earn}_{i,t} = \beta_{1,i,t} + \beta_{2,i,t}\,\mathit{Neg}_{i,t} + \beta_{3,i,t}\,\mathit{Return}_{i,t} + \beta_{4,i,t}\,\mathit{Neg}_{i,t}\,\mathit{Return}_{i,t} + e_{i,t} \tag{5}
$$
where Earn is earnings before extraordinary items, Return is the firm’s annual compounded
stock return, and Neg equals 1 if the return is negative for year t and 0 otherwise. Thus β4 is
the incremental timeliness of bad news in the basic Basu model.
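The regression in equation (5) can be sketched in Python as follows. This is a hedged illustration using NumPy's least-squares solver on made-up data; the function name `basu_coefficients` is hypothetical, and any scaling of earnings is assumed to have been applied beforehand.

```python
import numpy as np

def basu_coefficients(earn, ret):
    """OLS estimates (beta1, beta2, beta3, beta4) for the Basu-type regression.

    beta4, the coefficient on Neg * Return, is the incremental
    timeliness of bad news.
    """
    earn = np.asarray(earn, dtype=float)
    ret = np.asarray(ret, dtype=float)
    neg = (ret < 0).astype(float)              # 1 in negative-return years
    X = np.column_stack([np.ones_like(ret), neg, ret, neg * ret])
    beta, *_ = np.linalg.lstsq(X, earn, rcond=None)
    return beta

# Made-up data generated with known coefficients, to check the mechanics.
ret = np.array([-0.4, -0.3, -0.2, -0.1, 0.1, 0.2, 0.3, 0.5])
neg = (ret < 0).astype(float)
earn = 0.05 + 0.01 * neg + 0.10 * ret + 0.30 * neg * ret
beta = basu_coefficients(earn, ret)
```

When a firm has no negative-return years, the Neg and Neg × Return columns are identically zero, so β4 cannot be identified from that firm's data alone.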
In contrast, the Khan and Watts approach specifies the incremental timeliness of
bad news as a linear function of time-varying firm-specific characteristics. The C-score is
computed as:
$$
\text{C-score} = \beta_{4,i,t} = \lambda_{1,i,t} + \lambda_{2,i,t}\,\mathit{Size}_{1,i,t} + \lambda_{3,i,t}\,\mathit{M/B}_{1,i,t} + \lambda_{4,i,t}\,\mathit{Lev}_{1,i,t} \tag{6}
$$
The empirical model is estimated using equation (5) but substituting equation (6) for β4,i,t
(and substituting a similar equation for the timeliness of good news) and then using
equation (6) to determine the C-Score.3
We chose to include the Khan and Watts measure in our subsequent analysis for the
following reasons. First, many firms did not have a negative annual return within the 10-
year estimation period,4 and many more firms reported only one or two negative return
years in the last 10 years. Thus the basic Basu model did a relatively poor job of measuring
the incremental timeliness of bad news estimated on a firm-level basis. The Khan and
Watts procedure estimates the model on a cross-sectional basis, but then uses firm specific
characteristics to compute a firm-level conservatism measure. This allows us to measure
the timeliness of bad news for all firms. Finally, Khan and Watts (2007) provide evidence
that their C-Score predicts the Basu measure up to three years in advance even if they
include firms with only positive returns.5
2.6 Advantage of using multiple dimensions of quality
Why is a model based on multiple dimensions potentially superior to one relying on
only one aspect of earnings quality? Consider the possibility that a researcher chooses a
measure of Predictability alone as her proxy for the informativeness of earnings. Suppose
3 See Khan and Watts (2007) for further details.
4 Thus, if you compute (β3 + β4)/β3, the median value is 1, indicating that the estimated coefficient β4 = 0 (i.e., no available negative annual returns for these firms).
5 Three attributes of earnings are measured in Francis et al. (2004) using a market-based approach: conservatism, relevance, and timeliness. However, in their conclusion, Francis et al. (2004, p. 1007) state:
“. . . our results suggest that a focus on accounting-based attributes (rather than on market-based attributes) would allow for more sharply delineated comparisons in settings where the consideration of earnings numbers or reporting systems is linked to investors’ resource allocation decisions.”
In addition, our preliminary tests confirm the tenuous value of including these particular market-based measures. Therefore, we focus on the earnings (rather than market) attributes proposed by Francis et al. (2004), and rely on the alternative market-based score developed by Khan and Watts (2007), which we believe provides a more comprehensive measure of accounting conservatism.
that a firm’s reported earnings in period 1 are exactly equal to its reported earnings in
period 2, and the researcher chooses a proxy for Predictability that reveals this firm to have
essentially perfect Predictability (and hence extremely high information value). Further,
suppose that this same firm’s true economic performance in period 2 was actually quite
dismal, and that the level of reported earnings was sustained only by drawing upon a
“cookie jar” of reserves created in period 1, when earnings were actually understated. An
examination of Accrual Variability in conjunction with Predictability would reveal that
there is much less information about earnings than is suggested by using Predictability
alone. Similar examples can be constructed for reliance on Conservatism alone,
Smoothness alone, or Accrual Variability alone.
As previously mentioned, the various attributes of earnings quality are
integrally linked in some respects, both to each other and to the decision-making process.
Thus, we make no attempt to rank their importance to users, but instead allow our model to
reveal the importance of each in assessing the overall informativeness of reported earnings.
2.7 Innate determinants
Here we follow Francis et al. (2004) in selecting and measuring eight innate firm
characteristics. The first innate measure is client firm size, where we use the log of total
assets. We measure the Variability of CFO as the standard deviation of CFO measures
over our ten-year period (and scaled by total assets). We include the Operating Cycle,
defined as the log of the sum of days in accounts receivable and days in inventory. We
measure Losses (or Negative Earnings) as the proportion of years with losses (before
extraordinary items) over the ten-year period. We define Intangible Intensity as the sum of R&D and
advertising expense scaled by sales revenues. We include a dummy variable (Intangible
Dummy) set equal to 1 if Intangible Intensity is zero, and set at 0 otherwise.6 We measure
capital intensity (Capital Intensity) as the net book value of property, plant & equipment
(PPE) divided by the book value of total assets. Finally, Sales Variability is the standard
deviation of a firm’s ten-year sales revenues divided by total assets.
3. Preliminary analysis
3.1 Descriptive statistics
In Table 1, we present summary information on the 1251 individual firms used in
the cluster analysis. Each of the earnings attribute measures is calculated for a ten-year
period ending between 1997 and 2003. Because we normalize the data for the analysis and
do not eliminate outliers, we report quartiles, trimmed means, and estimates of the standard
deviation based on the quartiles of the data.7 Our median numbers for the earnings
attributes are similar to the median numbers reported by others. For instance, the median
Accrual Variability in our sample is 0.019 which is the same as the median reported by
Francis et al. (2004), while Dechow and Dichev (2002) reported a median of 0.020. Our
measures of Predictability, Persistence, and Smoothness are also similar to those reported
in Francis et al. (2004). Our median Predictability measure is 0.62 compared to 0.54. The
median (negative of) AR1 parameter (Persistence) in our sample is −0.28, while in Francis
et al. (2004) it was −0.52. This implies a weaker relationship between successive years’ earnings in our
study. Furthermore, our median measure of Smoothness is 1.20 whereas Francis et al.
(2004) reported a median of 0.58. Finally, the median Conservatism reported in Table 1
6 Because generally accepted accounting principles do not require separate disclosure of immaterial amounts for advertising and R&D, the Intangible Intensity may be measured as zero for firms with positive amounts deemed immaterial for advertising and R&D.
7 This approach accomplishes essentially the same objective as the use of Winsorized data in Francis et al. (2004).
(0.087) is similar to the median of 0.082 reported by Khan and Watts (2007, Table 4,
“C_Score1”).
3.2 Normalizing transformation
To facilitate multivariate and latent class analyses in our study, each measure of
earnings attributes is transformed using the inverse standard normal cumulative distribution
function (cdf) of the empirical cdf (or scaled rank) of the original variable. Details are
provided in the next section. All subsequent references to these measures refer to their
normalized forms.
3.3 Correlations, partial correlations, auxiliary R2
As Table 2 indicates, Smoothness has the strongest relationships with the other
variables, although only 22.4% of its variance is accounted for by other measures. Except
for the correlation between Accrual Variability and Persistence, all correlations among
accounting measures are statistically significant (p<0.01), although none of these individual
correlations account for more than 15% of the variance of any variable (i.e., |correlation| <
0.39; the squared correlation represents the proportion of variance accounted for by each
variable). Also, Conservatism, the market based measure, is significantly correlated with
Accrual Variability and with Predictability, although this last correlation is actually
negative. In every case the partial correlations are significant (p<0.01) when and only
when the corresponding correlation is significant.8 Thus, Accrual Variability exhibits a
direct relationship with all measures except Persistence; i.e., this is the only insignificant
partial correlation between Accrual Variability and any of the other three measures.
8 Partial correlations are computed controlling for the effect of the remaining earnings attributes.
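For readers who wish to reproduce this kind of summary, partial correlations and auxiliary R² values can both be read off the inverse of the correlation matrix. The Python sketch below uses that standard identity with NumPy; the function names and the 3 × 3 example matrix are hypothetical and are not the values in Table 2.

```python
import numpy as np

def partial_correlations(corr):
    """Partial correlation of each pair, controlling for all remaining variables."""
    prec = np.linalg.inv(np.asarray(corr, dtype=float))  # precision matrix
    d = np.sqrt(np.diag(prec))
    partial = -prec / np.outer(d, d)
    np.fill_diagonal(partial, 1.0)
    return partial

def auxiliary_r2(corr):
    """R-squared from regressing each variable on all of the others."""
    prec = np.linalg.inv(np.asarray(corr, dtype=float))
    return 1.0 - 1.0 / np.diag(prec)

# Hypothetical correlation matrix: three variables, all pairwise correlations 0.5.
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
P = partial_correlations(R)
r2 = auxiliary_r2(R)
```

In this example each partial correlation and each auxiliary R² equals one third, which shows how a pairwise correlation of 0.5 can shrink once the third variable is controlled for.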
4. Description of the model
We use a two-level latent class cluster analysis to study how firms differ in terms of
our five measures of earnings attributes. This model incorporates industry random-effects,
so that observations are assumed to be independent across industries, but the observations
for firms within an industry are independent only after conditioning on this random effect.
This is a natural extension of likelihood-based latent class cluster analysis, since the
probability that a firm is in a given class (or cluster) is still a logistic regression on
covariates, except that these covariates now include the industry-specific random effect
(Vermunt 2003), which has mean zero and a variance that is specific to the latent class. We
relax the typical assumption of local independence (i.e., the assumption that earnings
attributes are mutually independent within latent class or cluster) and incorporate
covariances among the five measures, as part of the model. Also, within each cluster, we
find the best regression model for each of the five earnings attributes measures in terms of
eight innate firm characteristics (or innate determinants of earnings attributes).
As we discuss in more detail subsequently, we employ several forward and
backward stepwise procedures to select the best combination of covariates (predictors for
latent class membership), direct effects (predictors in the regression models for each of the
measures of earnings attributes within each cluster), and nonzero covariances (for
dependencies among the measures of earning attributes). The best model that emerges
empirically, through stepwise searches, is also optimal among all models that include
parameters for the most significant partial correlations presented in Table 2, each of which
explains more than 5% of the variance in the measures.
Forty-four of the forty-eight industries identified by Fama and French (1997) are
represented in this analysis.9 We also estimated models that used the industry variable as a
nominal covariate, but the use of an industry-specific random effect invariably provided
better scientific models, i.e., models with lower values for the Bayesian Information
Criterion (BIC; Schwarz 1978), and better residual diagnostics. Mathematical details are
presented next.
4.1 The Latent class cluster model
In this analysis, the dependent variable is a vector of five normalized measures of
earnings attributes $y_i = (y_{i1}, y_{i2}, y_{i3}, y_{i4}, y_{i5})$, i.e., each measure is transformed using the
inverse standard normal cumulative distribution function (cdf) of the empirical cdf (or
scaled rank) of the original variable:
$$
y_{it} = \Phi^{-1}\!\left(\frac{\mathrm{Rank}(x_{it})}{N+1}\right). \tag{5}
$$
This normalization of each measure facilitates the use of the normal latent class cluster model
(simple standardization, by subtracting the mean from each variable and dividing by the
standard deviation, would only recenter and rescale each variable and would not suffice as
a form of normalization). Other distributional approaches are possible but the normal
model provides a relatively simple and direct way of representing the multivariate
relationship among the five measures.
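The transformation can be sketched in a few lines of Python using only the standard library. The function name `normal_scores` is hypothetical, and the handling of ties (ordinal ranks in order of appearance) is an assumption of this sketch, since the text does not say how ties are treated.

```python
from statistics import NormalDist

def normal_scores(x):
    """Map each value to Phi^{-1}(rank / (N + 1)), with ranks running from 1 to N.

    Ties receive distinct ordinal ranks in order of appearance
    (an assumption of this sketch).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]

# Three observations receive the 25th, 75th and 50th normal percentiles.
z = normal_scores([10.0, 30.0, 20.0])
```

Because only the ranks enter, the result is unaffected by outliers in the original variable and is identical for any monotone increasing rescaling of it.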
We use a two-level latent class cluster model across J groups (J=44 industries) and
K latent classes. Here yij represents the ith observed vector of earnings attributes in
9 The four industries from the Fama and French classification that did not make it into our sample are: tobacco products, shipbuilding, defense, and coal.
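industry j (i = 1,…, $n_j$). We find K clusters by maximizing the likelihood for N
observations ($N = \sum_{j=1}^{J} n_j$):
$$
\log L = \sum_{j=1}^{J} \log\left[\int_{\xi} \left\{ \prod_{i=1}^{n_j} f\left(y_{ij} \mid Z_{ij}^{cov}, Z_{ij}^{pred}, \xi_j, \Theta\right) \right\} f(\xi_j)\, d\xi_j \right], \tag{6}
$$
with:
$$
f\left(y_{ij} \mid Z_{ij}^{cov}, Z_{ij}^{pred}, \xi_j, \Theta\right) = \sum_{k=1}^{K} P\left(X_{ij} = k \mid Z_{ij}^{cov}, \xi_j, \Gamma\right)\, \phi_k\left(y_{ij} \mid Z_{ij}^{pred}, \beta^{(k)}, \Sigma_k\right), \tag{7}
$$
where:
$\phi_k(y_{ij} \mid Z_{ij}^{pred}, \beta^{(k)}, \Sigma_k)$ is the five-variate normal density within cluster k, with
covariance matrix $\Sigma_k$ and a mean $\mu_k$ that is a linear function of the p predictors
$Z_{ij} = (z_{ij1}, \ldots, z_{ijp})$ with coefficients $\beta^{(k)} = (\beta_0^{(k)}, \beta_1^{(k)}, \ldots, \beta_p^{(k)})$,
$$
\mu_k = \beta_0^{(k)} + \sum_{r=1}^{p} \beta_r^{(k)} z_{ijr}^{pred}; \tag{8}
$$
and:
$P(X_{ij} = k \mid Z_{ij}^{cov}, \xi_j, \Gamma)$ is the logistic probability that observation i, from industry j, is in
latent class k,
$$
P\left(X_{ij} = k \mid Z_{ij}^{cov}, \xi_j, \Gamma\right) = \frac{\exp\left[\eta\left(k \mid Z_{ij}^{cov}, \xi_j\right)\right]}{\sum_{\ell=1}^{K} \exp\left[\eta\left(\ell \mid Z_{ij}^{cov}, \xi_j\right)\right]}, \tag{9}
$$
with
$$
\eta\left(\ell \mid Z_{ij}^{cov}, \xi_j\right) = \gamma_{\ell 0} + \sum_{r=1}^{R} \gamma_{\ell r} z_{ijr}^{cov} + \gamma_{\ell 00}\, \xi_j, \tag{10}
$$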
for ℓ = 1,…, K. In (9) and (10), the ξj, j = 1,…, J, are assumed to be independent standard
normal random effects, which is not very restrictive given the flexibility that is provided in
(10) by the random effect coefficients, γℓ00, which correspond to each latent class ℓ, ℓ = 1,…,
K, and which are subject to the identifiability constraint $\sum_{\ell=1}^{K} \gamma_{\ell 00} = 0$.
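To make the class-membership part of the model concrete, the logistic probabilities in (9) and (10) can be sketched in Python for a single firm, given a value of the industry random effect. The function name `class_probabilities`, the argument layout, and the parameter values are hypothetical; this illustrates the formula only and is not the estimation procedure used in the paper.

```python
import math

def class_probabilities(z_cov, xi_j, gamma0, gamma, gamma00):
    """Multinomial-logit class probabilities as in (9)-(10).

    z_cov   : covariate values for the firm (length R)
    xi_j    : industry random effect (standard normal in the model)
    gamma0  : class intercepts (length K)
    gamma   : K lists of covariate coefficients (each of length R)
    gamma00 : random-effect coefficients (length K, summing to zero)
    """
    eta = [
        gamma0[l] + sum(g * z for g, z in zip(gamma[l], z_cov)) + gamma00[l] * xi_j
        for l in range(len(gamma0))
    ]
    m = max(eta)                      # subtract the max for numerical stability
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [v / total for v in w]

# Two classes, one covariate with zero effect, random-effect coefficients +1 and -1.
probs = class_probabilities([0.3], 0.5, [0.0, 0.0], [[0.0], [0.0]], [1.0, -1.0])
```

With these made-up values the linear predictors are 0.5 and −0.5, so a positive industry effect shifts probability toward the first class.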
4.2 Model selection procedure
We employed backward and forward stepwise procedures to find the most
promising models, and selected as the “best scientific model” the model that minimized
the average value of the Bayesian Information Criterion (BIC) (Schwarz 1978),
$$n^{-1}\{\mathrm{BIC}\} = n^{-1}\{-2 \log L^{*} + \log(n)\, p\},$$
where $\log L^{*}$ is the maximum of the log-likelihood function in (6), p is the number of
parameters and n is the number of observations available to fit the model under consideration
(n varies slightly among models due to missing values). Keribin (1998) showed that, under
relatively general conditions, BIC is a consistent criterion for selecting the number of
clusters. BIC imposes a greater penalty for model complexity than AIC or criteria that
measure proportional reduction in loss, and has also been shown to be a consistent criterion
in very general theoretical settings (Woodroofe 1982; Bozdogan 1987) and to provide
models that perform well in empirical studies (Rust, Simester, Brodie and Nilikant 1995;
Steyerberg, Harrell, Borsboom, et al. 2001), including latent class analyses (Biernacki and
Govaert 1999; Dias 2004).
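The selection rule above can be sketched in a few lines. The log-likelihoods and parameter counts below are hypothetical placeholders chosen only to illustrate the comparison; they are not values from our models, and the function name `average_bic` is an assumption of this sketch.

```python
import math


def average_bic(log_likelihood, n_params, n_obs):
    """Average BIC: (-2 log L* + log(n) p) / n."""
    return (-2.0 * log_likelihood + math.log(n_obs) * n_params) / n_obs


# Hypothetical candidates: (label, maximized log-likelihood, parameters, observations).
candidates = [
    ("5 clusters", -8120.0, 70, 1251),
    ("6 clusters", -8050.0, 85, 1251),
    ("7 clusters", -8030.0, 100, 1251),
]

best = min(candidates, key=lambda c: average_bic(c[1], c[2], c[3]))
```

Dividing by n keeps the criterion comparable across candidate models whose usable sample sizes differ slightly because of missing values.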
Initially we included all eight covariates (the innate determinants of earnings
attributes), along with the industry random-effect, as variables that determine latent class
membership in (9) and (10). We then repeatedly applied the following two-step procedure
until there was no further improvement in BIC.
Step 1: Given a specific selection of covariates, a forward stepwise procedure was used to
select those covariances and direct effects that would provide a better model (lower BIC);
at each step, the best candidate covariances and direct effects were determined by an
analysis of the bivariate residuals of the five measures (Vermunt and Magidson 2002),
and the optimal number of clusters was redetermined.
Step 2: Subject to this selection of direct effects and covariances, we then used a backward-
stepwise procedure to eliminate the least significant covariates for latent class membership.
At each step the optimal number of clusters was redetermined.
By initially including all possible covariates, and first identifying the appropriate
direct effects and covariances, before eliminating those covariates that were not important,
we were able to find the best model (minimum BIC) that was also supported by the residual
diagnostics.
4.3 Principal components
Principal components analysis represents an alternative way to aggregate all the
earnings attributes into one or more factors to capture the informativeness of earnings.
Thus, PCA can be used alone, as described subsequently, or in conjunction with LCC, as
we do here. We use principal components analysis as a complementary way of
summarizing the data and as a way of finding a measure for the overall quality of earnings
(Total Measure). The principal components are not used explicitly in the latent class
model, although we do use the first two components as inactive covariates. The principal
components also provide an additional means of interpreting the firm clusters (or groups)
identified by the latent class analysis.
The principal components are summarized in Table 3. Sixty-one percent of the variance
of the normalized measures is explained by the first two principal components, and it is
only these first two components that explain more variance than a single measure. The first
component, referred to as Total Measure, is positively correlated with all five measures,
most notably with the four accounting-based measures, and accounts for 35% of the total
standardized variance of the five measures. The second component explains 27% of total
standardized variance (while only 20% would be explained by a single measure). This
second component contrasts Conservatism and Accrual Variability with the two measures
based on the first-order autoregressive model for earnings, Persistence and Predictability.
These correlations are summarized in Figure 1.
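The variance shares quoted above follow directly from the eigenvalues in Table 3, because the total standardized variance of five normalized measures is 5. A short check, using only the two eigenvalues reported in Table 3:

```python
# Eigenvalues of the first two principal components (Table 3).
eigenvalues = [1.73, 1.33]
n_measures = 5  # total standardized variance equals the number of measures

shares = [value / n_measures for value in eigenvalues]
cumulative_share = sum(shares)
single_measure_share = 1 / n_measures  # share explained by one standardized measure

percentages = [round(100 * share) for share in shares]
cumulative_percentage = round(100 * cumulative_share)
```

The individual shares round to 35% and 27%, while their unrounded sum rounds to 61%, which is why the cumulative figure is one point lower than the sum of the rounded shares.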
In Table 4 (discussed in the next section), we note that the first component (Total
Measure) enables us to distinguish the top three quality clusters from the other three, and
to distinguish the lowest performing group of firms as significantly different from all
others. Thus, the use of principal components would enable the researcher to create a 0-1
dummy variable to distinguish firms with high reporting quality from firms with relatively
low quality, or lowest quality from all others. While PCA is simpler to apply than LCC
and offers an alternative means for considering multiple attributes rather than relying on a
single dimension, it does not distinguish as finely among the degrees of reporting attributes
(aggregated) as LCC.
5. Results of the cluster analysis
5.1 The Best scientific model
The model that minimizes the Bayesian Information Criterion (BIC) is a six-cluster
model summarized in Tables 4 through 6. There is substantial empirical support for the
selection of 6 clusters: BIC increases substantially when 5 or fewer clusters (ΔBIC > 15)
and when 7 or more clusters (ΔBIC > 30) are fit, where ΔBIC is the difference in BIC
values. ΔBIC provides a way to approximate the posterior odds favoring the selected model
(when prior model probabilities are equal) and the evidence supporting the selected model
is generally regarded as decisive (odds greater than 150 to 1, assuming an appropriate
“reference” prior; Kass and Wasserman 1995) when ΔBIC >10 (Kass and Raftery 1995).
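The link between ΔBIC and posterior odds can be illustrated numerically. The sketch assumes the standard approximation in which ΔBIC approximates twice the log Bayes factor, so that with equal prior model probabilities the odds are roughly exp(ΔBIC / 2); the function name is hypothetical.

```python
import math


def approximate_posterior_odds(delta_bic):
    """Approximate odds favoring the lower-BIC model, assuming equal priors."""
    return math.exp(delta_bic / 2.0)


odds_at_threshold = approximate_posterior_odds(10)   # roughly 150 to 1
odds_vs_five = approximate_posterior_odds(15)        # lower bound for 5 or fewer clusters
odds_vs_seven = approximate_posterior_odds(30)       # lower bound for 7 or more clusters
```

Under this approximation a ΔBIC of 10 corresponds to odds of about 148 to 1, which is the source of the "150 to 1" threshold, and the differences of more than 15 and 30 reported above imply far larger odds.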
As we will show subsequently, this model posits a covariance structure among the
five measures (found by stepwise search) that is consistent with the exploratory results of
Table 2. The choice of this model is also supported by the significant differentiation it
provides among all six clusters of firms in terms of earnings attribute measures, and the
levels of the innate firm characteristics. In this model, even the smallest cluster is
represented by over 30 firms (from 19 industries) and this group of firms has a profile of
characteristics, and earnings attributes, that distinguish it as a significantly different cluster
(as we will show in Table 4).
As one might expect with a random effects model, it is difficult to classify each
firm precisely, but the model makes classification errors with an estimated probability of
only 14%, and the model provides a 74% reduction in misclassification error (lambda)
relative to random classifications based on only the relative size of the clusters. Similarly,
the entropy R-square of classification accuracy is 75% (Vermunt and Magidson 2002).
Only four covariates were used for the classification component of this model: log total
assets, σ(CFO/Assets), Negative Earnings, and Intangible Intensity.
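The classification statistics reported above can be computed from the matrix of posterior class-membership probabilities. The sketch below assumes the definitions commonly used in latent class software: the expected error of modal assignment, lambda as the proportional reduction in that error relative to assigning every firm to the largest cluster, and an entropy-based R-square. These definitions and the function name are assumptions of the sketch, not a transcription of the software we used.

```python
import numpy as np


def classification_stats(posteriors):
    """Return (error rate, lambda, entropy R-square) for an (N, K) posterior matrix."""
    p = np.asarray(posteriors, dtype=float)
    class_sizes = p.mean(axis=0)
    error = np.mean(1.0 - p.max(axis=1))   # expected error of modal assignment
    baseline = 1.0 - class_sizes.max()     # error if all firms go to the largest class
    lam = 1.0 - error / baseline

    def entropy(q):
        q = np.clip(q, 1e-300, 1.0)        # guard against log(0)
        return -(q * np.log(q)).sum(axis=-1)

    entropy_r2 = 1.0 - entropy(p).mean() / entropy(class_sizes)
    return error, lam, entropy_r2


# Consistency check with the reported figures: 14% error, largest cluster 45%.
implied_lambda = 1.0 - 0.14 / (1.0 - 0.45)
```

With the rounded inputs of a 14% error rate and a largest cluster of 45%, the implied lambda is roughly 0.745, in line with the reported 74% given rounding.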
5.2 The Profiles of the six clusters
Table 4 describes the profiles of the six clusters of firms. Each of the five measures
of earnings attributes has significant incremental value in this model in the sense that each
measure differs significantly across these six clusters (p<0.0001 for each measure, Wald’s
test). The cluster number was assigned so that clusters are ordered in terms of decreasing
quality according to the first principal component (Total Measure), i.e., Cluster 1 has the
best average value of Total Measure (9th percentile), and Cluster 6 has the worst (average
Total Measure at 78th percentile). Cluster 5 is the largest at 45% of the sample, followed
by Cluster 2 (23%), Cluster 6 (19%), Cluster 1 (5.7%), Cluster 4 (4.0%), and Cluster 3
(2.8%), respectively. Consequently, 32% of the sample (and target population) is in one of
the three highest quality clusters, all of which have average Total Measure percentiles
below 20%, and 68% of firms are in the three lowest quality clusters, all of which have
average Total Measure percentiles above 60%. Cluster 6 has a significantly larger (worse)
average Total Measure percentile than all of the other groups (p<0.05; all significance tests
among groups use the family error rate appropriate for all pairwise comparisons).
This relative ranking of the best and worst performing clusters, that is provided by
the Total Measure component, is generally consistent with the relative performance of
these six clusters on each of the five individual measures of earnings attributes. In
particular, firms in Cluster 6 (the third largest group at 19% of firms) have averages on
Persistence (72nd percentile) and Predictability (70th percentile), that represent significantly
lower performance than all other groups of firms, and this is the only cluster of firms with
averages on all measures that are above the 50th percentile, indicating lower quality than
the median on each of the five quality dimensions. Cluster 6 is also one of the worst
performers on Smoothness, although in this case its performance is not significantly worse
than that of Clusters 4 and 5. Cluster 4 is significantly worse than all other groups in terms
of both average Conservatism (90th percentile) and average Accrual Variability (72nd
percentile), as shown in Table 4. In contrast, nearly all of
the averages for Clusters 1, 2 and 3 (32% of firms) are below the 50th percentile, indicating
higher than median quality; the two exceptions are that the mean of Cluster 1 is at the 61st
percentile on Persistence, and Cluster 3 (2.8% of firms) is at the 54th percentile on
Conservatism. Except for Persistence, Cluster 1 is actually at the best average level, or not
significantly different from the best level, on all other measures. Clusters 2 and 3 also have
the best average levels for Persistence and Predictability and Cluster 3 is one of the best
performers in terms of average Smoothness. This discussion of Table 4 and the overall
ranking in terms of the Total Measure component indicates that there are three distinct
groups, each of which is significantly different: the best performers (Clusters 1-3), the two
moderately poor performers (Clusters 4-5), and the worst performer (Cluster 6). A list of
firms and their modal cluster groupings can be obtained from the authors. Alternatively, in
Appendix A, we present an approach to compute the first principal component (Total
Measure), based on the five earnings attributes, which could be used in other research
applications.
Although the Total Measure component provides a very useful summary of overall
performance, the second principal component provides an interesting way of contrasting
the clusters within each of the high- and low-performing groups. In terms of this second
component, Clusters 1 and 4 are at the low and high ends, respectively, and, in terms of this
second component, each of these clusters is significantly different from the other clusters at
similar Total Measure levels. In fact, Cluster 1 is actually farther from Clusters 2 and 3 on
component 2 than it is from Clusters 5 and 6, and Cluster 4 is actually farther from Clusters
5 and 6 than it is from Clusters 2 and 3.
5.3 Profile differences in terms of innate determinants of earnings
As shown in the second half of Table 4, the best performing clusters (Clusters 1-3)
also have significantly lower averages for sales variability (σ(Sales/Assets)) and Negative
Earnings relative to the other clusters. The difference across clusters is especially stark
for Negative Earnings, which on average occurs less than 1% of the time in Clusters 1-3
and more than 20% of the time in Clusters 4-6. Cluster 1 further distinguishes itself by also
having the lowest averages on cash flow variability (σ(CFO/Assets)), and Intangible
Intensity, and it has an average on Operating Cycle that is not significantly different from
the lowest. Similarly, it also has significantly higher averages for Capital Intensity, and for
the Intangible Dummy relative to all other clusters. In fact, the Intangible Dummy average
indicates that intangibles are absent 98.6% of the time in Cluster 1.
In terms of Intangible Intensity, though, Cluster 4 is the only group with a
significantly higher average relative to the other clusters. Cluster 4 also distinguishes itself
as having the lowest average size (Log Assets) and having the largest average proportion of
Negative Earnings; on both of these variables it is significantly different from all other
clusters. Clusters 4 and 6 also have the highest average sales variability (σ(Sales/Assets)),
followed by Cluster 5, which is still significantly higher than Clusters 1-3 (the best
performing clusters). Cluster 3 firms are significantly larger than firms in all other clusters,
and Cluster 4 firms are, on average, significantly smaller than the rest.
5.4 Classification into clusters by covariates
There are four covariates in the model, each of which aids significantly in the
classification of firms into clusters (see equations (9) and (10)). They are: Log Assets
(p<0.0001), σ(CFO/Assets) (p<0.0001), Negative Earnings (p<0.0001) and Intangible
Intensity (p<0.01). Since the industry random effect is also a significant component of the
classification function (p<0.001), the classifications made using only covariate information
are not as accurate as those using the complete model. For example, when only the
covariate values are used to make classifications, rather than all firm information, including
the actual value of the firms’ earnings attributes, lambda and the entropy R-square values
(for classification accuracy) decrease from 74% to 36% and from 75% to 43%,
respectively. Thus, cluster membership and the Total Measure component are not easy to
predict using firm characteristics. The coefficients of the model for the cluster
classification probabilities (equation 10), based on the four covariates and random effects,
are presented in Appendix B, but the summary of differences among the covariate means in
Table 4 provides a direct way of seeing the results of the classification.
5.5 After Classification--important covariances among the five measures
The best scientific model includes only four nonzero covariances among the five
measures, and these correspond to the only partial correlations in Table 2 that account for
more than 5% of the variance of each measure. These are the covariances between Accrual
Variability and Conservatism, Accrual Variability and Smoothness, Predictability and
Persistence, and between Predictability and Smoothness. The values of these four
covariances differ by cluster. Simpler models that used common cluster-independent
estimates of these four covariances did not fit as well and these models invariably had
higher BIC values.
5.6 How the clusters relate to the Fama-French industries
An analysis of how the clusters are related to industries provides a complementary
way of understanding the characteristics of each cluster and their differences. Table 5
presents the proportion of firms in each industry that are in each cluster. Clusters 2, 5, and
6 reveal the greatest representation across industries. As shown in Table 5, Cluster 1 is the
only cluster with fewer than 19 industries represented. The largest cluster, Cluster 5 (45%
of all firms) represents more than half the firms in 18 industries and more than 70% of the
firms in seven industries. These seven industries are agriculture (86%), fabricated products
(83%), precious metals (81%), textiles (75%), petroleum and natural gas (74%), steel
(73%), and business supplies (72%). As one would expect, there are low performing firms
in all industries. In particular, all forty-four industries included in our sample are
represented by firms in Clusters 5 and 6. But outside of Cluster 5, it is rare for a cluster to
include more than 50% of all firms in an industry, and this only happens three times in two
other clusters.
While the preceding paragraph refers to the percentages of the various industries
represented in each cluster, Figures 2 and 3 present the percentages of each cluster that are
in the various industries. All industries that comprise at least 5% of a cluster are specified
in these figures. No single industry accounts for more than 15% of any cluster, except in
Clusters 1 and 4. In Cluster 4, 26% of the
firms are in the pharmaceutical products industry and 24% are in medical equipment.
Utility companies make up 90% of Cluster 1. This striking statistic
draws attention to one of the distinctions between PCA and LCC. LCC enables us to see
that the cluster composed almost entirely of utilities is unique and that its earnings
attributes are quite different from those of firms in other clusters, even from those firms in
Clusters 2 and 3 that also tend to perform well on most measures of earnings quality and on
the average value for component 1 (Total Measure).
If we return to Table 4, we see that the firms in Cluster 1 performed quite well on
every dimension examined except persistence. If a researcher believes persistence to be
particularly important or relevant to the issue being examined, then Cluster 1’s aggregate
performance in terms of the first principal component would be misleading. This also
demonstrates one of the reasons that firms in particular industries (utilities or banking, for
example) may need to be excluded or examined separately from other firms in some
research contexts.
5.7 Within-Cluster regression models
Next we present the results of the within-cluster regression analysis (equation 8).
This analysis provides another way of studying the relation between individual earnings
attributes and innate firm characteristics, as well as a means to understand how the clusters
differ from each other. This analysis represents the most plausible way to explain how the
five earnings attributes are related to innate firm characteristics. Just as a researcher
searching for the best explanation for the relation between house rentals and underlying
home values might need different linear models for different geographic areas, there is no
one model that can describe the relation between innate characteristics and earnings
attributes in every setting. We are merely seeking the best explanation across all firms in
our sample.
The within-cluster linear models for the five measures of earnings quality differ in
terms of intercepts and random effects, but the same set of predictor coefficients is used in
each cluster. These coefficients (and standard errors) are summarized in Table 6. The
predictor coefficients are significantly nonzero (p<0.0001) in all but two cases: in the
model for Predictability, the coefficient of Intangible Intensity is
significant at the 0.01 level, and in the model for Smoothness, the coefficient for
Operating Cycle is significant at the 0.05 level.
Note that for Persistence, the best model simply uses cluster-specific intercepts,
because the innate firm characteristics do not have significant incremental predictive value
within cluster. In other words, the cluster means for Persistence provide the best overall
explanation for differences among firms, and we find that Cluster 3 exhibits the best
persistence (intercept of -0.75 in Table 6) while Cluster 6 exhibits the worst (+0.75).
Returning once more to Table 4, we see that there are three distinct levels of mean
Persistence. The three groups are, in order of declining average Persistence: (1) Clusters 2,
3 and 4, (2) Clusters 1 and 5, and (3) Cluster 6. (As before, lower scores mean higher
Persistence.)
For all measures where sales variability (σ(Sales/Assets)) and Negative Earnings are
important predictors, the positive coefficients in Table 6 mean that a higher numerical
value of each measure is associated with higher values of these variables, ceteris paribus.
In other words, negative earnings and more variable sales adversely affect smoothness,
conservatism, etc. In contrast, firm size (Log Assets), cash flow variability
(σ(CFO/Assets)) and the Intangible Dummy have effects of different signs on various
attributes. For example, holding other firm characteristics constant, larger assets are
associated with less Predictability (higher values) but with better performance on the
Accrual Variability and Conservatism scales. Finally, higher values of Operating Cycle,
Intangible Intensity and Capital Intensity are each associated with better performance on a
different measure (Smoothness, Predictability and Accrual Variability, respectively).
6. Summary and conclusions
We contribute to the literature on the value of reported earnings by creating a
classification of firms based on not one, but several earnings attributes. We describe two
approaches for aggregating the information in the various attributes, a principal
components analysis and a latent class analysis. We show that the latter enables us to draw
more information from these attributes by identifying the best possible model. This
approach offers the advantage of having lower measurement error than any single indicator,
or even a simple principal component measure, and provides a richer description of how
the measures of earnings attributes differ across firms and industries.
Using a two-level latent class cluster analysis, we present evidence of the existence
of six clusters or categories of firms, across forty-four industries, with respect to the
usefulness of reported earnings. To facilitate the analyses in our study, each measure of
earnings attributes is transformed using the inverse standard normal cumulative distribution
function of the empirical scaled rank of the original variable.
Our paper contributes to the literature along several important dimensions.
Measures that have been shown in prior studies (Francis et al. 2004, among others) to be of
value in assessing earnings informativeness along various facets, are aggregated into one
measure, and firms are identified with six unique reporting clusters or categories. We
cannot rule out the possibility that earnings management could play a role in producing
seemingly informative earnings on the various dimensions examined. We do believe,
however, that such manipulation on several fronts is less likely to be successful over a
period of years, thus reinforcing our belief that an aggregate measure is superior to one that
focuses on a single dimension.
Appendix A. Calculating the Total Measure Component
As shown in Table 4, the first principal component, Total Measure, provides a
useful way of summarizing the main differences among clusters. In our study, Total
Measure was calculated as the cumulative normal probability corresponding to $Z_{\text{Total Measure}}$,
where, using the coefficients of the first principal component,
$$Z_{\text{Total Measure}} = 0.464\, Z_{\text{Accrual Variability}} + 0.424\, Z_{\text{Persistence}} + 0.511\, Z_{\text{Predictability}} + 0.549\, Z_{\text{Smoothness}} + 0.207\, Z_{\text{Conservatism}}. \tag{A.1}$$
As shown in equation (5), the Z-values for each measure were calculated as a normalization
of the actual measures (using the inverse normal cdf). The table below provides a
convenient alternative way of calculating these Z-values directly from the five values of a
firm’s measures of earnings attributes. This measure can be used as a proxy for earnings
quality in a variety of settings.
Percentiles and Corresponding Z-values for the Five Measures of Earnings Attributes
Percentile   Accrual Variability   Persistence   Predictability   Smoothness   Conservatism   Z-value
 1           0.0034                -0.957        0.084             0.296       -0.077         -2.326
 5           0.0060                -0.919        0.154             0.497       -0.030         -1.645
10           0.0081                -0.848        0.194             0.603       -0.003         -1.282
20           0.0110                -0.697        0.278             0.773        0.027         -0.842
30           0.0136                -0.554        0.363             0.919        0.052         -0.524
40           0.0158                -0.417        0.456             1.052        0.071         -0.253
50           0.0186                -0.291        0.575             1.196        0.087          0.000
60           0.0220                -0.189        0.732             1.381        0.104          0.253
70           0.0259                -0.066        0.910             1.639        0.126          0.524
80           0.0313                 0.059        1.233             2.065        0.150          0.842
90           0.0427                 0.233        1.891             2.864        0.188          1.282
95           0.0538                 0.360        3.032             3.608        0.222          1.645
99           0.0879                 0.666        6.613            10.245        0.321          2.326
For example, assume a particular firm’s values for Accrual Variability, Persistence, Predictability, Smoothness, and Conservatism were 0.02, -0.19, 0.20, 1.9, and 0.24, respectively. According to this table, these values are at approximately the 45th, 60th, 10th, 80th, and 95th percentiles on each of the measures, respectively. If we use the last column of Table 7, we can find the Z-value corresponding to each measure’s percentile level (approximately), so that using equation (A.1),
$$Z_{\text{Total Measure}} \approx 0.464 \times (-0.13) + 0.424 \times (0.25) + 0.511 \times (-1.3) + 0.549 \times (0.84) + 0.207 \times (1.65) = 0.18.$$
Thus the firm’s Total Measure is just above the 50th percentile, which indicates that it is slightly worse than median quality.
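The arithmetic of this example can be reproduced with a few lines of code. The weights are the coefficients of equation (A.1) and the Z-values are the approximate values used in the example; the function and variable names are hypothetical.

```python
from statistics import NormalDist

# Coefficients of the first principal component, equation (A.1).
WEIGHTS = {
    "Accrual Variability": 0.464,
    "Persistence": 0.424,
    "Predictability": 0.511,
    "Smoothness": 0.549,
    "Conservatism": 0.207,
}


def total_measure(z_values):
    """Return (Z for Total Measure, corresponding cumulative normal probability)."""
    z_total = sum(WEIGHTS[name] * z_values[name] for name in WEIGHTS)
    return z_total, NormalDist().cdf(z_total)


# Approximate Z-values from the worked example above.
example_firm = {
    "Accrual Variability": -0.13,
    "Persistence": 0.25,
    "Predictability": -1.3,
    "Smoothness": 0.84,
    "Conservatism": 1.65,
}

z_total, percentile = total_measure(example_firm)
```

For these inputs `z_total` is about 0.18 and `percentile` is about 0.57, consistent with a Total Measure just above the median.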
Appendix B. Logistic Model for Cluster Classification Probabilities (Coefficients of Expression (10))
Model Term              Cluster1     Cluster2      Cluster3     Cluster4     Cluster5     Cluster6   P-value
Intercept               168.4106     173.4725     -847.5345     161.1120     174.0341     170.5053   <0.0001
Random Error             47.9168      48.3428     -241.7906      46.9102      49.7443      48.8766   <0.0001
Covariates
Log Assets              -11.8717     -11.2714       55.8472     -10.2670     -11.3675     -11.0695   <0.0001
σ(CFO/Assets)            20.7396      36.0389     -116.8649      46.7917       6.0191       7.2756   <0.0001
Negative Earn.          689.0073     628.9608    -3210.6627     528.0130     680.8636     683.8181   <0.0001
Intangible Intensity   3626.4491    3618.9579   -18130.9106    3631.6798    3627.5198    3626.3041   <0.01
P-values are for whether coefficients are significantly nonzero across clusters (based on the Wald statistic).
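As an illustration of how these coefficients enter expressions (9) and (10), the sketch below computes covariate-based membership probabilities for a hypothetical firm. Several assumptions are made and should be checked before reuse: the "Random Error" row is treated as the random-effect coefficient in (10), the covariates are taken to be on the scales reported in Table 1, the industry random effect is set to zero, and the example firm simply uses the Table 1 medians. All names are hypothetical.

```python
import numpy as np

# Coefficients transcribed from the table above (Clusters 1 through 6).
COEFFICIENTS = {
    "Intercept": [168.4106, 173.4725, -847.5345, 161.1120, 174.0341, 170.5053],
    "Random Error": [47.9168, 48.3428, -241.7906, 46.9102, 49.7443, 48.8766],
    "Log Assets": [-11.8717, -11.2714, 55.8472, -10.2670, -11.3675, -11.0695],
    "sigma(CFO/Assets)": [20.7396, 36.0389, -116.8649, 46.7917, 6.0191, 7.2756],
    "Negative Earn.": [689.0073, 628.9608, -3210.6627, 528.0130, 680.8636, 683.8181],
    "Intangible Intensity": [3626.4491, 3618.9579, -18130.9106,
                             3631.6798, 3627.5198, 3626.3041],
}


def row_sums():
    """Sum of each coefficient row across clusters (zero under effect coding)."""
    return {term: sum(values) for term, values in COEFFICIENTS.items()}


def membership_probabilities(log_assets, sigma_cfo, negative_earn,
                             intangible_intensity, xi=0.0):
    """Expressions (9)-(10) evaluated with the tabulated coefficients."""
    c = {term: np.array(values) for term, values in COEFFICIENTS.items()}
    eta = (c["Intercept"] + c["Random Error"] * xi
           + c["Log Assets"] * log_assets
           + c["sigma(CFO/Assets)"] * sigma_cfo
           + c["Negative Earn."] * negative_earn
           + c["Intangible Intensity"] * intangible_intensity)
    weights = np.exp(eta - eta.max())  # shift for numerical stability
    return weights / weights.sum()


# Hypothetical firm at the Table 1 medians, with the random effect at its mean.
example = membership_probabilities(6.43, 0.057, 0.10, 0.013)
```

Each row of the table sums to zero across clusters up to rounding, which is the same form as the identifiability constraint stated for the random-effect coefficients. Only differences between cluster coefficients affect the probabilities, so the large magnitudes in the Cluster 3 column are not informative on their own.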
References
Ali, A., and P. Zarowin. 1992. The role of earnings levels in annual earnings-returns studies. Journal of Accounting Research 30: 286-296.
Biernacki, C. and G. Govaert. 1999. Choosing models in model-based clustering and discriminant analysis. Journal of Statistical Computation and Simulation 64: 49-71.
Bozdogan, H. 1987. Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52: 345-370.
Chamberlain, S. and R. Anctil. 2003. Determinants of the time-series of earnings and implications for earnings quality. Working paper, University of British Columbia.
Chaney, P., and C. Lewis. 1995. Earnings management and firm valuation under asymmetric information. Journal of Corporate Finance: Contracting, Governance and Organization 1: 319-345.
Chaney, P., D. Jeter, and C. Lewis. 1998. The use of accruals in income smoothing: a permanent earnings hypothesis. Advances in Quantitative Analysis of Finance and Accounting 6: 103-135.
Choi, S. and D. Jeter. 1992. The effects of qualified audit opinions on earnings response coefficients. Journal of Accounting and Economics 15: 229-247.
Dechow, P., and I. Dichev. 2002. The quality of accruals and earnings: The role of accrual estimation errors. The Accounting Review 77 (Supplement): 35-59.
Dechow, P., and C. Schrand. 2004. Earnings quality. The Research Foundation of CFA Institute, Charlottesville, Virginia.
DeFond, M. and J. Jiambalvo. 1993. Debt covenant violation and manipulation of accruals. Journal of Accounting and Economics 17: 145-176.
DeFond, M. and K.R. Subramanyam. 1998. Auditor changes and discretionary accruals. Journal of Accounting and Economics 25: 35-68.
DeFond, M. and C. Park. 2001. The reversal of abnormal accruals and the market valuation of earnings surprises. The Accounting Review 76: 375-404.
Easton, P. and M. O’Hara. 2003. Information and the cost of capital. Journal of Finance 59: 1553-1583.
Ecker F., J. Francis, I. Kim, P. Olsson, and K. Schipper. 2006. A returns-based representation of earnings quality. The Accounting Review 81: 749-780.
Fama, E. F. and K. R. French. 1997. Industry costs of equity. Journal of Financial Economics 43: 153-193.
Francis, J., R. LaFond, P.M. Olsson, and K. Schipper. 2004. Costs of equity and earnings attributes. The Accounting Review 79: 967-1010.
Hunt, A., S. Moyer and T. Shevlin. 2000. Earnings volatility, earnings management and equity value. Working paper, University of Washington.
Jones, J. 1991. Earnings management during import relief investigations. Journal of Accounting Research 29: 193-228.
Kass, R.E. and A.E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90: 773-795.
Keribin, C. 1998. Consistent estimation of the order of mixture models. Comptes Rendus De L Academie Des Sciences Serie I-Mathematique 326: 243-248.
Khan, M. and R. Watts. 2007. Estimation and validation of a firm-year measure of conservatism. MIT Sloan Research Paper No. 4640.
Larcker, D. and S. Richardson. 2004. Fees paid to audit firms, accrual choices, and corporate governance. Journal of Accounting Research 42: 625-658.
Lazarsfeld, P. 1950. The logical and mathematical foundation of latent structure analysis. Studies in Social Psychology in World War II, Princeton University Press, Princeton, NJ: 362-472.
Lee, C. 1999. Accounting-based valuation: Impact on business practices and research. Accounting Horizons 13: 413-425.
Leuz, C., D. Nanda, and P. Wysocki. 2003. Earnings management and investor protection: An international comparison. Journal of Financial Economics 69: 505-527.
Leuz, C. and R. Verrecchia. 2004. Firms’ capital allocation choices, information quality, and the cost of capital. Working paper, University of Pennsylvania.
Lev, B. 1983. Some economic determinants of the time-series properties of earnings. Journal of Accounting and Economics 5: 31-38.
Lipe, R. 1990. The relation between stock returns and accounting earnings given alternative information. The Accounting Review 65: 49-71.
McNichols, M. 2002. Discussion of The quality of accruals and earnings: The role of accrual estimation errors. The Accounting Review 77 (Supplement): 61-69.
Rust, R., D. Simester, R. Brodie and V. Nilikant. 1995. Model selection criteria: An investigation of relative accuracy, posterior probabilities and combinations of criteria. Management Science 41: 322-333.
Schwarz, G. 1978. Estimating the dimension of a model. Annals of Statistics 6: 461-464.
Shivakumar, L. 2000. Do firms mislead investors by overstating earnings before seasoned equity offerings? Journal of Accounting and Economics 29: 339-371.
Steyerberg E.W., F.E. Harrell, Jr., G.J. Borsboom, M.J. Eijkemans, Y. Vergouwe and J.D. Habbema. 2001. Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology 54: 774-81.
Teoh, S., I. Welch, and T.J. Wong. 1998. Earnings management and the underperformance of seasoned equity offerings. Journal of Financial Economics 50: 63-99.
Trueman, B., and S. Titman. 1988. An explanation for accounting income smoothing. Journal of Accounting Research 26: 127-143.
Vermunt, J.K. 2003. Multilevel latent class models. Sociological Methodology 33: 213-239.
Vermunt, J.K., and J. Magidson. 2002. Latent class cluster analysis. In: J. Hagenaars and A. McCutcheon (eds.). Applied latent class models, 89-106. Cambridge University Press.
Woodroofe, M. 1982. On model selection and the arc sine laws. Annals of Statistics 10: 1182-1194.
Table 1
Descriptive Statistics of Measures of Earnings Attributes and Innate Firm Characteristics Based on 1,251 Firms
Variable               1st Quartile   Median   Trimmed Mean   3rd Quartile   σ̂ Estimated from IQR (a)
Measures of Earnings Attributes
Accrual Variability    0.012          0.019    0.021          0.029          0.012
Persistence            -0.61          -0.29    -0.31          -0.0021        0.45
Predictability         0.31           0.58     0.75           1.02           0.52
Smoothness             0.84           1.20     1.40           1.83           0.74
Conservatism           0.041          0.087    0.089          0.14           0.070
Innate Firm Characteristics
Log Assets             5.00           6.43     6.49           7.93           2.17
σ(CFO/Assets)          0.036          0.057    0.064          0.087          0.052
σ(Sales/Assets)        0.10           0.16     0.22           0.27           0.12
Oper. Cycle            4.40           4.80     4.78           5.17           0.58
Negative Earn.         0              0.10     0.15           0.30           0.22
Intang. Intensity      0              0.013    0.035          0.055          0.041
Intang. Dummy          0              0        0.36           1              0.74
Capital Intensity      0.14           0.26     0.31           0.47           0.24
a Here we use the estimate σ̂ = 0.7413*IQR, where IQR is the inter-quartile range (the difference between the third and first quartiles).
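The constant 0.7413 in the note above is the reciprocal of the inter-quartile range of a standard normal distribution, so the estimate recovers σ exactly when the data are normal. A brief check against two rows of Table 1 (function name hypothetical):

```python
from statistics import NormalDist

standard_normal = NormalDist()
# IQR of a standard normal is about 1.349, so its reciprocal is about 0.7413.
iqr_factor = 1.0 / (standard_normal.inv_cdf(0.75) - standard_normal.inv_cdf(0.25))


def sigma_from_iqr(first_quartile, third_quartile):
    """Robust estimate of the standard deviation from the quartiles."""
    return 0.7413 * (third_quartile - first_quartile)


sigma_log_assets = sigma_from_iqr(5.00, 7.93)       # Table 1 reports 2.17
sigma_persistence = sigma_from_iqr(-0.61, -0.0021)  # Table 1 reports 0.45
```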
Table 2
Correlations and Significant Partial Correlations among Normalized Measures of Earnings Attributes^a

                      Persistence    Predictability   Smoothness     Conservatism    Auxiliary R2 (Adjusted, in %)
Accrual Variability   0.069 (n.s.)   0.14 (0.12)      0.28 (0.23)    0.38 (0.40)     22.4
Persistence           --             0.33 (0.30)      0.35 (0.074)   0.020 (n.s.)    11.6
Predictability                       --               0.38 (0.28)    -0.12 (-0.19)   22.5
Smoothness                                            --             0.052 (n.s.)    17.9
Conservatism                                                         --              17.8
a All correlations in bold are significant at the 0.01 level (2-sided), and insignificant correlations are italicized. Partial correlations are given in parentheses when they are significant at the 0.01 level or better (equivalently, when their absolute value exceeds 0.073); otherwise “n.s.” indicates a partial correlation that is not significantly different from zero. Auxiliary R2 is the percentage of the variance of each earnings attribute explained by the other attributes. Partial correlations are computed controlling for the effect of the remaining earnings attributes.
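The 0.073 cutoff quoted in this footnote can be checked with a few lines of Python. The sketch assumes the usual t-test for a partial correlation, t = r·sqrt(df / (1 − r²)) with df = n − 2 − (number of controls), and replaces the t quantile with the normal quantile, which is a close approximation at this sample size. The function name `critical_partial_r` is illustrative and not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def critical_partial_r(n, n_controls, alpha=0.01):
    """Smallest |partial r| that is significant at level alpha (two-sided).

    Assumes the standard t-test for partial correlations and a normal
    approximation to the t quantile (reasonable for large n).
    """
    df = n - 2 - n_controls
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Solve t = r * sqrt(df / (1 - r^2)) for r at t = z.
    return z / sqrt(z * z + df)


# 1,251 firms; each pairwise partial correlation among the five attributes
# controls for the remaining three.
r_crit = critical_partial_r(n=1251, n_controls=3)
print(round(r_crit, 3))  # 0.073
```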
Table 3
Principal Components of the Correlation Matrix for Measures of Earnings Attributes

Principal Component      1                    2
Reification              Aggregated Measure   Conservatism & Accrual Variability versus Predictability & Persistence
Eigenvalue               1.73                 1.33
% Variance Explained^a   35                   27
Cumulative Variance      35                   61

Correlations with:
Accrual Variability      0.61                 0.58
Persistence              0.56                 -0.35
Predictability           0.67                 -0.48
Smoothness               0.72                 -0.07
Conservatism             0.27                 0.80

a The third, fourth, and fifth components account for 17, 12 and 10% of the total standardized variance.
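The entries of Table 3 are internally linked: for a principal component analysis of a correlation matrix, each eigenvalue equals the sum of the squared correlations between that component and the variables, and the share of variance explained is the eigenvalue divided by the number of variables. The Python sketch below checks this against the reported correlations; the variable names are ours.

```python
# Correlations of each attribute with components 1 and 2, copied from Table 3.
loadings = {
    "Accrual Variability": (0.61, 0.58),
    "Persistence": (0.56, -0.35),
    "Predictability": (0.67, -0.48),
    "Smoothness": (0.72, -0.07),
    "Conservatism": (0.27, 0.80),
}

n_vars = len(loadings)

# Eigenvalue of component k = sum over variables of squared correlation.
eigenvalues = [sum(corr[k] ** 2 for corr in loadings.values()) for k in range(2)]

# Each standardized variable contributes one unit of variance.
pct_explained = [100.0 * ev / n_vars for ev in eigenvalues]

print([round(ev, 2) for ev in eigenvalues])  # [1.73, 1.33]
print([round(p) for p in pct_explained])     # [35, 27]
```

The small gap between 35 + 27 and the reported cumulative 61% is a rounding effect: the unrounded shares are roughly 34.5% and 26.7%.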
Table 4
Description of the Clusters

                                Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5   Cluster 6
Cluster Size (%)                5.7         23          2.8         4.0         45          19

Panel A: Earnings Quality, Percentile (%) of Mean
Accrual Variability             5b          35          43          72a         41          56
Persistence                     61          23b         18b         35b         58          72a
Predictability                  28b         22b         22b         18b         56          70a
Smoothness                      11b         22          19b         61a         58a         58a
Conservatism                    15b         38          54          90a         45          56

Panel B: Principal Components, Percentile (%) of Mean
Total Measure (Component 1)     9b          18b         19b         61          63          78a
Conservatism & Accrual Variability versus Predictability & Persistence (Component 2)
                                20b         62          72          94a         43          46

Panel C: Covariate Means
Log Assets                      8.25        6.74        9.69a       3.85b       6.10        6.80
σ(CFO/Assets)                   0.030b      0.062       0.061       0.195a      0.070       0.070
σ(Sales/Assets)                 0.119b      0.194b      0.169b      0.338a      0.221       0.244a
Operating Cycle                 4.43b       4.70b       5.41a       4.95        4.80        4.73b
Negative Earnings               0.003b      0.005b      0.000b      0.680a      0.210       0.263
Intangible Intensity            0.000b      0.027b      0.072b      0.274a      0.063b      0.053b
Intangible Dummy                0.986a      0.330       0.213b      0.112b      0.332       0.429
Capital Intensity               0.585a      0.308       0.227b      0.204b      0.305       0.322

a These means are the highest across clusters or are not significantly different from the highest (Tukey family rate of 5% for all pairwise comparisons). For the earnings quality measures and components at the top of the table, testing is done on the original normalized variables (before conversion to percentiles).
b These means are the lowest or not significantly different from the lowest (Tukey 5% rate).
Table 5
Percentage of Firms from each Industry in each Cluster

                                   Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5   Cluster 6
Agriculture                        0           2           0           0           86          12
Aircraft                           0           26          0           0           53          20
Alcoholic Beverages                0           43          32          0           24          2
Apparel                            0           37          0           1           54          8
Automobiles & Trucks               0           21          1           0           65          13
Banking                            6           0           55          16          3           20
Business Services                  0           21          6           11          38          24
Business Supplies                  0           15          0           0           72          13
Candy and Soda                     0           0           31          32          1           36
Chemicals                          0           19          0           0           58          23
Computers                          0           5           0           6           69          20
Construction                       0           26          5           6           11          52
Construction Materials             0           22          1           2           68          7
Consumer Goods                     0           34          5           3           49          9
Electrical Equipment               0           35          0           2           49          14
Electronic Equipment               0           18          2           1           68          11
Entertainment                      1           23          2           3           22          50
Fabricated Products                0           13          0           0           83          4
Food Products                      1           34          13          3           28          21
Healthcare                         1           16          1           4           34          44
Insurance                          6           50          0           0           14          29
Machinery                          0           32          0           0           53          15
Measuring and Control Equip        0           20          0           3           64          13
Medical Equipment                  0           25          2           21          40          13
Miscellaneous                      0           23          7           1           27          42
Nonmetallic Mining                 0           45          0           0           48          7
Personal Services                  0           25          0           0           51          24
Petroleum & Natural Gas            0           12          0           2           74          13
Pharmaceutical Products            0           21          9           24          19          27
Precious Metals                    0           0           0           1           81          18
Printing & Publishing              0           34          0           0           45          20
Real Estate                        0           21          0           1           48          30
Recreational Products              0           40          0           0           51          8
Restaurants Hotel                  0           46          1           0           33          19
Retail                             0           45          3           2           29          21
Rubber & Plastic Products          0           46          0           0           26          27
Shipping Containers                0           2           0           1           46          51
Steel                              0           18          0           0           73          9
Telecommunications                 0           20          10          9           19          41
Textiles                           0           21          0           0           75          4
Trading                            0           25          25          2           16          32
Transportation                     0           45          0           0           31          23
Utilities                          29          3           0           0           18          50
Wholesale                          0           42          0           2           33          23
Number of Industries Represented   6           41          19          25          44          44
Table 6
Regression Coefficients (Equation (8)) for Innate Firm Characteristics as Predictors of the Five Measures^a

Measures (column order): Accrual Variability, Persistence, Predictability, Smoothness, Conservatism

Firm characteristic    Coefficient (standard error), in left-to-right column order; the columns of the blank cells are not preserved
Log Assets             -0.05 (0.010)    0.16 (0.011)    -0.40 (0.007)
σ(CFO/Assets)          7.02 (0.469)     -5.50 (0.564)
σ(Sales/Assets)        0.37 (0.093)     0.45 (0.095)    0.34 (0.080)
Oper. Cycle            -0.059 (0.029)
Negative Earn.         0.92 (0.138)     1.42 (0.162)    1.51 (0.165)
Intangible Intensity   -0.26 (0.097)
Intangible Dummy       0.16 (0.042)     -0.30 (0.046)   0.13 (0.027)
Capital Intensity      -0.63 (0.080)

Intercepts   Accrual Variability   Persistence   Predictability   Smoothness   Conservatism
Overall      -0.55                 -0.16         -1.95            0.20         2.75
Cluster 1    -0.69                 0.44          -0.34            -0.67        -0.20
Cluster 2    0.23                  -0.57         -0.08            -0.26        -0.40
Cluster 3    0.54                  -0.75         -0.45            -0.38        1.10
Cluster 4    -0.60                 -0.22         -0.63            0.47         -0.07
Cluster 5    0.08                  0.35          0.67             0.47         -0.48
Cluster 6    0.44                  0.75          0.83             0.37         0.04
a All coefficients of firm characteristics are significantly nonzero, p<0.05 (two-sided). Standard errors are in parentheses. Each measure of earnings quality has been normalized (see equation (5)). A blank indicates that the firm characteristic is not in the best regression model for that measure.
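One feature of the intercepts in Table 6 is easy to verify: within each measure the six cluster intercepts sum to zero up to rounding, which suggests they are coded as deviations from the overall intercept. The Python sketch below checks this and, on the assumption that equation (8) adds the cluster term to the overall intercept (the equation itself is not reproduced in this part of the paper), computes a cluster-specific intercept. The names `OVERALL`, `CLUSTER_OFFSETS` and `cluster_intercept` are illustrative.

```python
MEASURES = ("Accrual Variability", "Persistence", "Predictability",
            "Smoothness", "Conservatism")

# Intercepts copied from Table 6, in the column order of MEASURES.
OVERALL = (-0.55, -0.16, -1.95, 0.20, 2.75)
CLUSTER_OFFSETS = {
    1: (-0.69, 0.44, -0.34, -0.67, -0.20),
    2: (0.23, -0.57, -0.08, -0.26, -0.40),
    3: (0.54, -0.75, -0.45, -0.38, 1.10),
    4: (-0.60, -0.22, -0.63, 0.47, -0.07),
    5: (0.08, 0.35, 0.67, 0.47, -0.48),
    6: (0.44, 0.75, 0.83, 0.37, 0.04),
}

# Sum of the six cluster intercepts for each measure (one sum per column).
offset_sums = [round(sum(col), 2) for col in zip(*CLUSTER_OFFSETS.values())]


def cluster_intercept(cluster, measure):
    """Overall intercept plus cluster term (assumes an additive specification)."""
    j = MEASURES.index(measure)
    return OVERALL[j] + CLUSTER_OFFSETS[cluster][j]


print(offset_sums)  # every entry is within 0.01 of zero
print(round(cluster_intercept(3, "Conservatism"), 2))  # 3.85
```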
Figure 1
Correlation Loadings of Principal Components on Five Measures of Earnings Attributes
[Scatter plot. Horizontal axis: First Component (0.0 to 0.6). Vertical axis: Second Component (-0.50 to 0.75). Plotted points: Conservatism, Smoothness, Predictability, Persistence, Accrual Variability.]
Figure 2
Industry Composition of the Three Best Performing Clusters
[Three pie charts.]
Cluster 1: Utilities 90.3%; Other 9.7%
Cluster 2: Retail 10.4%; Wholesale 7.7%; Machinery 7.5%; Electronic Equipment 5.4%; Other 69.0%
Cluster 3: Pharmaceutical Products 13.8%; Food Products 13.5%; Trading 11.9%; Banking 8.5%; Business Services 8.0%; Consumer Goods 6.7%; Retail 6.4%; Telecommunications 5.0%; Other 26.2%
Figure 3
Industry Composition of the Three Worst Performing Clusters
[Three pie charts.]
Cluster 4: Pharmaceutical Products 26.0%; Medical Equipment 24.1%; Business Services 10.7%; Computers 6.3%; Other 33.0%
Cluster 5: Electronic Equipment 10.8%; Machinery 6.7%; Petroleum & Natural Gas 6.6%; Computers 6.3%; Construction Materials 5.4%; Other 64.2%
Cluster 6: Utilities 7.2%; Retail 6.2%; Pharmaceutical Products 5.7%; Wholesale 5.6%; Other 75.3%