
Deciding the financial health of dot-coms using rough sets


Information & Management 43 (2006) 835–846


Indranil Bose

School of Business, The University of Hong Kong, Room 730 Meng Wah Complex, Pokfulam Road, Hong Kong, PR China

Received 25 July 2005; received in revised form 16 May 2006; accepted 1 August 2006

Available online 12 September 2006

Abstract

We conducted an empirical investigation of dot-coms from a financial perspective. Data from the financial statements of 240 such businesses were used to compute financial ratios, and the rough sets technique was used to evaluate whether these ratios could predict the financial health of the companies. The most predictive financial ratios were identified, and interesting rules linking the financial ratios to the financial health of dot-coms were discovered. Rough sets did a satisfactory job of predicting financial health and were more suitable for detecting unhealthy dot-coms than healthy ones.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Data mining; Dot-coms; Financial health; Rough sets; Rules; Sensitivity analysis

1. Introduction

The number of companies that primarily conducted their business using the Web (dot-coms) grew tremendously in the 1990s. According to Hendershott [13], "dot-coms sell products through a Web-based store (online retailers and auction sites) and/or generate revenue by selling market opportunities to merchants who want access to the dot-com's users". Many startup companies used the medium to open new businesses or to provide new channels for existing businesses; companies like Amazon.com changed the retailing business. The growth also led to the formation of online subsidiaries of existing companies, which used the medium to expand their businesses. In addition, several online companies grew to facilitate communication and transact business using the technology. The dot-coms rapidly raised enormous amounts of venture capital.


Their apparent success was, however, short lived. Their stock prices started to tumble in March 2000 and their market value declined rapidly. According to Mathieson [18], there were 121 closures of dot-coms worldwide in the last quarter of 2000, and webmergers.com reported that 564 dot-com ventures failed between January 2000 and June 2001 (59% of them B2C firms). Financial pundits had already conjectured that the demise of dot-coms was inevitable: from 1999 it was clear that the venture capital that had funded the growth of dot-coms was almost exhausted and that there were no new sources [16]. It has been suggested that dot-coms' inability to improve revenues and earnings, failure to post profits, attempts to capture a major market share quickly, and a tendency to operate in limited geographical areas were among the main causes of failure [30]. Others identified an emphasis on providing free services, lack of solid business models, limited vision, improper channel management, heavy emphasis on meaningless advertising, and a wish to expand quickly as further factors that hurt dot-coms [26]. The authors of [37] explored the managerial, organizational, and environmental characteristics responsible for the failure of five prominent dot-coms. Some researchers also indicated that the discrepancy between actual performance and future expectations was exemplified by the dot-coms' high price-to-earnings ratios in early 2000 [4].

Our research attempted to find out whether financial ratios could have predicted the viability of dot-coms. The relationship between financial ratios as independent variables and financial health as the dependent variable was studied using the method of rough sets. We also identified financial ratios that were highly predictive of the financial future of the companies and derived business rules linking the financial ratios to whether a dot-com was financially healthy.

It is important to identify companies that market viable technologies, have a solid business model, and can sustain funding and growth. This is likely to become more important because of the expectation of a second dot-com boom; e.g., a survey by Actinic Software of small and medium retailers reported a 60% increase in Web-based sales in November and December 2004, and remarked that "each year adds to the feeling that the original dot-com boom hype wasn't so much wrong as too early" [14]. Also, though the $20.9 billion invested in 2876 deals in 2004 was only 20% of the venture capital spending in 2000, it was the first increase in 3 years, suggesting that the dot-com phenomenon is not over [32]. Fortune magazine also reported that "the not-so-surprising result is that the Internet industry isn't just back, it's better than it was before" [15].

2. The method of rough sets

Rough sets theory deals with uncertainty and vagueness in the classification of objects in a set. It is founded on the premise that every object is associated with some information, and that objects associated with the same information are similar and belong to the same class. Although somewhat similar to statistical probability theory and other soft approaches, such as fuzzy sets, the rough sets approach is significantly different. Fuzzy sets are useful for handling imprecision when objects in a data set do not exclusively belong to a single category. Rough sets theory, in contrast, is useful when "the classes into which the objects are to be classified are imprecise, but can nevertheless be approximated with precise (crisp) sets" [22]. Rough sets theory depends on the calculation of a lower approximation, an upper approximation, and an accuracy of approximation for each class. The lower approximation of a class is the set of all objects in the data set that can be classified as its elements with certainty, and the upper approximation is the set of all objects that can possibly be classified as its elements. The accuracy of classification is the ratio of the cardinalities of the lower and upper approximations [27].
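To make these definitions concrete, the sketch below computes lower and upper approximations from an information table. This is a minimal Python illustration written for this article, not the implementation used by ROSETTA or any other rough sets package; the toy data is hypothetical.

```python
from collections import defaultdict

def approximations(objects, attrs, target):
    """Lower/upper approximation and accuracy for a target class.

    objects: dict of object id -> tuple of attribute values
    attrs:   indices of the condition attributes to use
    target:  set of object ids forming the class to approximate
    """
    # Objects carrying identical information fall in the same
    # indiscernibility class.
    blocks = defaultdict(set)
    for oid, row in objects.items():
        blocks[tuple(row[a] for a in attrs)].add(oid)

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # certainly members of the class
            lower |= block
        if block & target:    # possibly members of the class
            upper |= block
    accuracy = len(lower) / len(upper) if upper else 1.0
    return lower, upper, accuracy

# Hypothetical toy table: two condition attributes per firm.
firms = {1: ("low", "neg"), 2: ("low", "neg"), 3: ("high", "pos")}
print(approximations(firms, attrs=(0, 1), target={1, 3}))
# Firms 1 and 2 are indiscernible, so the class {1, 3} has lower
# approximation {3}, upper approximation {1, 2, 3}, accuracy 1/3.
```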

The main advantage of rough sets theory is that it does not require any a priori information about the probability distribution of the data or any knowledge about the grade of membership in a class. It finds use in "data reduction (elimination of superfluous data), discovery of data dependencies, estimation of data significance, generation of decision (control) algorithms from data, approximate classification of data, discovery of similarities or differences in data, discovery of patterns in data, and discovery of cause–effect relationships" [28]. Appendix A contains a description of the steps involved in rough sets analysis; for a more mathematical description and an illustrative example, see [34].

The method was chosen for our research because it allowed us to identify important features; in dot-com financial analysis it was necessary to know which aspects of the financial statements were needed to decide a company's financial future. Another advantage was that it led to the creation of rules linking the dependent variable to the independent variables, which could be valuable to financial analysts. A classification system should provide an explanation of its decisions, and in rough sets analysis this is provided by the rules that the system discovers. A further benefit is that the rules are based on the data and are supported by real examples, thereby improving the validity of the results and making them understandable.

3. Literature review

The prediction of the financial health of a company is similar to the problem of predicting bankruptcy, a well-researched area where several techniques have been used. (Some notable examples include the use of multiple discriminant analysis by Altman [2], the multi-criteria decision aid methodology by Dimitras et al. [6], support vector machines by Fan and Palaniswami [7], neural networks by Fletcher and Goss [8], the recursive partitioning algorithm by Frydman et al. [9], mathematical programming methods by Gupta et al. [11], self-organizing maps by Magnusson et al. [17], logit analysis by Ohlson [23], the multi-factor model by Vermeulen et al. [38], and probit analysis by Zmijewski [42].) Many other papers have reviewed the techniques used for bankruptcy prediction, such as those of Bose and Mahapatra [3], Dimitras et al. [6], Wong et al. [40], and Wong and Selvi [41].

In the area of business, rough sets have been used for business failure prediction, database marketing, and financial investment. Tay and Shen [36] reviewed the various methodologies and software that have been used for analyzing these problems and provided a table of papers related to each of these areas. The use of rough sets was also reported for building a trading system that tracked the S&P 500 index and made recommendations about buying and selling stocks based on financial indicators [31]. Wang [39] used a fuzzy rough sets method for mining stock prices. An early work [35] used rough sets for business failure prediction; using five financial ratios, it showed that the model improved accuracy in predicting business failures by 1% over multiple discriminant analysis. Slowinski and Zopounidis [33] used 12 financial ratios and compared the rough sets approach with other statistical approaches for evaluation of bankruptcy risk, observing that the rough sets method was superior to multi-attribute sorting. The problem of determining the likelihood of acquisition of companies using financial ratios was studied by Slowinski et al. [34]; using a sample of 30 firms and 10 financial ratios, the rough sets method was found to perform better than discriminant analysis, and the valued closeness relation was shown to classify objects when no rules matched them. Dimitras et al. [5] used a sample of 80 Greek firms and compared the rough sets method to inductive learning, discriminant analysis, and logit analysis; they showed that the rough sets method performed significantly better than the other methods in terms of classification accuracy when 12 financial ratios were used for predicting failure or success of companies. Using a sample of 200 US based companies, McKee [19] showed that the rough sets method could achieve 88% accuracy when predicting bankruptcies using two key financial ratios. A hybrid technique consisting of a genetic algorithm coupled with rough sets was used by McKee and Lensberg [21] to predict bankruptcies for US public companies using a sample of 291 companies; it was shown to perform better than independent rough sets analysis. Another example of a hybrid approach for business failure prediction was discussed in [1], where the rough sets technique was used to reduce the number of independent variables and to generate rules linking them with the dependent variable. Instances that matched any of the rules were classified using rough sets; those that did not were classified using a neural network. Using a sample of 2400 Korean firms, this hybrid approach was shown to perform better than approaches based on discriminant analysis or neural networks. McKee [20] showed that the rough sets based bankruptcy prediction methodology did not provide any improvement in accuracy over the bankruptcy signaling rates provided by auditors; his experiments were conducted using a sample of 291 companies and 11 financial variables. A successful use of the hybrid methodology for a different business application was described in [12], where neural networks and rough sets were used to predict bank holding patterns.

4. Numerical experimentation

Data from the financial statements of 240 dot-coms were collected from the WRDS (Wharton Research Data Services) database. The companies identified as dot-coms either had the suffix .com in their name or conducted business primarily using the Web. Half of these companies were identified as unhealthy because their stock prices were less than 10 cents (output = 0); the remaining ones were classified as financially healthy (output = 1). Some well-known examples of financially healthy dot-coms were Amazon.com, Ebay Inc., and Netflix Inc. The stock prices of all companies were recorded on 30 June 2001.

Based on the literature, 24 financial ratios were identified and calculated for the 240 firms from the numbers in their financial statements. For both financially healthy and unhealthy dot-coms the ratios were calculated for the year 2000. These ratios (variables) are shown in Table 1. Of the 24 variables, 1–15 were those most popularly used in the literature on prediction of financial health, and the remaining ones were constructed to capture the novelty of dot-coms. Dot-coms have been noted for inflated stock prices and large numbers of traded shares combined with low income [4,30]. Though there were no guidelines on financial ratios specifically important to dot-coms, ratios 16–24 were used to reflect their sales, earnings, cash, income, market capitalization, and stock prices.

Table 1
List of financial ratios used in data analysis

Variable  Symbol         Description
1         WC/TA          Working capital/total assets
2         TD/TA          Total debt/total assets
3         CA/CL          Current assets/current liabilities
4         OI/TA          Operating income/total assets
5         NI/TA          Net income/total assets
6         CF/TD          Cash flow/total debt
7         QA/CL          Quick assets/current liabilities
8         CF/S           Cash flow/sales
9         RE/TA          Retained earnings/total assets
10        S/TA           Sales/total assets
11        GP/TA          Gross profit/total assets
12        NI/SE          Net income/shareholders' equity
13        C/TA           Cash/total assets
14        I/S            Inventory/sales
15        QA/TA          Quick assets/total assets
16        P/E            Price per share/earnings per share
17        S/MC           Sales/market capitalization
18        CA/TA          Current assets/total assets
19        LTD/TA         Long term debt/total assets
20        OI/S           Operating income/sales
21        OI/MC          Operating income/market capitalization
22        C/S            Cash/sales
23        CA/S           Current assets/sales
24        NI/(TA - TL)   Net income/(total assets - total liabilities)

The data gathered was divided into two groups: the training group accounted for 80% of the data and the testing group for the remaining 20%. A separate random testing set was needed because the model built from the training data could be overspecialized and could generate good results only on records similar to the training data. To increase the generalization ability of the model and to estimate the true error rate, it was therefore normal to check the model on a testing dataset that it had not previously seen; this is called simple cross-validation. To further remove the bias in the error estimate, multiple random training and testing samples were formed and the results obtained from the different training–testing sample pairs were averaged. This resampling technique is known as multiple cross-validation. Here, 10 random training samples and 10 random testing samples were created. Of these, two had equal representation of healthy and unhealthy firms and were examples of balanced samples; the remaining samples were unbalanced. The composition of the 10 random samples is shown in Table 2.

Table 2
Composition of 10 random samples

            Training                        Testing
Sample no.  Unhealthy  Healthy  Total      Unhealthy  Healthy  Total
1           99         93       192        21         27       48
2           98         94       192        22         26       48
3           100        92       192        20         28       48
4           93         99       192        27         21       48
5           96         96       192        24         24       48
6           100        92       192        20         28       48
7           96         96       192        24         24       48
8           97         95       192        23         25       48
9           93         99       192        27         21       48
10          98         94       192        22         26       48
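A rough sketch of this resampling scheme follows; the record layout and labels are hypothetical stand-ins for the actual WRDS extraction, which is not reproduced here.

```python
import random

def random_splits(records, n_splits=10, train_frac=0.8, seed=1):
    """Repeated random train/test partitions (multiple cross-validation)."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        shuffled = records[:]
        rng.shuffle(shuffled)                  # new random ordering each time
        cut = int(len(shuffled) * train_frac)  # 80:20 partition point
        splits.append((shuffled[:cut], shuffled[cut:]))
    return splits

# 240 labelled firms -> 10 samples of 192 training and 48 testing records.
firms = [(f"firm{i}", i % 2) for i in range(240)]  # hypothetical (id, label) pairs
splits = random_splits(firms)
```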

For the analysis, the ROSETTA software [29] was used because of its high level of success in classification-type problems and its ease of use. With the different choices for methods of discretization, reduction, and classification, 20 possible combinations were obtained, and for each combination the rough sets analysis was conducted over the 10 random training and testing samples. The results are shown in Table 3. Type I accuracy recorded the percentage of cases in which an unhealthy firm was correctly identified; Type II accuracy recorded the percentage of cases in which a healthy firm was correctly identified. The percentages represent average accuracy values over the 10 samples. The best overall testing accuracy, 72.1%, was obtained with equal frequency discretization, genetic algorithm reduction, and standard voting. The equal frequency method of discretization tended to generate higher testing accuracies than the other methods because it created discretization intervals containing equal numbers of objects. The genetic algorithm method for generating reducts always performed better than Johnson's algorithm, as it searched the space of reducts more exhaustively. Type I accuracy was higher than Type II accuracy in 16 of the 20 cases, which meant that the method of rough sets could identify unhealthy firms better than healthy firms.
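Expressed in code, the three accuracy figures reported in the tables can be computed from actual and predicted labels as follows. This is a small sketch using the paper's coding of unhealthy = 0 and healthy = 1; it assumes both classes are present in the sample.

```python
def accuracies(actual, predicted):
    """Type I (% of unhealthy firms correctly found), Type II (% of
    healthy firms correctly found) and overall accuracy, in percent."""
    pairs = list(zip(actual, predicted))

    def per_class(label):
        relevant = [(a, p) for a, p in pairs if a == label]
        return 100.0 * sum(a == p for a, p in relevant) / len(relevant)

    overall = 100.0 * sum(a == p for a, p in pairs) / len(pairs)
    return per_class(0), per_class(1), overall
```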

In the next set of experiments, case 9 (the combination with the highest average testing accuracy) was explored in more detail to obtain further insights. A unique feature of the rough sets method was its generation of rules that played an important role in predicting the output. ROSETTA listed the rules for the different samples and provided statistics for them as well, including support, length, LHS coverage, and RHS coverage. The rule support is defined as the number of records in the training data that fully exhibit the property described by the IF-THEN conditions. The length is defined as the number of conditional elements in the IF part, while the rule coverage is defined as the fraction of records in the training sample identifiable by the IF or THEN parts. The LHS coverage is defined as the fraction of training records that satisfied the IF conditions of the rule; it is obtained by dividing the support of the rule by the total number of records in the training sample. The RHS coverage, on the other hand, is defined as the fraction of training records that satisfied the THEN condition; it is obtained by dividing the support of the rule by the number of records in the training sample that satisfied the THEN condition. To find the most significant rules for each sample, the rules were sorted according to the value of their support; the generated rules did not differ much in length (most were of length 2 or 3), and thus support was used as the criterion for ranking them.
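These statistics are straightforward to reproduce for any rule written as an IF-predicate plus a THEN-label. The sketch below follows the definitions above; the example rule at the end is hypothetical, loosely modeled on the entries of Table 4.

```python
def rule_stats(train, lhs, rhs):
    """Support, LHS coverage and RHS coverage of one IF-THEN rule.

    train: list of (attribute_dict, label) training records
    lhs:   predicate over an attribute dict (the IF part)
    rhs:   the predicted label (the THEN part)
    """
    # Support: records that fully exhibit the IF-THEN property.
    support = sum(1 for x, y in train if lhs(x) and y == rhs)
    lhs_coverage = support / len(train)
    rhs_total = sum(1 for _, y in train if y == rhs)
    rhs_coverage = support / rhs_total if rhs_total else 0.0
    return support, lhs_coverage, rhs_coverage

# Hypothetical rule: IF NI/TA in [-0.88, -0.23) AND S/MC < 0.10 THEN output 0.
lhs = lambda x: -0.88 <= x["NI/TA"] < -0.23 and x["S/MC"] < 0.10
# support, lhs_cov, rhs_cov = rule_stats(training_records, lhs, 0)
```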

Table 3
Testing results for different choices of discretization, reduction, and classification

Method of discretization  Method of reduction  Method of classification     Type I  Type II  Overall
Boolean reasoning         Genetic algorithm    Standard voting              70.5    62.5     66.3
Boolean reasoning         Genetic algorithm    Voting with object tracking  70.8    60.3     65.2
Boolean reasoning         Johnson's algorithm  Standard voting              65.5    59.5     62.5
Boolean reasoning         Johnson's algorithm  Voting with object tracking  66.4    59.2     62.7
Entropy algorithm         Genetic algorithm    Standard voting              58.8    63.6     60.6
Entropy algorithm         Genetic algorithm    Voting with object tracking  58.2    63.8     60.4
Entropy algorithm         Johnson's algorithm  Standard voting              48.4    40.7     44.8
Entropy algorithm         Johnson's algorithm  Voting with object tracking  48.4    40.7     44.8
Equal frequency           Genetic algorithm    Standard voting              73.0    72.1     72.1
Equal frequency           Genetic algorithm    Voting with object tracking  73.4    70.3     71.5
Equal frequency           Johnson's algorithm  Standard voting              68.3    70.3     69.4
Equal frequency           Johnson's algorithm  Voting with object tracking  69.     69.5     69.4
Naïve algorithm           Genetic algorithm    Standard voting              64.9    60.6     62.1
Naïve algorithm           Genetic algorithm    Voting with object tracking  63.7    60.4     61.3
Naïve algorithm           Johnson's algorithm  Standard voting              52.1    43.2     47.7
Naïve algorithm           Johnson's algorithm  Voting with object tracking  52.1    43.2     47.7
Semi-naïve algorithm      Genetic algorithm    Standard voting              64.6    61.1     62.1
Semi-naïve algorithm      Genetic algorithm    Voting with object tracking  63.7    59.7     60.8
Semi-naïve algorithm      Johnson's algorithm  Standard voting              53.1    42.8     47.9
Semi-naïve algorithm      Johnson's algorithm  Voting with object tracking  53.1    42.8     47.9

The 'best' rules for the 10 samples of case 9 are given in Table 4; all have length ≤ 3. Of the 10 'best' rules generated, 7 were associated with prediction of unhealthy firms (output = 0) and 3 with prediction of healthy firms. For one sample (sample 9) the 'best' rule was not unique, and no two samples had the same 'best' rule. Of the two 'best' rules with the highest support, the first was obtained from sample 1 and dealt with prediction of unhealthy dot-coms (output = 0); it was supported by 23 records in the test set (47.9%). This rule specified an unlimited lower bound for the variable OI/S, which suggested that any dot-com with a highly negative operating income relative to sales was likely to be financially unhealthy. The other, obtained from sample 5, was supported by 24 records in the test set (50%) and concerned prediction of healthy dot-coms. Its IF conditions specified unlimited upper bounds for the variables S/TA, OI/S, and OI/MC; thus, if a dot-com could generate large sales as well as large operating income, it was likely to be financially healthy.

Table 4
'Best' rule statistics for the different samples

Sample  'Best' rule                                                                 Support  Length  LHS cov.  RHS cov.
1       LTD/TA([0.00006, 0.03)) AND OI/S([*, -0.78)) => OUTPUT(0)                   23       2       0.12      0.23
2       NI/TA([-0.88, -0.23)) AND S/MC([*, 0.10)) => OUTPUT(0)                      20       2       0.10      0.20
3       EBIT/TA([*, -0.60)) AND S/TA([0.34, 1.26)) => OUTPUT(0)                     17       2       0.09      0.17
4       CF/TD([0.003, *)) AND GP/TA([*, 0.11)) => OUTPUT(0)                         15       2       0.08      0.16
5       S/TA([1.17, *)) AND OI/S([-0.04, *)) AND OI/MC([-0.0008, *)) => OUTPUT(1)   24       3       0.13      0.25
6       GP/TA([0.43, *)) AND S/MC([1.18, *)) AND OI/S([-0.04, *)) => OUTPUT(1)      19       3       0.10      0.21
7       NI/TA([-0.87, -0.22)) AND S/MC([*, 0.13)) => OUTPUT(0)                      20       2       0.10      0.21
8       CF/TD([0.0004, *)) AND S/MC([*, 0.15)) => OUTPUT(0)                         19       2       0.01      0.20
9       GP/TA([0.48, *)) AND NI/(TA - TL)([-0.74, 0.009)) => OUTPUT(1)              21       2       0.11      0.21
        GP/TA([0.48, *)) AND NI/SE([-0.74, 0.009)) => OUTPUT(1)                     21       2       0.11      0.21
10      CF/TD([0.003, *)) AND S/MC([*, 0.12)) => OUTPUT(0)                          21       2       0.11      0.21

The total numbers of rules and reducts generated for case 9 are shown in Table 5a. The largest numbers of rules and reducts were generated for sample 3, and there was no obvious relationship between the number of reducts and the number of rules. To find the relative importance of the variables, the relative frequency of occurrence of each variable in the reducts generated from the samples was computed; the result is shown in Table 5b. The variables RE/TA, S/MC, and S/TA were found to occur most frequently in the reducts.

Table 5a
Samples showing number of reducts and rules

Sample no.      1      2      3      4      5      6      7      8      9      10     Average
No. of reducts  8656   8596   9003   8954   8758   8595   8562   8964   8915   8534   8753.7
No. of rules    17486  18402  19309  18297  18500  18270  18150  18631  18540  17909  18349.4

Table 5b
Samples showing relative frequency of variables in generated reducts

Variable      1      2      3      4      5      6      7      8      9      10     Average
WC/TA         0.036  0.037  0.037  0.038  0.035  0.038  0.037  0.038  0.036  0.037  0.037
TD/TA         0.043  0.043  0.040  0.043  0.043  0.044  0.041  0.042  0.040  0.041  0.042
CA/CL         0.036  0.035  0.036  0.035  0.035  0.038  0.037  0.036  0.037  0.036  0.036
OI/TA         0.037  0.039  0.039  0.043  0.040  0.041  0.041  0.039  0.041  0.039  0.040
NI/TA         0.039  0.039  0.040  0.037  0.041  0.039  0.039  0.039  0.039  0.040  0.039
CF/TD         0.043  0.045  0.042  0.042  0.042  0.042  0.045  0.043  0.044  0.041  0.043
QA/CL         0.037  0.035  0.037  0.039  0.034  0.037  0.036  0.037  0.039  0.035  0.037
CF/S          0.046  0.046  0.044  0.045  0.044  0.044  0.043  0.046  0.049  0.046  0.045
RE/TA         0.044  0.046  0.047  0.047  0.045  0.046  0.045  0.047  0.045  0.046  0.046
S/TA          0.049  0.045  0.044  0.044  0.048  0.044  0.044  0.045  0.045  0.044  0.045
GP/TA         0.040  0.044  0.044  0.042  0.047  0.045  0.045  0.045  0.044  0.044  0.044
NI/SE         0.038  0.036  0.038  0.038  0.038  0.035  0.039  0.036  0.037  0.037  0.037
C/TA          0.046  0.043  0.041  0.040  0.041  0.044  0.042  0.042  0.043  0.043  0.043
I/S           0.041  0.042  0.039  0.041  0.039  0.042  0.043  0.045  0.041  0.041  0.041
QA/TA         0.043  0.044  0.044  0.044  0.043  0.043  0.044  0.043  0.045  0.043  0.044
P/E           0.043  0.046  0.047  0.045  0.047  0.042  0.047  0.044  0.045  0.044  0.045
S/MC          0.044  0.047  0.045  0.044  0.045  0.047  0.045  0.046  0.046  0.046  0.045
CA/TA         0.045  0.044  0.045  0.045  0.045  0.043  0.043  0.043  0.043  0.044  0.044
LTD/TA        0.043  0.044  0.043  0.043  0.048  0.045  0.043  0.043  0.041  0.042  0.044
OI/S          0.041  0.039  0.039  0.041  0.041  0.039  0.042  0.042  0.039  0.044  0.041
OI/MC         0.041  0.039  0.041  0.042  0.040  0.040  0.041  0.041  0.042  0.041  0.041
C/S           0.041  0.041  0.043  0.039  0.039  0.040  0.041  0.039  0.039  0.042  0.041
CA/S          0.042  0.042  0.043  0.043  0.040  0.043  0.042  0.040  0.041  0.042  0.042
NI/(TA - TL)  0.038  0.035  0.039  0.038  0.038  0.035  0.036  0.036  0.036  0.039  0.037

Since the method generated a large number of rules, it was important to know whether all of them played a role in the classification process. The effect of the number of generated rules on the Type I, Type II, and overall accuracies of the 10 samples is listed in Tables 6a and 6b. In Table 6a, the percentage of rules retained was reduced from 100 to 10% in steps of 10%, and the reduced rule sets were used to classify the testing records. The total number of generated rules varied across the samples, so the minimum, maximum, and average numbers of rules used for testing were computed. As might be expected, the overall testing accuracy decreased as the percentage of rules was reduced; the only exception occurred when the percentage of rules was reduced from 50 to 40%. However, it is interesting to note that when the percentage of rules was reduced from 100 to 10%, the reduction in testing accuracy was only 4.3%. This clearly indicated that redundant rules were generated by the rough sets analysis. Table 6b shows the effect of increasing the number of generated rules from 100 to 1000 on the Type I, Type II, and overall testing accuracies over the 10 samples of case 9.


Table 6a
Effect of removal of rules on testing accuracy

% of rules in testing  Minimum rules  Maximum rules  Average rules  Type I  Type II  Overall
100                    17399          19312          18420          74.6    72.1     72.9
90                     15659          17381          16578          74.6    71.7     72.7
80                     13919          15450          14736          74.2    71.7     72.5
70                     12179          13518          12894          74.2    71.7     72.5
60                     10439          11587          11052          74.2    70.8     72.1
50                     8700           9656           9210           73.8    70.8     71.9
40                     6960           7725           7368           73.3    71.7     72.1
30                     5220           5794           5526           74.4    70.4     71.9
20                     3480           3862           3684           73.9    68.8     70.8
10                     1740           1931           1842           72.1    68.4     69.8

Table 6b
Effect of addition of rules on testing accuracy

No. of rules in testing  Unclassified observations  Type I  Type II  Overall
100                      141                        61.2    40.2     50.2
200                      76                         68.8    54.9     61.3
300                      49                         70.1    59.5     64.2
400                      31                         72.9    64.4     67.9
500                      18                         73.7    67.1     69.8
600                      11                         74.3    67.9     70.6
700                      3                          74.9    68.8     71.3
800                      2                          74.3    68.8     71.0
900                      2                          73.8    68.8     70.8
1000                     0                          72.9    68.4     70.2
All                      0                          72.9    72.4     72.3

If the number of rules generated was not large enough, some records could not be classified because they were not covered by any rule; when 1000 rules or more were used, there were no unclassified records in any sample. As the number of rules was increased from 100 to 1000, the overall testing accuracy increased by 20%, although most of the improvement (19.6%) took place as the number of rules was increased from 100 to 500. This meant that instead of considering all generated rules it was sufficient to consider only about 500 rules to obtain results of reasonable accuracy. From this we concluded that though ROSETTA generated many rules from the training samples, most of them were redundant and did not contribute significantly to the overall testing accuracy.
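The pruning experiment behind Table 6a amounts to ranking the rule set by support and keeping a top fraction. A minimal sketch follows, with rules represented as (lhs, decision, support) triples as in the earlier examples; this mirrors the idea, not ROSETTA's internal mechanics.

```python
def prune_by_support(rules, fraction):
    """Keep the top `fraction` (in (0, 1]) of rules ranked by support."""
    ranked = sorted(rules, key=lambda r: r[2], reverse=True)  # r[2] = support
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

# e.g. retain only the 10% of rules with the highest support:
# pruned = prune_by_support(all_rules, 0.10)
```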

Next, the impact of the length of the rules on testing accuracy was evaluated; the results are shown in Table 7a. The rules were divided into two groups, rules of length ≤ 3 and rules of length > 3, and classification was then performed on the 10 random samples of case 9 using each group exclusively. On average, rules of length ≤ 3 gave rise to a higher overall testing accuracy. Subsequently, a t-test (shown in Table 7b) was conducted to check whether the difference between the two rule groups was significant; the difference was not significant at the 5% level. Type I accuracy was higher than Type II accuracy for the shorter rules, and the pattern was reversed for the longer rules. This indicated that the dataset led to the formation of a small number of rules that could correctly identify financially unhealthy firms.

Table 7a
Effect of length of rules on testing accuracy

         Rules of length ≤ 3          Rules of length > 3          All rules
Sample   Type I  Type II  Overall     Type I  Type II  Overall     Type I  Type II  Overall
1        90.5    55.6     70.8        57.1    66.7     62.5        95.2    55.6     72.9
2        77.3    73.1     75.0        50.0    80.8     66.7        63.6    76.9     70.8
3        90.0    71.4     79.2        55.0    75.0     66.7        80.0    71.4     75.0
4        77.8    80.9     79.2        70.4    80.9     75.0        81.5    85.7     83.3
5        75.0    62.5     68.8        62.5    75.0     68.8        75.0    62.5     68.8
6        80.0    57.1     66.7        70.0    82.1     77.1        80.0    67.9     72.9
7        54.2    83.3     68.8        50.0    75.0     62.5        50.0    75.0     62.5
8        82.6    68.0     75.0        73.9    84.0     79.2        78.3    80.0     79.2
9        77.8    61.9     70.8        44.4    90.5     64.6        74.1    80.9     77.1
10       81.8    50.0     64.6        40.9    69.2     56.3        68.2    65.4     66.7
Average  78.7    66.4     71.9        57.4    77.9     67.9        74.6    72.1     72.9

Table 7b
Result of t-test on difference in overall testing accuracy for the two rule groups

Mean difference in overall accuracy    3.9
Standard error of difference           2.8
Degrees of freedom (unequal variance)  16.1
t-Statistic                            -1.4
p-Value                                0.1

Rules created for predicting unhealthy and healthy firms were then compared; the statistics are shown in Table 8a. In general, more rules were generated for financially healthy firms, and these rules were longer, had smaller support, and had smaller LHS and RHS coverage. Table 8b shows the results of t-tests conducted to explore whether the differences in number, support, length, LHS coverage, and RHS coverage were significant at the 5% level.

Table 8a
Comparison of statistics of rules for outputs '0' and '1'

         Rules with outcome '0'                        Rules with outcome '1'
Sample   Number  Support  Length  LHS cov.  RHS cov.   Number  Support  Length  LHS cov.  RHS cov.
1        8205    3.2      3.5     0.017     0.032      9281    2.8      3.7     0.014     0.029
2        8540    3.1      3.5     0.016     0.032      9862    2.8      3.8     0.014     0.029
3        9172    2.9      3.6     0.015     0.029      10137   2.6      3.7     0.014     0.028
4        8588    2.9      3.6     0.015     0.031      9583    2.9      3.7     0.015     0.029
5        9078    2.9      3.5     0.015     0.030      9422    2.9      3.7     0.015     0.030
6        8816    3.0      3.5     0.016     0.030      9324    2.8      3.7     0.015     0.031
7        8508    3.1      3.5     0.016     0.032      9642    2.9      3.7     0.015     0.030
8        8767    2.9      3.5     0.015     0.030      9761    2.8      3.7     0.014     0.029
9        8519    2.8      3.6     0.015     0.030      9883    2.8      3.7     0.015     0.029
10       8751    3.3      3.5     0.017     0.034      9158    2.7      3.7     0.014     0.029
Average  8694.4  3.0      3.5     0.016     0.031      9605.3  2.8      3.7     0.015     0.029

Table 8b
Result of t-tests

Statistics                             Number    Support  Length    LHS coverage  RHS coverage
Mean difference                        -910.9    0.2      -0.2      0.001         0.002
Standard error of difference           133.7     0.1      0.01      0.0003        0.0005
Degrees of freedom (unequal variance)  17.9      14.3     12.2      14.3          14.1
t-Statistic                            -6.8      3.8      -12.6     3.8           3.3
p-Value                                <0.0001*  0.0021*  <0.0001*  0.0021*       0.0052*

The differences were significant across all parameters; they are denoted by * in Table 8b. This indicated that the rules associated with financially healthy firms were poorer in quality than those associated with unhealthy firms, when quality was judged in terms of the number, length, support, and LHS and RHS coverage of the rules. Since the difference in quality was statistically significant, this analysis consistently did a better job of identifying financially unhealthy firms.

The impact of changing various parameters of the testing procedure on the average Type I, Type II, and overall testing accuracies across all samples was investigated next. Table 9a shows that when the training sample size was reduced by 10 and 20% of its original size, the average Type I accuracy decreased. However, the overall testing accuracy was highest when the training sample size was reduced by 10%, indicating that the rough sets method was suffering from overtraining. Table 9b shows that when the testing sample size was decreased by 10%, the overall testing accuracy decreased slightly, but when the testing sample size was reduced by 20%, the testing accuracy increased. In all numerical experiments conducted to this point, the ratio of training to testing sample size was kept fixed at 80:20. Table 9c shows the impact of changing this ratio on testing accuracies; the best result was obtained when the ratio was kept at 70:30, so a ratio of 80:20 apparently led to overtraining. A sample is balanced if it has equal representation of financially healthy and unhealthy firms; the data used in our research consisted of two balanced samples and eight unbalanced ones. Table 9d shows that the unbalanced samples tended to have higher overall testing accuracy.

Table 9a
Effect of size of training sample on testing accuracy

Reduction of size of training sample (%)  Type I  Type II  Total
0                                         73.7    70.8     71.9
10                                        71.1    74.7     72.7
20                                        65.6    76.9     71.5

Table 9b
Effect of size of testing sample on testing accuracy

Reduction of size of testing sample (%)  Type I  Type II  Total
0                                        73.7    70.8     71.9
10                                       72.3    71.2     71.4
20                                       73.0    74.4     73.7

Table 9c
Effect of ratio of training to testing on testing accuracy

Ratio of training to testing data  Type I  Type II  Total
80:20                              72.2    71.1     71.3
75:25                              69.4    72.7     71.0
70:30                              73.4    74.7     73.9
60:40                              66.4    76.8     71.0

Table 9d
Effect of balance of sample on testing accuracy

Balance of sample  Type I  Type II  Total
Balanced           58.3    72.9     65.6
Unbalanced         75.7    70.6     72.7

ANOVA tests were conducted using four factors: balance of sample, ratio of training to testing sample size, testing sample size, and training sample size. There were no significant four-way, three-way, or two-way interactions between any of them. After removing all interactions, the ANOVA was performed once more; the results are shown in Table 10. From the F-test it was clear that none of the factors were significant at the 5% level. This indicated that changing the balance of the sample, the ratio of training to testing sample size, the training sample size, or the testing sample size had no significant effect on the average overall testing accuracies across all samples.

Table 10
ANOVA for 'best' model after removal of insignificant four-way, three-way, and two-way interactions

Source                        Degrees of freedom  Sum of squares  Mean squares  F-statistic  p-Value
Balance                       1                   78.4            78.4          4.4          0.04
Ratio of training to testing  3                   75.5            25.2          1.4          0.24
Size of testing sample        2                   28.8            14.4          0.8          0.45
Size of training sample       2                   12.4            6.2           0.4          0.71

The 'best' classification result obtained was then compared with those produced by two other statistical approaches: logistic regression and discriminant analysis. Both have often been used for predicting the financial health of corporations and were therefore adopted for comparison. The results are reported in Table 11, where it can be seen that the rough sets method generally performed better than the others in terms of classification accuracy on both the training and the testing samples.

Table 11
Comparison of rough sets approach with other approaches

                       Training                      Testing
Method                 Type I  Type II  Overall      Type I  Type II  Overall
Rough sets             100     99.6     99.8         73.0    72.1     72.1
Logistic regression    71.6    79.4     75.4         65.3    65.3     65.2
Discriminant analysis  65.9    78.8     72.3         62.3    69.4     65.8

Table 12
Comparison of current research with prior research in business failures

Study                          Country/scope  Variable coding  Dev. sample size  Val. sample size  Variables in model  Rules in model  Dev. accuracy  Val. accuracy
Slowinski and Zopounidis [33]  Greece         Subjective       39                None used         4                   15              100%           None used
Slowinski et al. [34]          Greece         Objective        60                None used         10                  24              100%           None used
Greco et al. [10]              Greece         Subjective       39                None used         4                   15              94.90%         None used
Dimitras et al. [5]            Greece         Subjective       80                38                5                   102             100%           76.3%
McKee [19]                     USA            Objective        100               100               2                   7               93%            88%
Ahn et al. [1]                 Korea          Objective        2200              200               8                   Not reported    Not reported   89.1%
McKee and Lensberg [21]        USA            Objective        144               144               3                   177             83%            80%
McKee [20]                     USA            Objective        150               141               4                   86              100%           68%
Current research               Dot-coms       Objective        192               48                24                  18349.4         99.79%         72.08%

5. Conclusion

From the experiments conducted, it can be concluded that the rough sets method correctly classified financially healthy and unhealthy dot-coms. The variables RE/TA, S/MC, and S/TA appeared to be the major predictors, as they occurred most frequently in the generated reducts. A disadvantage of the rough sets method, however, was that it often generated a large number of rules for each class. By increasing and decreasing the number of rules and checking the effect on overall testing accuracy, it became apparent that most of the rules were redundant and that only the top 10% (in terms of support) were really important and needed to be retained. This observation is important for users who want to find the 'best' rules that describe financially healthy and unhealthy firms.

Another important observation was that the analysis was better at identifying financially unhealthy firms than healthy ones. The rules associated with unhealthy firms were shorter and enjoyed higher support and coverage. This is significant, as identifying unhealthy firms correctly seems to be the more important task when studying the financial health of firms; incorrectly identifying a financially healthy dot-com as unhealthy merely represents a missed investment opportunity and is comparatively less harmful.

Another key finding was that factors such as balance of sample, size of training sample, size of testing sample, and ratio of training to testing sample size did not play any major role in the outcome of the experiments.

In Table 12, the current research is compared to prior research in the area of business failures. Ours is the only research effort involving dot-coms. The classification accuracy of our research on the training sample was comparable to those reported in other research. However, the accuracy on the testing or validation sample, though satisfactory, was less than that reported in prior research. This might be due to the inherent difficulty of predicting the financial health of dot-coms as opposed to publicly traded corporations in general.

Acknowledgements

The author thanks the anonymous reviewers of this paper for their constructive comments, which greatly improved its overall quality and readability. This research was supported by a grant received by the author from the Research Grants Council of Hong Kong under the Competitive Earmarked Research Grants scheme (Project code HKU 7131/04E). The author also thanks Mr. James Pang for his help in conducting the numerical experiments reported in this paper.

Appendix A

In rough sets theory, data is presented in the form of an information table in which the rows represent objects and the columns represent attributes. The first step is discretization of the independent attributes. It involves searching for cuts that determine intervals for numeric attributes and can be considered a way to convert numerical attributes into categorical attributes. The ROSETTA software used in this research provides various methods for discretization. For a detailed description of ROSETTA, see Øhrn [24].
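As an illustration of the equal frequency method that performed best in Section 4, cuts can be placed so that each resulting interval receives roughly the same number of objects. This is a simplified sketch; ROSETTA's own implementation may differ in detail, e.g. in how ties and duplicates are handled.

```python
def equal_frequency_cuts(values, n_bins=3):
    """Cut points putting roughly the same number of objects in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def discretize(value, cuts):
    """Index of the interval that `value` falls into."""
    return sum(value > c for c in cuts)

cuts = equal_frequency_cuts([0.2, -0.5, 1.7, 0.9, -1.1, 0.4])
labels = [discretize(v, cuts) for v in (0.2, -0.5, 1.7)]  # interval indices
```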

The second step is the formation of reducts. A reduct is a minimal attribute subset that provides the same quality of classification as the original set of attributes; an information table usually has more than one reduct. ROSETTA offers two ways of forming them: Johnson's algorithm and a genetic algorithm.
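A common greedy reading of Johnson's algorithm repeatedly picks the attribute that discerns the most remaining object pairs with different decisions. The sketch below follows that reading under simplifying assumptions (a single reduct, no tie-breaking rules, no approximate reducts); it is not ROSETTA's implementation.

```python
from itertools import combinations

def johnson_reduct(rows, decisions):
    """Greedy (Johnson-style) heuristic for one reduct.

    rows:      list of attribute tuples, one per object
    decisions: decision labels aligned with `rows`
    """
    # Discernibility entries: for each pair of objects with different
    # decisions, the set of attribute indices on which they differ.
    entries = []
    for (r1, d1), (r2, d2) in combinations(zip(rows, decisions), 2):
        if d1 != d2:
            diff = {a for a in range(len(r1)) if r1[a] != r2[a]}
            if diff:
                entries.append(diff)

    reduct = set()
    while entries:
        counts = {}
        for entry in entries:
            for a in entry:
                counts[a] = counts.get(a, 0) + 1
        best = max(counts, key=counts.get)   # most frequently occurring attribute
        reduct.add(best)
        entries = [e for e in entries if best not in e]
    return reduct
```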

The information table in the rough sets approach can also be considered a decision table that contains the condition (independent) attributes and the decision (dependent) attributes. From this table a set of rules can be derived; they are of the form:

IF <conjunction of conditions> THEN <decision>

A decision rule may be exact or approximate. If the set of dependent attributes is a singleton set, the decision rule is exact; otherwise it is approximate. An approximate decision rule indicates that, based on the available evidence, the rough sets approach is unable to classify the object as belonging to a single class. Each decision rule is described by its accuracy and coverage. Rules are considered to be of high quality if they have both high accuracy and high coverage; however, it has been reported in [25] that accuracy and coverage of rules usually have an inverse relationship.

The last step of the rough sets analysis is the classification of unknown or test data based on the rules and reducts obtained from the training data. In ROSETTA, two methods are used for classification: standard voting and voting with object tracking. In standard voting, the rules are unordered, and a certainty factor is assigned to each decision class when a rule is applied to an unknown object; after all rules are applied, the object is assigned to the class with the highest certainty factor. In voting with object tracking, the original objects from which the rules were derived are tracked, so that overlapping original objects can be given less weight than in standard voting.
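A simplified sketch of support-weighted standard voting follows. ROSETTA's certainty-factor computation is more involved than this; the code only conveys the idea of accumulating votes from every rule that fires.

```python
def standard_voting(rules, obj):
    """Classify `obj` by letting every firing rule vote for its class.

    rules: list of (lhs_predicate, decision, support) triples
    Returns the class with the largest accumulated vote, or None when
    no rule fires (the object stays unclassified).
    """
    votes = {}
    for lhs, decision, support in rules:
        if lhs(obj):
            votes[decision] = votes.get(decision, 0) + support
    return max(votes, key=votes.get) if votes else None
```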

References

[1] B.S. Ahn, S.S. Cho, C.Y. Kim, The integrated methodology of rough set theory and artificial neural network for business failure prediction, Expert Systems with Applications 18, 2000, pp. 65–74.

[2] E.I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, The Journal of Finance 23, 1968, pp. 589–609.

[3] I. Bose, R.K. Mahapatra, Business data mining—a machine learning perspective, Information & Management 39 (3), 2001, pp. 211–225.

[4] T. Dahlberg, J. Horluck, Internet hype overreaction—and what we can learn from it, Working paper, Department of Management, University of Aarhus, Denmark. Available: http://www.econ.au.dk/afv/WP/wp2001-14jh-et-al.pdf.

[5] A.I. Dimitras, R. Slowinski, R. Susmaga, C. Zopounidis, Business failure prediction using rough sets, European Journal of Operational Research 114, 1999, pp. 263–280.

[6] A.I. Dimitras, S.H. Zanakis, C. Zopounidis, A survey of business failures with an emphasis on prediction methods and industrial applications, European Journal of Operational Research 90, 1996, pp. 487–513.

[7] A. Fan, M. Palaniswami, Selecting bankruptcy predictors using a support vector machine approach, in: Proceedings of the International Joint Conference on Neural Networks, Como, Italy, 24–27 July, 2000.

[8] D. Fletcher, E. Goss, Forecasting with neural networks: an application using bankruptcy data, Information & Management 24 (3), 1993, pp. 159–167.

[9] H. Frydman, E.I. Altman, D.-L. Kao, Introducing recursive partitioning for financial classification: the case of financial distress, The Journal of Finance 40 (1), 1985, pp. 269–291.

[10] S. Greco, B. Matarazzo, R. Slowinski, A new rough set approach to evaluation of bankruptcy risk, in: C. Zopounidis (Ed.), New Operational Tools in the Management of Financial Risks, Kluwer Academic Publishers, Dordrecht, 1998, pp. 121–136.

[11] Y.P. Gupta, R.P. Rao, P.K. Bagghi, Linear goal programming as an alternative to multivariate discriminant analysis: a note, Journal of Business Finance and Accounting 17 (4), 1990, pp. 593–598.

[12] R.R. Hashemi, L.A. Le Blanc, C.T. Rucks, A. Rajaratnam, A hybrid intelligent system for predicting bank holding structures, European Journal of Operational Research 109, 1998, pp. 390–402.

[13] R.J. Hendershott, New value: wealth creation (and destruction) during the internet boom, Journal of Corporate Finance 10, 2004, pp. 281–299.

[14] ITWales.com, The second coming of the dot-com boom (January 17, 2005). Available: http://www.itwales.com/998550.htm.

[15] A. Lashinsky, The boom is back, Fortune 8, 2006, pp. 48–56.

[16] N. Macaluso, Dot-com VC funding down sharply in Q2, Ecommercetimes.com (August 2, 2000). Available: http://www.ecommercetimes.com/news/articles2000/000802-7.shtml.

[17] C. Magnusson, A. Arppe, T. Eklund, B. Back, H. Vanharanta, A. Visa, The language of quarterly reports as an indicator of change in the company's financial status, Information & Management 42 (4), 2005, pp. 561–574.

[18] S. Mathieson, Vultures circle over dotcom casualties, Computing (January 18, 2001). Available: http://www.vnunet.com/Analysis/1116848.

[19] T.E. McKee, Developing a bankruptcy prediction model via the rough sets theory, International Journal of Intelligent Systems in Accounting, Finance and Management 9, 2000, pp. 159–173.

[20] T.E. McKee, Rough sets bankruptcy prediction models versus auditor signaling rates, Journal of Forecasting 22, 2003, pp. 569–586.

[21] T.E. McKee, T. Lensberg, Genetic programming and rough sets: a hybrid approach to bankruptcy classification, European Journal of Operational Research 138, 2002, pp. 436–451.

[22] H. Nurmi, J. Kacprzyk, M. Fedrizzi, Probabilistic, fuzzy and rough concepts in social choice, European Journal of Operational Research 95, 1996, pp. 264–277.

[23] J.A. Ohlson, Financial ratios and the probabilistic prediction of bankruptcy, Journal of Accounting Research, 1980, pp. 109–131.

[24] A. Øhrn, Discernibility and rough sets in medicine: tools and applications, PhD Thesis, Department of Computer and Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway, NTNU report 1999:133, IDI report, ISBN 82-7984-014-1, 1999. Available: http://www.idi.ntnu.no/~aleks/thesis/.

[25] A. Øhrn, ROSETTA Technical Reference Manual, Department of Computer and Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway, 2000. Available: http://rosetta.lcb.uu.se/general/resources/manual.pdf.

[26] R. Oliva, J.D. Sterman, M. Giese, Limits to the growth in the new economy: exploring the 'get big fast' strategy in e-commerce, Working paper, Harvard Business School. Available: http://www.people.hbs.edu/roliva/research/dotcom/OSG_2.0.pdf.

[27] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11 (5), 1982, pp. 341–356.

[28] Z. Pawlak, J. Grzymala-Busse, R. Slowinski, W. Ziarko, Rough sets, Communications of the ACM 38 (11), 1995, pp. 89–95.

[29] ROSETTA homepage, 2004. Available: http://rosetta.lcb.uu.se/general/.

[30] A. Sharma, Dot-coms' coma, The Journal of Systems and Software 26, 2001, pp. 101–104.

[31] L. Shen, H.T. Loh, Applying rough sets to market timing decisions, Decision Support Systems 37 (4), 2004, pp. 583–597.

[32] J.W. Schoen, Signs of life in the venture capital market, MSNBC (January 24, 2005). Available: http://msnbc.msn.com/id/6853330/.

[33] R. Slowinski, C. Zopounidis, Application of the rough set approach to evaluation of bankruptcy risk, International Journal of Intelligent Systems in Accounting, Finance and Management 4 (1), 1995, pp. 27–41.

[34] R. Slowinski, C. Zopounidis, A.I. Dimitras, Prediction of company acquisition in Greece by means of the rough set approach, European Journal of Operational Research 100, 1997, pp. 1–15.

[35] A. Szladow, D. Mills, Tapping financial databases, Business Credit 95 (7), 1993, p. 8.

[36] F.E.H. Tay, L. Shen, Economic and financial prediction using rough sets model, European Journal of Operational Research 141, 2002, pp. 641–659.

[37] J. Thornton, S. Marche, Sorting through the dot bomb rubble: how did the high-profile e-tailers fail? International Journal of Information Management 23, 2003, pp. 121–138.

[38] E.M. Vermeulen, J. Spronk, N. Van der Wijst, The application of the multi-factor model in the analysis of corporate failure, in: C. Zopounidis (Ed.), Operational Tools in the Management of Financial Risks, Kluwer Academic Publishers, Dordrecht, 1998, pp. 59–73.

[39] Y.-F. Wang, Mining stock price using fuzzy rough set system, Expert Systems with Applications 24, 2003, pp. 13–23.

[40] B.K. Wong, V.S. Lai, J. Lam, A bibliography of neural network business applications research: 1994–1998, Computers and Operations Research 27, 2000, pp. 1045–1076.

[41] B.K. Wong, Y. Selvi, Neural network applications in finance: a review and analysis of literature, Information & Management 34 (3), 1998, pp. 129–139.

[42] M.E. Zmijewski, Methodological issues related to the estimation of financial distress prediction models, Studies on Current Econometric Issues in Accounting Research, 1984, pp. 39–82.

Indranil Bose is an associate professor of Information Systems at the School of Business, The University of Hong Kong. His degrees include a B.Tech. from the Indian Institute of Technology, an M.S. from the University of Iowa, and an M.S. and Ph.D. from Purdue University. His research interests include telecommunications, data mining, electronic commerce, and supply chain management. His teaching interests are in telecommunications, database management, data mining, and decision science. His publications have appeared in Communications of AIS, Communications of the ACM, Computers and Operations Research, Decision Support Systems and Electronic Commerce, Ergonomics, European Journal of Operational Research, Information & Management, and Operations Research Letters.