Rule Induction and Statistics
Author(s): Anna Hart
Source: The Journal of the Operational Research Society, Vol. 38, No. 5 (May, 1987), pp. 470-471
Published by: Palgrave Macmillan Journals on behalf of the Operational Research Society
Stable URL: http://www.jstor.org/stable/2582743
Accessed: 28/06/2014 09:27




Journal of the Operational Research Society Vol. 38, No. 5

RISKS IN THE EVALUATION OF ACCEPTABLE RISK

The methodology advocated by Whittaker,1 whilst intuitively attractive, displays a lack of practical robustness which may prohibit its universal acceptance. Worse still, it may be based on false assumptions.

The data presented in Figure 1 of Whittaker's paper, for life expectation of Canadian males, and summarized in his Figure 2 are not truly representative of the longer-term changes in life expectancy over the last one hundred or so years, and the increase may in fact be purely an artifact of the rapidly changing social and welfare conditions during the early part of the century rather than a product of the technology he seeks to justify.

A close scrutiny of the English life table (series DH1, No. 16, Table 22) showing the data for England and Wales reveals that, between 1838 and 1900, life expectancy at ages 45 and 65 was generally falling for both males and females, whilst the corresponding figures for the younger ages increased slightly, with an important contribution to mortality clearly remaining in the first year of life.

During the period 1900 to 1950 there was a general increase in the life expectations over all ages, particularly at the lower age points, but as the graph in Whittaker's article shows, this general upward trend is now levelling off.

It could be argued that having eliminated the prime sources of infant mortality, any extrapolation into the future is likely to be a dynamic balance between the better standards of geriatric and health care on the one hand and the risks from choice of lifestyle and exploitation of technology on the other (e.g. transport, occupational hazards, smoking, lack of exercise, etc.).

It is possible, therefore, that within a few years the graph may become horizontal at best, or develop a negative slope at worst. Neither case would make its employment in the way advocated very useful, and in any case, the gradient of such a curve would be extremely sensitive to sampling and other short-term variations, which would make its robustness questionable.

Finally, to be attractive to the individual in society, any methodology on risk evaluation should take into account quality of life as well as longevity.

Paddock Wood, Kent DAVID G. SMITH

Reference

1. JOHN D. WHITTAKER (1986) Evaluation of acceptable risk. J. Opl Res. Soc. 37, 541-547.

RULE INDUCTION AND STATISTICS

Recent papers have examined the power of the ID3 algorithm.1,2 I welcome this work because it portrays such algorithms as methods of data exploration whose results need investigation and interpretation. If the training set of examples input to the algorithm is complete and correct (i.e. describing all possible types of problems accurately), then the induced rules, which work for the training set, will obviously work in general. In practice, training sets are not complete and seldom absolutely correct. Knowledge about data exploration and statistical method is therefore necessary. Results can be sensitive to changes in the input, and the quality of the output depends heavily on the quality of the input. It is beneficial to know about sampling and the dangers of extrapolation of results. Results are suggestive, not certain.
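The sensitivity to input noted above can be illustrated with a minimal sketch. It assumes the usual information-gain criterion by which ID3 chooses the attribute to split on. The eight training examples, the attribute names `a` and `b`, and the helper functions are hypothetical and are not taken from the papers cited. In this small set, changing the class of a single example changes the attribute chosen at the root of the tree.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="class"):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target="class"):
    """Attribute with the highest information gain (the ID3 splitting choice)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical training set: two binary attributes and a yes/no class.
training = [
    {"a": 0, "b": 0, "class": "no"},
    {"a": 0, "b": 0, "class": "no"},
    {"a": 0, "b": 0, "class": "no"},
    {"a": 0, "b": 1, "class": "yes"},
    {"a": 1, "b": 1, "class": "yes"},
    {"a": 1, "b": 1, "class": "yes"},
    {"a": 1, "b": 0, "class": "yes"},
    {"a": 1, "b": 0, "class": "no"},
]

# The same set with the class of one example (the fourth) altered.
changed = [dict(r) for r in training]
changed[3]["class"] = "no"

for name, rows in (("original", training), ("one example changed", changed)):
    gains = {attr: round(information_gain(rows, attr), 3) for attr in ("a", "b")}
    print(name, gains, "root attribute:", best_attribute(rows, ["a", "b"]))
```

With the original examples the root split is on `b`; with one class altered it is on `a`, so the induced rules differ from the first test onwards.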

Previous work by statisticians3 has stressed the importance of evaluating results and of pruning induced trees. Pruning involves growing a decision tree to its full length and then mathematically evaluating the benefit of extra rules against the cost of producing them. Superfluous rules are then pruned back. Results are possible rather than proven for general cases, and there is no evidence of causality or explanation. The method uses no knowledge about the problem domain, and so the induced results should always be examined by an expert who can explain, justify or contradict them. They should also be tested on other examples.
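The weighing of benefit against cost in pruning can be sketched as follows. This is a simplified cost-complexity criterion in the spirit of reference 3, not the procedure of that book or of any particular package. A subtree is collapsed into a single leaf when the errors it saves do not exceed a penalty `alpha` for each extra leaf it needs. The `Node` class, the field `errors_as_leaf`, the value of `alpha` and the error counts in the example tree are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    errors_as_leaf: int  # misclassified examples if this node were made a leaf
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Number of leaves (i.e. rules) in the subtree rooted at `node`."""
    return 1 if not node.children else sum(leaves(c) for c in node.children)


def subtree_errors(node):
    """Total misclassified examples over the leaves of the subtree."""
    if not node.children:
        return node.errors_as_leaf
    return sum(subtree_errors(c) for c in node.children)


def prune(node, alpha):
    """Bottom-up pruning: collapse a subtree when benefit <= cost."""
    pruned_children = [prune(c, alpha) for c in node.children]
    candidate = Node(node.label, node.errors_as_leaf, pruned_children)
    if not pruned_children:
        return candidate
    benefit = node.errors_as_leaf - subtree_errors(candidate)  # errors saved
    cost = alpha * (leaves(candidate) - 1)                     # extra leaves
    if benefit <= cost:
        return Node(node.label, node.errors_as_leaf)  # superfluous rules removed
    return candidate


# Hypothetical fully grown tree with made-up error counts.
tree = Node("root", 10, [
    Node("left", 2, [Node("left-1", 1), Node("left-2", 1)]),    # split saves nothing
    Node("right", 5, [Node("right-1", 0), Node("right-2", 1)]),  # split saves 4 errors
])

for alpha in (1.0, 5.0):
    pruned = prune(tree, alpha)
    print("alpha =", alpha, "leaves:", leaves(pruned), "errors:", subtree_errors(pruned))
```

A small penalty removes only the split that saves no errors; a large one collapses the whole tree. The choice of `alpha` is itself a judgement, and the error counts here are training-set counts, so a pruned tree still needs testing on other examples.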

Induction can highlight problems and questions as much as it can suggest rules, and accepting the output without careful investigation is dangerous. For expert systems it gives a starting point, rather than an answer, in knowledge acquisition. For certain problems, other statistical techniques will give more efficient and more meaningful results. I have commented on this elsewhere.4,5

School of Computing, Lancashire Polytechnic ANNA HART

References

1. J. MINGERS (1986) Expert systems - experiments with rule induction. J. Opl Res. Soc. 37, 1031-1037.

2. J. MINGERS (1987) Expert systems - rule induction with statistical data. J. Opl Res. Soc. 38, 39-47.

3. L. BREIMAN, J.H. FRIEDMAN, R.A. OLSHEN and C.J. STONE (1984) Classification and Regression Trees. Wadsworth International, California.

4. A. HART (1985) Machine induction: practical issues and advice. Proceedings of the Second International Conference on Expert Systems. Learned Information, London.

5. A. HART (1986) Knowledge Acquisition for Expert Systems. Kogan Page, London.
