


Environmental and Molecular Mutagenesis 11:443-448 (1988)

Commentary

Computer Assisted Short-Term Test Battery Design: Some Questions

John Ashby

Central Toxicology Laboratory, Cheshire, England

Yander et al. [1987] recently commented in this Journal on a paper in Nature by Lave and Omenn [1986]. The topic was the selection of “batteries” of assays for detecting (predicting) carcinogens. Ennever and Rosenkranz [1987] responded positively to the paper by Yander et al. All three of these articles used statistical criteria to assess the success or failure of an assay or battery of assays. Thus, Yander et al. required a 90% probability of a chemical being a noncarcinogen after testing in an undefined (variable) battery selected from the seven core assays maintained in their laboratory. Equally, Ennever and Rosenkranz described, for example, the in vitro sister chromatid exchange (SCE) assay as having a sensitivity to carcinogens of 0.6667 (66.67%) and a specificity for noncarcinogens of 0.8904 (89.04%). Yander et al. also observed: “We realize that any of these assays can yield a positive result, which may or may not be biologically significant. To accommodate such a response, an additional test can be included in the battery, which will either corroborate or negate the above response.” All of the papers had a feel of “chance” and “statistics”; sound biological or chemical assessment of data was lacking. Given that there is no general agreement on which short-term tests to use, or in what order to use them, the prospect of having a computer decide for us is superficially appealing. However, if we are to make full use of the extensive computerized databases of genotoxicity data that are now available, some fundamental questions must be answered. Four of these are posed below, and possible answers are proposed.

Received December 2, 1987; revised and accepted January 25, 1988.

Address reprint requests to John Ashby, Central Toxicology Laboratory, ICI PLC, Alderley Park, Macclesfield, Cheshire SK10 4TJ, England.

© 1988 Alan R. Liss, Inc.

The object of a battery (as discussed by the above authors) is as follows: The perfect short-term test is assumed to be one having a sensitivity and specificity of 1, i.e., no false predictions in either direction. In the absence of such a test an approach to the ideal is sought through the selection of a battery of assays, which, when used together, might achieve a greater predictive value than any of the component assays. One needs a minimum number of assays to increase the overall sensitivity nearer to 1, but if too many assays are added, overall specificity will drop too low. What is sought, therefore, is a balance between sensitivity and specificity, with the number of assays employed acting as the constraint.
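The trade-off just described can be made concrete with a small sketch. It rests on two assumptions that are not taken from the papers under discussion: that a battery is called positive if any one assay is positive, and that the assays err independently of one another. The function name `battery_any_positive` is illustrative only. The single-assay figures are the SCE values quoted above, reused for several hypothetical assays with identical performance.

```python
from math import prod


def battery_any_positive(assays):
    """Combine (sensitivity, specificity) pairs into battery-level values.

    ASSUMPTIONS (not from the cited papers): the battery is scored positive
    if any assay is positive, and assay errors are independent.
    """
    # A carcinogen is missed only if every assay misses it.
    sensitivity = 1.0 - prod(1.0 - sens for sens, _ in assays)
    # A noncarcinogen is cleared only if every assay is negative.
    specificity = prod(spec for _, spec in assays)
    return sensitivity, specificity


# Sensitivity and specificity quoted in the text for the in vitro SCE assay.
SCE_LIKE = (0.6667, 0.8904)

for n_assays in range(1, 5):
    sens, spec = battery_any_positive([SCE_LIKE] * n_assays)
    print(f"{n_assays} assay(s): sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Under these assumptions each added assay raises the battery sensitivity and lowers its specificity, which is the balance referred to above. Real assays are not independent, so the numbers are indicative only.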

The four questions are as follows:

1. One of the most obvious ways to increase the number of negative results from a battery is to include insensitive assays in it. Thus, the Drosophila SLRL assay is sensitive to few carcinogens beyond alkylating agents; for example, it does not detect key genotoxic arylamine carcinogens such as 2-acetylaminofluorene (2-AAF). It is therefore debatable whether this assay should be included in a battery. Similar concerns apply to host-mediated bacterial assays, the rodent germ cell assays, the in vitro hepatocyte DNA repair assay, etc. Thus, when testing a new chemical related in structure to 2-AAF, in a battery containing the Drosophila SLRL assay, the expected negative SLRL response should have little influence on the eventual prediction of carcinogenic potential. The question is, should not external knowledge dictate whether a particular assay should form a part of a battery, rather than considering the effect its inclusion will have on the statistical “power” of the battery?

2. The second question is whether or not it is defensible to regard in vitro and in vivo assays as equal in a battery. To illustrate this concern a partial reproduction of Tables 1 and 2 of Ennever and Rosenkranz [1987] is given below, as Table I.

TABLE I. Partial Reproduction of Tables 1 and 2 From Ennever and Rosenkranz [1987]*

Salmonella (Sty)    L5178Y (Mly)    Mouse bone marrow          P(CA/r)
(in vitro)          (in vitro)      micronucleus (Mnt)
+                   +               +                          0.898
+                   -               +                          0.634
-                   +               +                          0.574
-                   -               +                          0.209

Salmonella (Sty)    UDS             Chromosome aberrations     P(CA/r)
(in vitro)          (in vitro)      (in vitro)
-                   -               +                          0.382

*The upper portion of this table is from Table 1 of Ennever and Rosenkranz [1987] and the lower portion is from their Table 2.

The “posterior probability value” [P(CA/r)] must be equal to or less than 0.1 for a chemical to be considered a noncarcinogen; the smaller the number, the greater the probability of noncarcinogenicity. Also, by inference, the greater the number, the higher the probability of carcinogenicity. Thus, for the four test profiles shown in the upper part of Table I, P(CA/r) varies from 0.898 to 0.209, almost the whole allowed range (1 to 0). This indicates that the battery produces a wide range of carcinogenicity/noncarcinogenicity probabilities for these four activity profiles. But in each case the chemical is a rodent (in vivo) somatic cell mutagen. For the first three profiles the probabilities of carcinogenicity are therefore suggested here to remain high and to be essentially equal. This assumes that the chemical was administered by a relevant route of exposure and at a defensible dose level, and that the derived data represent sound and reproducible genetic effects. Even the fourth test outcome is suggested here to carry weight, despite the fact that this profile of activities is very rare. So the difference implied by the computations between 0.898 and 0.209 does not make biological sense for an in vivo mutagen. Of greater concern is that in Table 2 of the Ennever and Rosenkranz paper (lower part of Table I herein) an isolated positive result in an in vitro cytogenetic assay, accompanied by negative results in both the Salmonella assay and an assay for UDS in vitro, is accorded a P(CA/r) value of 0.382; i.e., this in vitro cytogenetic result is inferred to indicate a lower probability of noncarcinogenicity (i.e., a higher probability of carcinogenicity) than does the isolated positive in vivo response shown in the upper part of Table I [P(CA/r) of 0.209]. This surely stands biology on its head.
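To see why a purely numerical treatment can produce such an ordering, a generic Bayesian update may be sketched as follows. This is not the computation used by Ennever and Rosenkranz; it is a textbook application of Bayes' theorem that assumes conditional independence between assays. The function name `posterior_carcinogen` and the prior of 0.5 are assumptions made for illustration. Only the SCE sensitivity and specificity come from the text above.

```python
def posterior_carcinogen(prior, results):
    """Posterior probability of carcinogenicity after a series of assay results.

    `results` is a list of (sensitivity, specificity, is_positive) tuples.
    ASSUMPTION: assay outcomes are conditionally independent given the true
    class; this is a generic sketch, not the published battery calculation.
    """
    weight_carcinogen = prior
    weight_noncarcinogen = 1.0 - prior
    for sensitivity, specificity, is_positive in results:
        if is_positive:
            weight_carcinogen *= sensitivity
            weight_noncarcinogen *= 1.0 - specificity
        else:
            weight_carcinogen *= 1.0 - sensitivity
            weight_noncarcinogen *= specificity
    return weight_carcinogen / (weight_carcinogen + weight_noncarcinogen)


# SCE values quoted in the text; the prior of 0.5 is purely illustrative.
SCE_SENS, SCE_SPEC = 0.6667, 0.8904
print(posterior_carcinogen(0.5, [(SCE_SENS, SCE_SPEC, True)]))
print(posterior_carcinogen(0.5, [(SCE_SENS, SCE_SPEC, False)]))
```

Nothing in this arithmetic records whether an assay is conducted in vitro or in vivo; each assay enters only through two numbers. An in vivo positive therefore receives no weight beyond that implied by its tabulated sensitivity and specificity, which is the concern raised in the second question.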

3. Tennant et al. [1987] recently concluded that some rodent carcinogens appear to be nongenotoxic, and consequently, that we should consider the reality of a group of nongenotoxic rodent carcinogens. Malling and Chu [1974] were among the first to suggest this possibility, and Clayson the most recent [Clayson, 1987]. If such carcinogens are assumed to exist, then in vitro assays calibrated/validated against them will have their sensitivity values artificially elevated in the event of a positive response. This therefore poses the following question: if a carcinogen is found to be negative in several well-established genotoxicity assays, should it not then be removed from the calibration/validation set used to establish the sensitivity of these assays? For example, sodium saccharin must surely be accepted as a nongenotoxin because of its metabolic stability, its chemical inertness, and its nonmutagenicity to Salmonella. However, this rodent bladder carcinogen is clastogenic in vitro at dose levels of 12 mg/ml or greater [for example, in CHL cells; Ishidate, 1987]. Formally, that positive response contributes to the sensitivity of the CHL assay. The alternative is to accept sodium saccharin as a nongenotoxic carcinogen and to remove it from the “validation set.” This example applies to other in vitro assays and to many other chemicals [Brusick, 1987]. The problem outlined above is important because a significant proportion of the NCI/NTP carcinogens have the characteristics of being nongenotoxic carcinogens [Ashby and Tennant, 1988], and it is from such databases that in vitro assay performance characteristics are calculated. As an example, Tennant et al. [1987] established the highly selective rodent carcinogen reserpine to be inactive in four in vitro assays, consistent with its chemical structure. They therefore classed it as a putative nongenotoxic carcinogen. As things stand, were reserpine now to be reported active as a recombinogen in yeast, then the sensitivity of that assay to carcinogens would automatically be enhanced. In contrast, it is suggested that based on the data of Tennant et al., reserpine should be set aside as a nongenotoxic carcinogen and that structure-activity and genotoxicity assay databases should be purged of it. If, eventually, mechanistic studies establish it as a genotoxic carcinogen, then it could be reentered into those databases; structure-activity conclusions could be drawn and the yeast assay in question given credit for its complementary properties.
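The bookkeeping consequence of this proposal can be sketched with a toy validation set. All records below are hypothetical, as are the field names and the helper functions `sensitivity`, `purge_nongenotoxic`, and `validation_set`; no real assay data are represented. The set contains four genotoxic carcinogens, three of them detected, plus one carcinogen flagged as a putative nongenotoxic carcinogen whose assay result is varied.

```python
def sensitivity(records):
    """Fraction of carcinogens in `records` that the assay scored positive."""
    carcinogens = [r for r in records if r["carcinogen"]]
    hits = sum(1 for r in carcinogens if r["positive"])
    return hits / len(carcinogens)


def purge_nongenotoxic(records):
    """Drop chemicals set aside as putative nongenotoxic carcinogens."""
    return [r for r in records if not r["nongenotoxic"]]


def validation_set(flagged_positive):
    """HYPOTHETICAL validation set; no real chemicals or results."""
    records = [
        {"name": f"genotoxic_{i}", "carcinogen": True,
         "nongenotoxic": False, "positive": i < 3}
        for i in range(4)
    ]
    records.append({"name": "flagged", "carcinogen": True,
                    "nongenotoxic": True, "positive": flagged_positive})
    return records


for flagged_positive in (True, False):
    full = validation_set(flagged_positive)
    print(flagged_positive,
          sensitivity(full),
          sensitivity(purge_nongenotoxic(full)))
```

With the flagged chemical retained, the calculated sensitivity moves with its result; once it is purged, the sensitivity depends only on the genotoxic carcinogens. This is the sense in which a positive response for a nongenotoxic carcinogen earns an assay credit that it may not deserve.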

4. Finally, should it not be expected that an in vitro assay will be “oversensitive,” i.e., have a specificity of less than 1? In fact, it is suggested here that if an in vitro assay approaches a specificity value of 1 we should be concerned regarding its value as a screening test. To illustrate this point, Salmonella mutagenicity data for 28 amino-aromatic chemicals tested for carcinogenicity in rats and mice by the U.S. NCI/NTP are summarized in Figure 1. For this group, the Salmonella assay has a sensitivity of 94% and a specificity of 18%. The mathematical approach would disqualify the Salmonella assay from further use based on that specificity value, yet this is the primary assay of virtually all practical approaches to carcinogen prediction. It would be equally easy to establish a specificity of 100% and a sensitivity of 0% for the Salmonella assay using a series of nongenotoxic chlorinated mouse liver carcinogens or thiourea rat thyroid carcinogens. To average such disparate datasets seems to be indefensible based on the results of Tennant et al. [1987], but such is the origin of the 61.2% sensitivity value for the Salmonella assay used by Ennever and Rosenkranz [Pet-Edwards et al., 1985].

[Figure 1 (table of 28 chemicals) is not reproduced here. Column headings: chemical structure; CAS No.; NTP TR No. (year); alert from structure (shown in bold); Salmonella mutagenicity (Zeiger); rodent carcinogenicity (NCI/NTP studies).]
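The percentages quoted for the 28 amino-aromatic chemicals can be reproduced from counts, and the effect of pooling a second, dissimilar dataset can then be examined. The counts for the amines are inferred from the Figure 1 caption rather than read from the figure itself: 17 carcinogens and 11 noncarcinogens (including the two equivocal results), with three nonmutagens of which one must be a carcinogen and two noncarcinogens to give 94% and 18%. The second dataset of ten nongenotoxic carcinogens, all nonmutagenic, is hypothetical, as are the function names.

```python
def rate(hits, total):
    """Return hits/total, or None when the class is empty."""
    return hits / total if total else None


def sens_spec(tp, fn, tn, fp):
    """Sensitivity and specificity from a 2x2 table of counts."""
    return rate(tp, tp + fn), rate(tn, tn + fp)


def pool(*tables):
    """Add the cells of several 2x2 tables together."""
    return {cell: sum(t[cell] for t in tables) for cell in ("tp", "fn", "tn", "fp")}


# INFERRED from the Fig. 1 caption: 16 of 17 carcinogens and 9 of 11
# noncarcinogens were mutagenic to Salmonella.
AMINES = {"tp": 16, "fn": 1, "tn": 2, "fp": 9}

# HYPOTHETICAL: ten nongenotoxic carcinogens, none mutagenic.
NONGENOTOXIC = {"tp": 0, "fn": 10, "tn": 0, "fp": 0}

print(sens_spec(**AMINES))
print(sens_spec(**NONGENOTOXIC))
print(sens_spec(**pool(AMINES, NONGENOTOXIC)))
```

The pooled sensitivity falls between the values of the two subsets and describes the behaviour of the assay toward neither class of chemical, which is the objection raised above to averaging disparate datasets.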

Short-term tests are in a difficult period, but things will not improve until biological/chemical questions such as those posed above are considered, rather than statistical ones. As an illustration, Brusick [1987] recently edited a Special Edition of Mutation Research that focused on the problem of false-positive in vitro assay responses thought to be caused by extreme culture conditions (high dose, toxicity, changes in osmolarity, and pH). Brusick and his contributors attempted to define conditions of test that would obviate such presumed technical false-positive responses. That biological/chemical approach will probably be of greater long-term value to the science than a re-jigging of batteries in order to outweigh technical false-positive responses.

Perhaps the greatest danger is that general disenchantment with short-term test batteries might lead to a devaluation of mammalian mutagenicity as a significant toxicological endpoint. To avoid this it is suggested that we concentrate on the quantitative assessment of individual datasets, the genetic validity of individual assays, and the careful statistical design of experiments; not on assay “form” or battery statistics. There clearly is a need for a range of mutagenicity/genotoxicity assays, but each extra one conducted should be for either a chemical or a biological reason, not for a statistical one. Thus, we return to the quotation of Yander et al. given earlier herein: “the corroboration or negation of biologically insignificant responses by means of the conduct of a further assay.” If the original positive response which gave cause for concern was similar to the activity of NaCl in CHL cells [Ishidate, 1987], then this could be “corroborated” by a wide range of additional assays [Brusick, 1987]. It is suggested here that the intervention of human judgment, based on experience, is the only way to resolve such issues.

Ennever and Rosenkranz [1987] stated in their reply to Yander et al. [1987] that the battery of assays which they recently derived was offered only to illustrate that such a thing was possible, not that the battery itself should be generally adopted. It is suggested that the time has come for such a statistically derived battery to be formally proposed so that its validity can be assessed; the days of theoretical batteries ended with the paper by Tennant et al. [1987].

Fig. 1. Twenty-eight ring-substituted monocyclic aromatic amines had been tested in rats and mice by the NCI/NTP by March 1986, as listed in the figure. Only the last could be considered as not presenting an alert to genotoxicity based on consideration of chemical structure, and this was because metabolic activation of the amino group to a DNA-reactive species will probably be attenuated or abolished by the adjacent carboxylic acid. The latter compound is also nonmutagenic to Salmonella and reported as a noncarcinogen. Only two other compounds were nonmutagenic, both being mono-chloro-mono-methylanilines. Seventeen compounds were concluded to be carcinogens, and two showed equivocal evidence of carcinogenicity (E; regarded as noncarcinogens in subsequent calculations consistent with Tennant et al. [1987]). For this set of chemicals the sensitivity of the Salmonella assay is 94% (α⁺ 0.94) and its specificity 18% (α⁻ 0.18). In some cases the mutagenicity observed may have been associated with a ring nitro group. One way of viewing these data is that the Salmonella assay detects a genotoxic potential that is not always expressed in vivo; another is that the assay is of no value because of its low specificity. These data were abstracted from Ashby and Tennant [1988], and the Salmonella data were provided for that paper by Zeiger and co-workers (see Supplements to Environmental Mutagenesis).

REFERENCES

Ashby J, Tennant RW (1988): Chemical structure, Salmonella mutagenicity and extent of carcinogenicity as indicators of genotoxic carcinogenesis among 222 compounds tested in rodents by the US NCI/NTP. Mutat Res 204:17-115.

Brusick D (ed) (1987): Genotoxicity produced in cultured mammalian cell assays by treatment condition. Mutat Res 189(Special Issue):1-79.

Clayson (1987): The need for biological risk assessment in reaching decisions about carcinogens. Mutat Res 185:243-270.

Ennever FK, Rosenkranz HS (1987): Reply. Environ Mutagen 9:359-361.

Ishidate M (1987): “Data Book of Chromosomal Aberration Test In Vitro.” Tokyo: Life Science Information Centre.

Lave LB, Omenn GS (1986): Cost-effectiveness of short-term tests for carcinogenicity. Nature 324:29-34.

Malling HV, Chu EHY (1974): Development of Mutational Model System for study of carcinogenesis. In Ts'o POP, DiPaolo JA (eds): “Chemical Carcinogenesis.” New York: Marcel Dekker Inc., pp 545-563.

Pet-Edwards J, Chankong V, Rosenkranz HS, Haimes YY (1985): Application of carcinogenicity prediction and battery selection (CPBS) method to the Gene-Tox data base. Mutat Res 153:187-200.

Tennant RW, Margolin BH, Shelby MD, Zeiger E, Haseman JK, Spalding J, Caspary W, Resnick M, Anderson B, Minor R (1987): Prediction of chemical carcinogenicity in rodents from in vitro genetic toxicity assays. Science 236:933-941.

Yander G, Lin GHY, Mermelstein R (1987): Selection of batteries in an industrial setting. Environ Mutagen 9:357-358.