
EMBASE Remains EMBASE—On Every Host? A Comparative Analysis of EMBASE on the Hosts DATA-STAR, DIALOG, DIMDI, and STN


COMPUTERS AND BIOMEDICAL RESEARCH 29, 494–506 (1996)
ARTICLE NO. 0036


MARIANNE GRETZ, RALF-DIETER SCHMITT, AND MARTIN THOMAS

Boehringer Mannheim GmbH, Mannheim, Germany

Received February 28, 1996

Differing results of the same search request processed on different hosts led the authors to investigate this issue more thoroughly. Two authors, two journal titles, two drug names, and two topics of a general medical nature were retrieved under identical circumstances and conditions on the hosts DATA-STAR,¹ DIALOG,¹ DIMDI, and STN. Comparing the figures revealed that none of the nine search requests produced an identical result on all hosts. Some of the search results differed only slightly, while the searches in the basic index produced considerable discrepancies. © 1996 Academic Press, Inc.

INTRODUCTION

Hosts, suppliers of databases, obtain databases from the respective producers and make them available to the end-user via their own retrieval systems. The selection of a host’s databases depends on technical, economic, and market criteria (branch of industry, profitability, etc.). For the end-user, however, the availability of a database (which host offers this particular database?), its costs, and the searcher’s familiarity with the search languages prevail. The database EMBASE, produced by the Dutch media consortium Reed Elsevier, is available on eight hosts. Boehringer Mannheim has contracts with six of these hosts. At the Mannheim site, however, EMBASE is usually searched on two hosts only (DATA-STAR,¹ DIMDI [Deutsches Institut fuer Medizinische Dokumentation und Information]); under particular circumstances two more hosts (DIALOG,¹ STN [Scientific and Technical Network]) are accessed.

The chance observation that the same search request on EMBASE had led to different results on two different hosts prompted a project to carry out selected search requests on the four hosts relevant for Boehringer Mannheim.

¹ Since 1995 the hosts DIALOG and DATA-STAR have adopted the name KNIGHT-RIDDER. Both hosts continue working with their own specific retrieval systems dating from the time before the takeover of DATA-STAR by DIALOG (Knight-Ridder).

0010-4809/96 $18.00
Copyright © 1996 by Academic Press, Inc.
All rights of reproduction in any form reserved.


Boehringer Mannheim (BM) is a research-based German pharmaceutical company with sites at Mannheim and at the two towns of Tutzing and Penzberg in Upper Bavaria. At the Mannheim site, the core of the company founded more than 150 years ago, roughly 6,000 employees are today active in research and development, production, marketing and sales of drugs, diagnostic devices (test strips, computers for blood analysis), and biochemicals. There are three information departments (Medical Information Department, Chemical-pharmaceutical Information Department, Diagnostics-related Information Department) responsible for the timely and comprehensive supply of information, and a central library handles all book orders and delivery of documents. The Medical Information Department (MID), where this study was carried out, processes roughly 1,500 searches per year (not including regularly repeated searches, SDIs); EMBASE is third in usage in this department (1).

MATERIAL AND METHODS

The database EMBASE evolved from the printed Excerpta Medica reference service. EMBASE is a large bibliographic database containing references to the international biomedical and pharmaceutical literature from 1974 to the present (2).

The objective of this study was to find out whether a search request carried out using the respective retrieval languages of the hosts would yield an identical result. It was not the goal to retrieve all citations pertaining to a drug name or a medical topic by using a sophisticated search strategy; the emphasis was rather on the comparability of the procedure. In case of differing results the reasons were to be investigated.

To this end two author names were chosen: a rather “tricky” name (at least to Germans), Ch. Nagant de Deuxchaisnes, and a widespread name, J. Lewis. Furthermore, two journal names were selected, the well-known Journal of the American Medical Association (JAMA), and Pharmacotherapy, a journal existing only since 1981. As EMBASE emphasizes its focus on drugs (“the drug database” is its slogan), two drugs were also included: Daltroban, an investigational drug by Boehringer Mannheim, and Atenolol, a beta-blocker marketed under the name Tenormin since 1976 by Zeneca (in the past called ICI). In addition, the topics hypertension and myocardial infarction were selected. Searches were restricted to either one-word free text terms (except myocardial infarction), or to controlled descriptors like heart infarction, searchable like a one-word term, or to entries with a comparable status like journal titles or author names. Search steps combining several items were excluded (except limiting a search by publication year) in order to be able to trace potential divergences. In search requests including several combinations, divergences might be attributable to various factors, e.g., to peculiarities of the retrieval language.

The searches pertaining to the author name Ch. Nagant de Deuxchaisnes and to the BM investigational drug Daltroban were analyzed in detail to find out the origin of the differences. In both cases the number of citations found was large enough to provide sensible figures, while at the same time it was not too large, so that costs for downloading citations remained within justifiable limits.

All the searches were performed on the same day. In preparatory sessions the topics had been selected, the exact search strategy determined, the day of the search chosen (the dates of update should be comparable), and the respective segment of the database, in this case since its beginning, agreed upon. EMBASE ALERT (a database containing the entries of the 6 previous weeks, citations without index terms), available on DIMDI, was excluded. When we performed this analysis, this most recent segment was integrated into the complete file on the remaining three hosts. Meanwhile, STN offers such an extra file, too. The analysis of the results focuses on the period 1985 to 1994. Thus the oldest parts of the database, where inconsistencies are more likely, were excluded. EMBASE started in 1974, but it incorporated some citations from the years before 1974. Likewise omitted in this part of our analysis were the newest parts of the database (from 1995 onward), as differing dates of update might have influenced the results.

One of the searchers used DATA-STAR and STN, while the second searcher worked on DIALOG, and the third person used the host DIMDI. All searches were performed in the expert mode, i.e., without menu guidance. In most searches, the search term was combined with each publication year, e.g., daltroban and py = 1986; daltroban and py = 1987, and so on, and the entire database was searched in one batch without restrictions of publication year or with defined restrictions, such as heart infarction and py >= 1985. Once the searches had been carried out, results were quickly compared, and in case of questionable results or misunderstandings concerning the strategy, searches were repeated the same day. All search requests were saved continuously, including tables and index displays. Later on all resulting tallies, e.g., those from the searches per publication year and from the searches of the whole database in one batch, were transferred into an EXCEL file for calculations and graphical display.
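The year-by-year procedure described above can be sketched in a few lines. This is only an illustration: the statement syntax is the host-neutral form used in this article, not the command language of any particular host, and the function name `yearly_statements` is hypothetical.

```python
def yearly_statements(term, first_year, last_year):
    """Build one host-neutral search statement per publication year."""
    return [f"{term} and py = {year}" for year in range(first_year, last_year + 1)]


# Daltroban was searched for the publication years 1986 to 1995.
statements = yearly_statements("daltroban", 1986, 1995)
for statement in statements:
    print(statement)
```

Each statement would still have to be translated into the retrieval language of the respective host before it could be run.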

The search results are mentioned in alphabetical order of the hosts, except where reasons existed to do otherwise. In this presentation the dollar sign ($) is used as a symbol of truncation, irrespective of its meaning on the individual hosts. When searching topics or drugs, the host-provided online version of the EMTREE, but not the printed version, was used.

It was not the objective of this analysis to list all available database segments and their specific features on each host. The features of the hosts’ retrieval systems are discussed only as far as this is necessary for the understanding of the topic. Costs were not taken into account, either. The question of which retrieval system allows the most convenient and most efficient search was excluded, too. For methodological and practical reasons the EMBASE CD-ROM was not included in the analysis.

RESULTS

The right column of Table 1 indicates the difference between the highest and the lowest tally for each search term. Only 1 of our 10 search terms yielded the same number of hits on all 4 hosts. No trend is discernible as to which host was likely to produce higher or lower tallies.

TABLE 1

SEARCHES PERFORMED ON THE HOSTS DATA-STAR (DS), DIALOG (DIA), DIMDI (DIM), AND STN FOR THE PERIOD 1985–1994

1985–1994               DS        DIA       DIM       STN       Difference between minimum and maximum tallies
Nagant de D.            52        52        52        52        0
Lewis                   922       881       879       920       43
JAMA                    10,023    10,023    10,023    10,000    23
Pharmacother.           684       684       679       684       5
Daltroban/descr.        162       165       163       163       3
Atenolol/descr. down    5,832     6,002     5,887     5,894     170
Myocard.inf./descr.     18,374    18,339    18,480    18,504    165
Myocard.inf./b.i.       20,538    21,469    20,507    20,545    962
Hypertension/descr.     57,676    70,930    57,676    57,673    13,257
Hypertension/b.i.       74,234    74,469    71,598    71,622    2,871
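The right-hand column of Table 1 can be recomputed directly from the four host tallies. The following sketch uses the figures of Table 1; the variable and function names are illustrative only.

```python
# Tallies from Table 1 (1985-1994), in the order DS, DIA, DIM, STN.
table1 = {
    "Nagant de D.": (52, 52, 52, 52),
    "Lewis": (922, 881, 879, 920),
    "JAMA": (10023, 10023, 10023, 10000),
    "Pharmacother.": (684, 684, 679, 684),
    "Daltroban/descr.": (162, 165, 163, 163),
    "Atenolol/descr. down": (5832, 6002, 5887, 5894),
    "Myocard.inf./descr.": (18374, 18339, 18480, 18504),
    "Myocard.inf./b.i.": (20538, 21469, 20507, 20545),
    "Hypertension/descr.": (57676, 70930, 57676, 57673),
    "Hypertension/b.i.": (74234, 74469, 71598, 71622),
}


def spread(tallies):
    """Difference between the highest and the lowest tally."""
    return max(tallies) - min(tallies)


spreads = {term: spread(tallies) for term, tallies in table1.items()}
identical = [term for term, diff in spreads.items() if diff == 0]

for term, diff in spreads.items():
    print(f"{term:22s} {diff:>7,d}")
print("Identical on all hosts:", identical)
```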

The author name Charles Nagant de Deuxchaisnes is a rather “tricky” name, as the surname is composed of three components not easily understandable for people not familiar with Belgian traditions of family names and French spelling. Accordingly, the name is listed in EMBASE in six variants, each producing a certain number of hits: NAGANT DE DEUCHAISNES C, NAGANT DE DEUXCHAISNES C, NAGANT DE DEUXCHAISNES CH, NAGANT DE DEUXCHAISNESC, DE DEUXCHAISNES CHN, and DE DEUXCHAISNES CN. It seems quite surprising that despite these obvious spelling problems a congruent number of hits could be retrieved.
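A search that is meant to cover all index variants of an author name has to combine them with OR. The sketch below shows the idea in a host-neutral notation; the quoting and the OR operator are assumptions for illustration and would have to be adapted to each host's retrieval language.

```python
# The six index variants of the author name reported for EMBASE.
variants = [
    "NAGANT DE DEUCHAISNES C",
    "NAGANT DE DEUXCHAISNES C",
    "NAGANT DE DEUXCHAISNES CH",
    "NAGANT DE DEUXCHAISNESC",
    "DE DEUXCHAISNES CHN",
    "DE DEUXCHAISNES CN",
]

# Combine all variants into a single OR statement (hypothetical syntax).
query = " OR ".join(f'"{name}"' for name in variants)
print(query)
```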

In contrast to our first search example, the name J$ Lewis (with truncation of J) yielded a different number of hits on each host, the discrepancy between the highest and the lowest number being 43. In view of the large number of hits found this difference might seem negligible, but figures should be identical nevertheless.

The journal title Journal of the American Medical Association retrieved a different tally on STN only (-23 in comparison to DATA-STAR, DIALOG, and DIMDI), while the journal title Pharmacotherapy could be found less often (-5) on DIMDI.

Hits of the Boehringer Mannheim investigational drug differed only by 3, but only DIMDI and STN showed an identical figure, while DATA-STAR and DIALOG exhibited 1 hit less and 2 hits more, respectively.

Atenolol, the second example of a drug search, again produced completely differing results, with the discrepancy between highest and lowest value being 170.

The descriptor heart infarction for myocardial infarction yielded different numbers, too. The descriptor hypertension, in contrast, produced roughly 13,000 hits more on DIALOG, while the three other hosts show identical (DATA-STAR and DIMDI) or comparable (STN) results.

FIG. 1. Difference between minimum and maximum (in percentage of the maximum) per search during the publication period 1985–1994.

When myocardial infarction or hypertension were searched in the basic index (b.i.) covering almost all fields of the database, the numbers retrieved differed considerably. This may be due to each host’s specific implementation of the database.

Taking all searches together, a difference is discernible between entries in the fairly clear-cut and (well-)defined author, journal title, and descriptor fields on the one hand and the entries in the basic index on the other. However, even if the searches pertaining to the basic index are not taken into account, there are considerable differences between the numbers retrieved from four sources whose content allegedly should be identical. This result is even more alarming, as particularly delicate time periods like the early years (beginning of the database) and the recent period (almost one and a half years before the day of database search) have been left out. Figure 1 shows that there are differences in retrieval rate between 0 and 7.8% in this database segment minimally affected by the update problem (as the more delicate older and latest sections have been excluded).

Without the restriction to the above-mentioned period, differences are even more striking (Table 2).

DISCUSSION

How can these discrepancies be explained? While in the Nagant de Deuxchaisnes search the two missing hits could be attributed to the years 1995 (Number of Document (ND) 95164842) and 1981 (ND 82089363), the situation is less simple in the search for J. Lewis. To retrieve Lewis J$, the indexes of the author field were opened. On DATA-STAR and STN the index showed 57 different entries, on DIALOG 58, and on DIMDI 54. This difference may ultimately have led to different tallies.

TABLE 2

NUMBER OF HITS OF EACH SEARCH STATEMENT FOR THE PERIOD “BEGINNING OF DATABASE” OR 1985 UNTIL DAY OF SEARCH

Columns: A, topic; B, beginning of search period (until date of search); C, highest tally; D, lowest tally; E, difference for 1974–1995 or (1985–1995); F, difference for 1985–1994; G, discrepancy between the two segments.

A                            B        C          D          E         F        G
C. Nagant de Deuxchaisnes    <1974    98         96         2         0        2
J. Lewis                     <1974    1,718      1,653      65        43       22
JAMA                         <1974    18,207     16,168     2,039     23       2,017
Pharmacother.                1985     733        728        5         5        0
Daltroban                    1985     169        166        3         3        0
Atenolol/descr.dn            <1974    9,415      9,235      180       170      10
Myocard.inf./descr.          <1974    39,181     37,020     2,161     165      1,996
Myocard.inf./b.i.            1985     22,454     21,536     918       962      44
Hypertension/b.i.            <1974    144,098    128,227    15,871    2,871    13,000

Note. The differences in column E are even more striking than those in column F.

Searching the complete database on DIALOG resulted in 1657 hits for Lewis J$. Limiting this search to publication years greater than or equal to 1974 retrieved 1598 citations. The 59 documents resulting from this difference, however, were not all documents from the time before 1974: the first 10 citations displayed showed double publication years, like 1983/84, or other publication years (after 1974). It could not be found out why they had not been retrieved when the respective publication years had been searched.

On DATA-STAR, DIMDI, and STN there were also differences between the total sums on the one hand and the sums of the numbers per publication year from 1974 onward on the other. In these cases, however, all “extra” hits could be assigned unambiguously to publication years before 1974.
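The check described above, comparing the total tally with the hits accounted for by the individual publication years, can be sketched as follows. The records and the exact-match rule are assumptions made for illustration; they show one conceivable way in which a double publication year such as 1983/84 could escape a year-wise search, not the mechanism actually at work on any host.

```python
# Hypothetical records; "nd" stands for the Number of Document.
records = [
    {"nd": "A1", "py": "1972"},     # genuinely older than 1974
    {"nd": "A2", "py": "1983/84"},  # double publication year
    {"nd": "A3", "py": "1984"},
    {"nd": "A4", "py": "1990"},
]


def hits_for_year(recs, year):
    """Exact match on the publication year field (assumed behaviour)."""
    return [r["nd"] for r in recs if r["py"] == str(year)]


# Collect everything that a year-by-year search from 1974 to 1995 finds.
found = set()
for year in range(1974, 1996):
    found.update(hits_for_year(records, year))

# Hits of the overall search that no single year accounts for.
unaccounted = [r for r in records if r["nd"] not in found]
pre_1974 = [r["nd"] for r in unaccounted if int(r["py"][:4]) < 1974]
unexplained = [r["nd"] for r in unaccounted if int(r["py"][:4]) >= 1974]

print("Before 1974:", pre_1974)
print("Not explained by the time limit:", unexplained)
```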

The search for the journal title JAMA showed that in the newer segment 23 hits were missing. All of them dated from 1994 and were not on STN. The enormous discrepancy in the older segment cannot be explained; however, using the journal-specific CODEN instead of the journal title retrieved an additional 1879 citations on STN.

For the search of Pharmacotherapy the index display was shown on the screen, then transferred to the search mode, and the result was ORed with the result obtained by keying in the title manually. As the number of hits on DATA-STAR exceeded by far those obtained on the other hosts, the search was repeated using the CODEN. This yielded a more realistic result. The “false” drops on DATA-STAR could probably be explained by titles where Pharmacotherapy is part of the journal name, like Biomedicine and Pharmacotherapy. While the other three hosts provide a separate field for the journal name, on DATA-STAR the journal title is integrated in the source field; thus any title containing the search word will be retrieved.
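The effect of searching a combined source field instead of a dedicated journal field can be illustrated with a small sketch. The record layout, field names, and source strings are hypothetical; only the two journal titles are taken from the text.

```python
# Two hypothetical records with a dedicated journal field and a combined
# source field (journal title plus volume and page data).
records = [
    {"journal": "Pharmacotherapy",
     "source": "Pharmacotherapy 1990 10/2 (100-105)"},
    {"journal": "Biomedicine and Pharmacotherapy",
     "source": "Biomedicine and Pharmacotherapy 1991 45/1 (1-5)"},
]


def search_journal_field(recs, title):
    """Match the complete journal title in a dedicated journal field."""
    return [r for r in recs if r["journal"].lower() == title.lower()]


def search_source_field(recs, word):
    """Match a word anywhere in the combined source field."""
    return [r for r in recs if word.lower() in r["source"].lower()]


exact = search_journal_field(records, "Pharmacotherapy")
loose = search_source_field(records, "Pharmacotherapy")
print(len(exact), "hit(s) in the journal field")
print(len(loose), "hit(s) in the source field")
```

A unique key such as the CODEN avoids the problem, which matches the observation that the CODEN search gave the more realistic result.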

Daltroban, an investigational drug by Boehringer Mannheim, was searched in the basic index and in the controlled vocabulary, in both cases with and without truncation. For the basic index search only the international nonproprietary name (INN) was used, not, however, the laboratory codes by Boehringer Mannheim (BM-13505) or by SmithKline Beecham (SKF-96148). Then, in a second step, relevant descriptors were identified by means of the descriptor and synonym lists of the index display. These lists are shown in Table 3; the entries marked with *** were not included in the search as they are broader terms that would have falsified the result. The relevant entries were integrated into the search strategy. The search covered the period from 1986 to 1995, as previously no hits referring to Daltroban could be found. In this case the citations retrieved were saved on hard disk for further analysis.

There was a discrepancy of three hits between the minimal and maximal results. The comparison of the printed citations produced some strange findings. Three hits (ND 90367388: author Woodward; ND 92104265: author Thaiss; ND 93221570: author Gulbins) were found on DIALOG only. A second search including the fields Author and Number of Document revealed these citations on the other three hosts, too. “Their” documents showed Daltroban in the abstract; however, none of the fields usually searched in a descriptor search (e.g., Chemical Abstracts reference number [CAS number]) contained the descriptor or any of the synonyms (Fig. 2).
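Once the downloaded citations of each host are reduced to their document numbers, citations found on one host only can be isolated with set operations. The sketch below is a simplified illustration: the three ND values are those named in the text, whereas "h1" and "h2" are placeholders for citations common to all hosts, and the other discrepancies reported for Daltroban are ignored.

```python
# Document numbers retrieved per host (simplified, partly hypothetical).
hits = {
    "DATA-STAR": {"h1", "h2"},
    "DIALOG": {"h1", "h2", "90367388", "92104265", "93221570"},
    "DIMDI": {"h1", "h2"},
    "STN": {"h1", "h2"},
}


def only_on(host, hits_by_host):
    """Return the document numbers retrieved on `host` and on no other host."""
    others = set()
    for name, docs in hits_by_host.items():
        if name != host:
            others |= docs
    return hits_by_host[host] - others


for host in hits:
    print(host, sorted(only_on(host, hits)))
```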

More discrepancies could be found when comparing respective publication years. In 1987 DIMDI and STN show six hits each, while on DIALOG and DATA-STAR only five hits each could be found. Another search on DIALOG, where the EXPAND command of the descriptor was combined with author and three title words, resulted in nil. The document could be spotted, at last (ND 87175792: Lefer): there were no entries in the descriptor field, neither on DIALOG, nor on DATA-STAR.

In 1994, DATA-STAR, DIMDI, and STN contain 15 hits each, DIALOG 16. This odd citation (ND 94276478: Ogletree) was among the citations of publication year 1993 on the other hosts. On DIALOG it had been found while limiting the search to “publication year = 1994,” and accordingly its source field shows 1994, in contrast to the respective hits on the other three hosts. The original journal was checked, and indeed publication year 1993 was correct. It is hardly understandable why one host shows this difference, since all hosts are supposed to receive the very same raw data from the database producer.

In 1993 all hosts show 23 hits each; however, due to the erroneously assigned citation by Ogletree (1994) DIALOG should have had 22 hits only. The 23rd (“missing”) hit was the one that had been found on DIALOG exclusively (ND 93221570: Gulbins) as discussed before. On DIMDI, this citation by Gulbins was missing, but it was compensated for by the citation by Ogletree.

TABLE 3

SYNONYMS OF DALTROBAN IN THE INDEX DISPLAY OF DESCRIPTORS ON THE HOSTS DATA-STAR, DIALOG, DIMDI, AND STN

DATA-STAR synonyms (from EVOC)

    BM adj 13505.de.                                                  0
    BM-13505.de.                                                      0
    SKF-96148.de.                                                     9
    4-(2-(4-chlorobenzenesulfonamido)ethyl)phenylacetic acid.de.      12
    4-(2-(4-chlorobenzenesulfonylamino)ethyl)phenylacetic acid.de.    0
    4-(2-(4-chlorophenylsulfonamino)ethyl)phenylacetic acid.de.       0
    4-(2-(4-chlorophenylsulfonylamino)ethyl)phenylacetic acid.de.     0
    105218-3-9.rn.                                                    0
    79094-20-5.rn.                                                    46

DIALOG

       Ref    Items   Type   RT   Index-term
       R1     152            9    *Daltroban
    ***R2     1218    E      32   DC=D16.700.80.10. THROMBOXANE RECEPTOR BLOCKING AGENT
       R3     103     S      1    BM 13505
       R4     10      S      1    SKF 96148
       R5     13      S      1    4 (2 (4 CHLOROBENZENESULFONAMIDO)ETHYL)PHENYLA
       R6     0       S      1    4 (2 (4 CHLOROBENZENESULFONYLAMINO)ETHYL)PHENY
       R7     0       S      1    4 (2 (4 CHLOROPHENYLSULFONAMIDO)ETHYL)PHENYLAC
       R8     0       S      1    4 (2 (4 CHLOROPHENYLSULFONYLAMINO)ETHYL)PHENYL
       R9     0       S      1    RN=105218-3-9
       R10    46      S      1    RN=79094-20-5

DIMDI

       68.01   149   CT=DALTROBAN
       68.02   46    CT=DALTROBAN
       68.03   46    CR=105218-03-9 ... daltroban
       68.04   46    CR=79094-20-5 ... daltroban
    ***68.05   361   CC=D16.700.80.10 ... thromboxane receptor blocking agent
       68.06   0     CT=BM 13505
       68.07   9     CT=SKF 96148
       68.08   14    CT=4 (2 (4 CHLOROBENZENESULFONAMIDO)ETHYL)PHENYLACETIC ACID
       68.09   0     CT=4 (2 (4 CHLOROBENZENESULFONYLAMINO)ETHYL)PHENYLACETIC ACID
       68.10   0     CT=4 (2 (4 CHLOROPHENYLSULFONAMIDO)ETHYL)PHENYLACETIC ACID
       68.11   0     CT=4 (2 (4 CHLOROPHENYLSULFONYLAMINO)ETHYL)PHENYLACETIC ACID

STN

       E1   149    R     Daltroban/CT
                         HNTE Creation date 23 SEP 88
                         RN 105218-3-9
                         RN 79094-20-5
       E2   14     UF    4 (2 (4 chlorobenzenesulfonamido)ethyl)phenylacetic acid/CT
       E3   0      UF    4 (2 (4 chlorobenzenesulfonylamino)ethyl)phenylacetic acid/CT
       E4   0      UF    4 (2 (4 chlorophenylsulfonamido)ehtyl)phenylacetic acid/CT
       E5   0      UF    4 (2 (4 chlorophenylsulfonylamino)ethyl)phenylacetic acid/CT
       E6   0      UF    bm 13505/CT
       E7   9      UF    skf 96148/CT
    ***E8   1683   RMN   D16.700.80.10./CT

FIG. 2. Sample of one of the three documents found in the search on Daltroban in the descriptor field. This document was retrieved on DIALOG, but not on DATA-STAR, DIMDI, or STN.

FIG. 3. Retrieval of hyperten$ (truncated form yielding all entries starting with hyperten) in the basic index, in the entire database, and in the period publication years 1985–1994.

As for the next search, Atenolol, the difference in the newer segment was 170 hits, in the complete file even 180. As the figures retrieved were too high for detailed analysis, no explanation can be given. However, there was another unexpected finding: As we had searched both each publication year separately on the one hand and the whole database in one batch on the other, the sum obtained by ORing the year-wise tallies was different from the total number obtained in the overall search covering an identical time period. This phenomenon occurred only with Atenolol on DIALOG.

The search for the descriptor heart infarction retrieving the concept “myocardial infarction” yielded a discrepancy of 165 hits between the highest and the lowest tallies in the segment assumed to be minimally affected by the update problem (Table 1). When the whole database was searched the difference rose to 2,161 (STN 39,181 minus DIALOG 37,020: 2,161).

Searching the basic index for myocardial infarction$ (with truncation) led to a difference of 962 in the 85–94 segment. This may be due to the specific selection of fields integrated into the basic index. The same applies to the search for hyperten$ in the basic index, where the discrepancy for the segment 85–94 comprises 2,871 hits (Table 1). A search for the same term over the whole database shows even a discrepancy of 15,871 hits between the maximum on DIALOG and STN and the minimum on DIMDI. The respective tallies are depicted in Fig. 3.

The assumption that identical search requests on the same database would yield divergent results on different hosts could be confirmed. This analysis is not the only one to show such a phenomenon; several investigators have found such discrepancies with other databases (4–6).

Search results differed only slightly when source and author fields were searched. Searches pertaining to descriptors and basic index were more likely to yield divergent results between hosts. The reasons for these differences can be manifold, ranging from the time of update, the respective implementation, or the peculiarities of the retrieval language to inexplicable loss of data.

In this analysis a methodological dilemma could not be avoided. Performing the search requests on the same day (June 28, 1995) could have meant that different versions/updates of EMBASE had been used. On the other hand, if the authors had made sure that really the same update status was available (provided this information is available to “normal” users), this could possibly have meant that the search would have had to be performed on several days according to the respective update cycles. This, however, would definitely have been an unrealistic situation, as searchers of bibliographic databases usually do not defer their work because an update can be expected (of which they are not being informed anyway).²

The results obtained could be influenced by the fact that DIMDI offers the latest citations (6 previous weeks) in a separate database, EMBASE ALERT. The other three hosts had incorporated these citations into their ordinary files at the time of this analysis. It would have been likely for searches on DIMDI to produce fewer hits, which, however, was only true in a few cases. The maximum or minimum tallies found do not allow conclusions in favor of or against one host or the other.

Some examples analyzed in detail showed that inconsistency of data (Daltroban: incongruent entries in the field publication year, missing descriptors, double entries in the field publication year that were recognized as missing data by the system) is another factor leading to divergent results.

Another interesting point was the question of which fields are included in the basic index (7). On STN it covers the text of indexed fields plus CAS numbers. On DATA-STAR “all indexing data (text and codes)” are included, while on DIMDI the codes are excluded from the basic index. On DIALOG the basic index also comprises section headings and identifiers, at least for some periods. The searches in the basic index (hypertension, myocardial infarction), however, did not in all cases retrieve the highest numbers on DIALOG or DATA-STAR, although these hosts include more fields in their basic index than DIMDI or STN.
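How the composition of the basic index can change a tally may be shown with a deliberately simplified model. The host names `host_a` and `host_b`, the field names, and the sample record are all hypothetical; the field sets are illustrations and not the actual basic index definitions of any of the four hosts.

```python
# Hypothetical basic index definitions: the set of fields each host searches.
BASIC_INDEX = {
    "host_a": {"title", "abstract", "descriptors"},
    "host_b": {"title", "abstract", "descriptors", "section_headings"},
}

# Hypothetical record in which the term occurs in a section heading only.
record = {
    "title": "Renal effects of a new diuretic",
    "abstract": "Kidney function was assessed in 40 patients.",
    "descriptors": "kidney function; diuretic agent",
    "section_headings": "Hypertension",
}


def in_basic_index(rec, term, fields):
    """True if `term` occurs in any of the fields belonging to the basic index."""
    return any(term.lower() in rec.get(field, "").lower() for field in fields)


for host, fields in BASIC_INDEX.items():
    print(host, in_basic_index(record, "hypertension", fields))
```

The same record is a hit on one host and not on the other, although both hold identical data.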

Other factors of influence are descriptors and the retrieval functions related to them. The specific implementation of an automatic down search of drugs, as it is realized on DIMDI, achieves the maximum number of hits, whether the down command was used or not. For the user this is the most convenient form of implementation, as the searcher need not bother about potential narrower terms in the thesaurus or lists of synonyms.

² The authors are fully aware that the situation is completely different in the context of financial or press information, as in these fields topicality is of major importance.

The synonyms of Daltroban listed in Table 3 show minor divergences that, however, could be responsible for different numbers of hits. Only on DATA-STAR are the laboratory codes BM-13505 and SKF-96148 written with a hyphen, while the version without hyphen is admitted for the BM code, but not for the SKF code. Another puzzling detail: On DIALOG the entry BM 13505 yields 103 hits, in contrast to the other hosts, where the number of hits for this entry is nil.
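Spelling divergences of this kind can be neutralized on the searcher's side by normalizing the codes before lists from different hosts are compared. The following is a minimal sketch; the normalization rule (lower case, hyphens and blanks collapsed into one blank) is an assumption chosen for illustration.

```python
import re


def normalize_code(code):
    """Lower-case a laboratory code and collapse hyphens/blanks into one blank."""
    return re.sub(r"[-\s]+", " ", code.strip()).lower()


# Spellings of the two laboratory codes as they appear in Table 3.
spellings = ["BM-13505", "BM 13505", "bm 13505", "SKF-96148", "SKF 96148", "skf 96148"]
normalized = {normalize_code(s) for s in spellings}
print(sorted(normalized))
```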

The lists of synonyms for descriptors can be displayed and selected, or all entries can be transferred to the search mode and retrieved without any problem, on the hosts DIALOG, DIMDI, and STN. On DATA-STAR, however, an extra database (EVOC: EMBASE vocabulary) has to be accessed. With the help of the MAP function synonyms can be transferred to EMBASE, where a search then has to be performed. The preferred term of EMBASE and the newly found hits based on the transferred synonyms have to be combined with OR to achieve a comprehensive result. This procedure is less user-friendly than the implementations on the other three hosts.

Moreover, it could be of importance whether individually adjustable default values like the automatic inclusion of plural forms or synonyms were on or off. In the searches carried out for this analysis these options were off.

As for the second category of observations, the difference between a host-generated sum of annual results on the one hand and the number of hits achieved by searching the same period year-wise on the other (Atenolol on DIALOG) remains inexplicable to the authors. Perhaps the database contains duplicates or mock duplicates which are not counted in an OR combination.³
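The duplicate hypothesis can be made concrete with a toy example. The document identifiers below are hypothetical; the sketch only shows why adding up year-wise tallies may give a higher figure than an OR combination, which counts each document once.

```python
# Hypothetical year-wise result sets; "d2" is reported under two years,
# e.g. because of a duplicate record.
year_hits = {
    1990: ["d1", "d2"],
    1991: ["d2", "d3"],
}

# Adding the annual tallies counts "d2" twice.
sum_of_tallies = sum(len(docs) for docs in year_hits.values())

# An OR combination behaves like a set union and counts "d2" once.
or_combination = set()
for docs in year_hits.values():
    or_combination |= set(docs)

print("Sum of annual tallies:", sum_of_tallies)
print("OR combination:", len(or_combination))
```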

The examples analyzed show that retrieving the same subject matters on different hosts can in fact produce different results. However, no tendency in favor of or against one of the hosts is discernible. On the other hand, it has to be taken into account that nine search requests on four hosts are by no means a statistically safe basis to draw general conclusions. It would have been desirable to analyze all the searches in detail to find out why some of the citations could be spotted on one host and not on the other. This would imply a close look at those citations left over after a crosscheck. Given the enormous extent of EMBASE (ca. 7 million entries) the samples shown in this analysis are minute. Nevertheless, all documents should be provided with descriptors, and correct and unambiguous publication years should be assigned. The implementation of the database or the update procedures of the hosts should eliminate such phenomena. When planning this analysis such frequent divergences had not been supposed, and many questions arose only when results were discussed and compared. Further investigations should include many more search requests, but also analyze each individual result in more detail.

³ The hypothesis of duplicates is supported by single observations where after a number of AND combinations and an ensuing automatic check of duplicates 1 or 2 duplicates were found (personal communication by C. Cazan).


CONCLUSION

In everyday work hardly ever will the same search request be carried out by the same person on different hosts. Therefore, discrepancies like the ones shown in this analysis usually do not become obvious. Nevertheless, the results point to the fact that besides familiarity with a retrieval language or the cost factor, the quality of the data as they are processed by the host in fact does play a role. Quality control by the hosts should ensure that identical search requests provide identical results, irrespective of the host used. On the other hand, users should consider it their duty to communicate puzzling search results, inconvenient handling, or strange findings to the host, as only such feedback will (hopefully!) lead to change and improvement. This analysis revealed above all quantitative differences between the four hosts. Further investigations backed up by a larger quantitative basis should have a closer look at the quality aspect of the data.

REFERENCES

1. THOMAS, M., AND GRETZ, M. From serum cholesterol in elephants to morbidity in Nepal: An empirical analysis of 6,729 on-line searches at Boehringer Mannheim GmbH. Drug Inf. J. 30, 217–236 (1996).

2. EMBASE User Manual, Excerpta Medica/Elsevier Science Publishers, Amsterdam 1989, pp. 1–2.

3. MAES, V. Currency of information found in Silverplatter’s MEDLINE CDROM. Online CDROM Rev. 19, 59–69 (1995).

4. BOYD, F., KING, C., MACDONALD, R., OPPENHEIM, C., RODGERS, V., AND WALKER, M. Searching Textline on different hosts. Bus. Info. Rev. 10, 45–61 (1993).

5. GEORGY, U. Die Chemical Abstracts bei sechs Hosts: Ein Vergleich. (Chemical Abstracts on six different hosts). In “Information und Medienvielfalt. 16. Online-Tagung der DGD. Proceedings” (W. Neubauer and R. Schmidt, Eds.), pp. 219–229. Deutsche Gesellschaft für Dokumentation, Frankfurt am Main, 1994.

6. OJALA, M. The pitfalls of simpler searching . . . and the potential for professional expertise. Info. World Rev. 105, 8 (1995).

7. CROWLESMITH, I. Host differences when searching EMBASE. Online 19, 89–90 (1995).