
The information seeking behaviour of the users of digital scholarly journals

David Nicholas 1, Paul Huntington 2, Hamid R. Jamali *, Anthony Watkinson

School of Library, Archive and Information Studies, CIBER,3 University College London, Henry Morley Building, Gower Street, London WC1E 6BT, United Kingdom

Received 23 November 2005; received in revised form 1 February 2006; accepted 1 February 2006
Available online 20 March 2006

Abstract

The article employs deep log analysis (DLA) techniques, a more sophisticated form of transaction log analysis, to demonstrate what usage data can disclose about the information seeking behaviour of virtual scholars – academics and researchers. DLA works with the raw server log data, not the processed, pre-defined and selective data provided by journal publishers. It can generate types of analysis that are not generally available via proprietary web logging software, because that software filters out relevant data and makes unhelpful assumptions about the meaning of the data. DLA also enables usage data to be associated with search/navigational and/or user demographic data, hence the name 'deep'. In this connection the usage of two digital journal libraries, those of EmeraldInsight and Blackwell Synergy, is investigated. The information seeking behaviour of nearly three million users is analyzed in respect of the extent to which they penetrate the site, the number of visits made, and the type of items and content they view. The users are broken down by occupation, place of work, type of subscriber (''Big Deal'', non-subscriber, etc.), geographical location, type of university (old and new), referrer link used, and number of items viewed in a session.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Transaction log analysis; Electronic periodicals; Information-seeking behaviour; Usage statistics

1. Introduction

In this article we present and collate the findings of a number of recent investigations that have been conducted under the Virtual Scholar Research Program at University College London, a program that seeks to bring robust evaluation to the digital scholar environment.4 The robust analysis is the product of a

0306-4573/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.ipm.2006.02.001

* Corresponding author. Tel.: +44 20 7679 7205; fax: +44 20 7383 0557.
E-mail addresses: [email protected] (D. Nicholas), [email protected] (P. Huntington), [email protected] (H.R. Jamali), [email protected] (A. Watkinson).
1 Tel.: +44 20 7679 2477.
2 Tel.: +44 20 7679 7205.
3 Centre for Information Behaviour and the Evaluation of Research.
4 http://www.ucl.ac.uk/ciber/ciber.php.

Information Processing and Management 42 (2006) 1345–1365

www.elsevier.com/locate/infoproman

methodology we call deep log analysis (DLA), which takes its lead from, but goes much further than, transaction log analysis. Together the results of these studies provide a comprehensive, detailed – and sometimes surprising – picture of the information seeking behaviour of the digital scholar (academic and researcher) in regard to two major digital journal libraries, those of EmeraldInsight5 (Emerald Group Publishing Limited, Bradford, England), a business and information studies publisher, and Blackwell Synergy6 (Blackwell Publishing, Oxford, England), a learned journal publisher. The investigation is probably one of the largest ever undertaken, covering as it does the online transactions of nearly three million virtual scholars. From the individual investigations, we have selected analyses which provide a good overview of the research and which we believe to be particularly pertinent.

2. Aims, objectives and scope

The major aim of the paper is to demonstrate what deep log analysis can disclose about the kinds of people that search scholarly digital journal libraries and about their information seeking behaviour, in the belief that the methodology provides a bigger, more accurate, and fuller picture than is possible with standard survey techniques, and provides some very powerful types of analyses not obtainable from standard commercial log analyzing software. Deep log analysis refers not simply, as the name suggests, to mining the raw log data more deeply and accurately than proprietary software does, but also to relating usage data to user data to provide that all-important triangulation. It also generates the questions that interviewers, focus groups, and questionnaire originators should be asking, but seldom do. To demonstrate this we have taken server log transaction data from two digital libraries (publisher platforms) containing large numbers of full-text scholarly journals: those of the publishers Emerald, which features around 150 business and library studies journals, and Blackwell, which contains some 700 journals, with a strong presence in the sciences and medicine.

Both publisher platforms were subject to a range of enhanced or deep log analyses. We have already conducted and published some other kinds of analyses on the logs of these two digital libraries (Nicholas, Huntington, & Watkinson, 2003, 2005). Here we shall concentrate on, arguably, the two most powerful deep log metrics, which we believe provide especially illuminating data:

• the number of items viewed per online session (something we call 'site penetration'),
• the number of visits (returnee analysis).
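These two metrics reduce to simple counts over parsed log records. The sketch below illustrates the arithmetic; the record structure and field names are invented for the example (the actual analyses were run in SPSS over the raw server logs), and all it assumes is that each logged view carries a user identifier and a session identifier.

```python
from collections import defaultdict

# Hypothetical parsed log records: one dict per item viewed. The field
# names ("user", "session", "item") are illustrative, not the real schema.
records = [
    {"user": "u1", "session": "s1", "item": "abstract"},
    {"user": "u1", "session": "s1", "item": "fulltext"},
    {"user": "u1", "session": "s2", "item": "toc"},
    {"user": "u2", "session": "s3", "item": "fulltext"},
]

# Site penetration: how many items were viewed in each session.
items_per_session = defaultdict(int)
for r in records:
    items_per_session[r["session"]] += 1

# Returnee analysis: how many distinct sessions (visits) each user made.
sessions_per_user = defaultdict(set)
for r in records:
    sessions_per_user[r["user"]].add(r["session"])
visits_per_user = {u: len(s) for u, s in sessions_per_user.items()}
```

On these toy records, user u1 penetrated session s1 to a depth of two items and returned for a second visit (s2), while u2 made a single one-item visit.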

These two use metrics were enhanced with user details to provide deeper, more meaningful data. In the case of Blackwell, this was obtained by relating the logs to a database containing the demographic data relating to subscribers, and, in the case of Emerald, by means of desk research (obtaining background information on using institutions from reference works and websites).

While we describe the technical procedures and problems associated with deep log analysis in this paper, this is not our main purpose, which is not so much to explain ''how'' it was done as to show what can be produced – really, to demonstrate the utility and significance of the data. (For those wanting more details of the techniques please refer to Nicholas, Huntington, Lievesley, & Wasti, 2000; Nicholas, Huntington, Rowlands, Russell, & Cousins, 2004.) Many of the ideas and methods presented in this article were developed as part of work we have conducted in helping the UK government map and evaluate the roll-out of digital health services to the consumer (Nicholas, Huntington, & Williams, 2004). The goal of the Virtual Scholar Research Program is to do the same kind of thing in the scholarly journal field.

3. Literature review

A whole range of different methods with different approaches and objectives has been employed to study the use of digital journals. Questionnaire surveys (Finholt & Brooks, 1999; Nelson, 2001; Rusch-Feja & Siebeky,

5 http://www.emeraldinsight.com.
6 http://www.blackwell-synergy.com.


1999; Salisbury & Noguera, 2003; Tenopir & King, 2001; Teskey & Urquhart, 2001) and interviews/focus groups (Bonthron et al., 2003; Talja & Maula, 2003; Teskey & Urquhart, 2001) are favourite methodologies. Transactional log studies are not so common, but they are becoming more popular (Davis, 2004a; Davis & Solla, 2003; Gargiulo, 2003; Ke, Kwakkelaar, Tai, & Chen, 2002; Tulip, 1996; Yu & Apps, 2000; Zhang, 1999).

3.1. Questionnaire survey and interview studies

Generally, social survey methods tell us that the reading of scholarly articles has increased during the last decade and has been boosted by the advent of electronic journals. A series of survey studies conducted by Tenopir and King (2001) over the last two decades shows that scientists not only read more articles but also read from a broader range of journals. On average, nearly one third of journal articles currently being read come from digital databases, and almost half of all scientists now use electronic journals at least part of the time, with considerable variations among disciplines. Tenopir (2002, 2003) reports on findings from a number of research studies, including those by the Council on Library and Information Resources (CLIR) and the Online Computer Library Center (OCLC), on the use of electronic sources. She notes, ''Although the use of electronic versions still varies from discipline to discipline, almost everyone will adopt peer-reviewed electronic journals that make their work easier and for which the cost is free or subsidized by the library''. The use of electronic journals was highest among physicists. Other surveys (Rusch-Feja & Siebeky, 1999) verified the finding that use of electronic journals is high among physicists, biologists, and biomedical scientists, and this fits with transaction statistics obtained from publishers. Smith (2003) also found that science faculty members make more use of e-journals than those from the social science faculty. Tomney and Burton (1998) found the highest e-journal use among the business, science, and engineering faculties at a British university, while history faculty members made no use of e-journals. Another survey conducted by Nelson (2001) at another British university shows the highest use of e-journals among academics in the business school, while the lowest use occurred in the art, media, and design faculties.

Scholars from all disciplines point out that a major factor in the non-use of electronic resources was the lack of archival and retrospective material. Lack of archival material has been mentioned as a disadvantage of e-journals by respondents in some other studies (Institute for the Future, 2002b; Pullinger & Baldwin, 2002). Although lack of awareness was once mentioned as one of the contributing factors for non-use of e-journals (Nelson, 2001; Tenner & Ye, 1999; Teskey & Urquhart, 2001; Tomney & Burton, 1998), it does now appear that awareness and adoption of e-journals are increasing rapidly, while convenience of use has remained the most important concern for users (Tenopir, 2003).

Both Borghuis et al. (1996) and Entlich et al. (1996), as cited in Bishop et al. (2000), report that, in academia, digital journals tend to be used more by students than faculty. The findings of the Tulip project (Tulip, 1996) and a survey by Tomney and Burton (1998) show the same result. A survey by Liew, Foo, and Chennupati (2000) also shows high acceptance of e-journals by graduate students.

Different interfaces among databases make comparison difficult, but some basic search functions are common throughout all commercial databases. These include search by journal title, author, publication date, and table of contents. It has been found that users from different subject disciplines search differently for both electronic and print material (Bonthron et al., 2003; Tenopir, 2003). For example, Finholt and Brooks (1999) surveyed economics and history faculties at the University of Michigan and found that historians use abstracts of e-journal articles less than economists. Users of Internet-based subject gateways prefer to browse rather than search for a specific article, and when they do search they tend to use ''keyword'' searching (Monopoli & Nicholas, 2001). Use of the online help facility is not widespread. Browsing and chaining (following bibliographic references already known) is also a popular method (Talja & Maula, 2003). Recent questionnaire surveys illustrate a tendency among online journals' users to search rather than browse (Boyce, King, Montgomery, & Tenopir, 2004; Sathe, Grady, & Giuse, 2002).

As is clear from the above-mentioned results, a considerable part of our knowledge of the use and users of digital journals is based on the results of questionnaire survey and interview studies. However, both interview and questionnaire survey studies are based on self-reported data. They tell us what users say they might or would do, or what they think they do. They are open to bias, for the researcher may prompt respondents towards a particular response.


3.2. Log and usage data studies

The number of studies which are based on the analysis of log or usage data has been increasing. Log analysis has been applied for different purposes, such as assessing system performance, studying users' searching and browsing behaviours, investigating the effectiveness of Big Deal subscriptions, studying literature decay, and so on.

Log studies have been particularly helpful in understanding the searching and browsing behaviour of e-journals' users. Using log analysis, eJUSt project researchers (Institute for the Future, 2002a) found three common seeking patterns: (a) journal homepage – TOC – HTML full text – PDF full text; (b) PubMed – HTML full text – PDF full text; (c) journal homepage – search – HTML full text – PDF full text. The findings showed that most requests were for full text in HTML, which were then followed by requests for the full text in PDF, as if the final goal of most visits was to take away a PDF version of an article. Log analysis of ScienceDirect OnSite (SDOS) in Taiwan shed some light on the searching behaviour of users. The analysis revealed that roughly 32% of all recorded page accesses related to full-text accesses, 34% of accesses related to browsing, 13% related to searching, and 9% of accesses related to abstract page views. In terms of search queries, of all users, 42% made 1 to 20 queries. A total of 91% of the queries were of the Simple Search type, while only 9% of the queries were of the Expanded Search type. ''Any Field'' was the default query field, matching any of the fields that can be searched, and was used in 84% of simple searches. On the other hand, about half (49%) of Expanded Search usage included fields other than the default field. Article Title, Author's Name, and Abstract were the three query fields most frequently used in Expanded Search mode (Ke et al., 2002). The SuperJournal project showed that researchers were not very good at searching (Eason, Richardson, & Yu, 2000). But things have changed in the ten years that have elapsed since the SuperJournal project. The analysis of referral logs of chemical journals showed that library catalogues and bibliographic databases, which are both searching mechanisms, were the top two sources that led users to journals (Davis, 2004b). This supports the findings of some recent questionnaire surveys indicating a tendency among online journals' users to search rather than browse (Boyce et al., 2004; Sathe et al., 2002). On the other hand, some other studies indicate that browsing seems to be the favoured method when using electronic journals (Eason et al., 2000; Eason, Yu, & Harker, 2000; Monopoli, Nicholas, Georgiou, & Korfiati, 2002; Tenopir, 2003). These discrepancies in the findings of different studies may be due to the fact that users behave differently when they have different goals or tasks. They may prefer to browse for keeping up to date, while they may search if they have a task or are looking for information on a specific subject. This is another area where log analysis fails to deliver: log analysis is carried out without taking into account the intention of the users. Log analysis is not all that helpful at discovering the value and use of the articles retrieved, or what lies behind expressed information seeking behaviour.

Essentially, one of the limitations of basic log analysis is that there is little possibility of linking use data with user data, hence the vague and general picture of users' information seeking behaviour it yields. This technical restriction makes it difficult to use demographic data for finding out about differences in the information seeking activities of users with different tasks, statuses, genders and so on. However, those studies, such as the SuperJournal project and eJUSt, that have applied triangulation have been able to paint a fuller picture of users' information seeking behaviour. The SuperJournal study revealed that task, discipline, and relevance of the collection are major factors in determining patterns of use. The study showed that social scientists are more task-driven than scientists are. They search for relevant articles when prompted by tasks, while scientists browse journals on a regular basis to keep up to date (Pullinger & Baldwin, 2002). It should be mentioned that the range of subjects was very limited in the SuperJournal project: its data about scientists refer only to some areas of genetics and chemistry, while the social scientists covered political studies, communications, and cultural studies. The eJUSt study showed that users' status is a significant factor in how they search for information (Institute for the Future, 2002b). As mentioned earlier, this is probably because people in different positions have different tasks to do, or, more precisely, their different goals require different information seeking behaviours. Undergraduate students tend to search the Internet first, and then go to library-based services, unless they have been provided with and instructed on how to use a specific resource. It turned out in the SuperJournal project that undergraduate students used electronic journals in a ''binge'' way – making great use of them in a short time – while those whose primary task was research (postgraduates and researchers) used the e-journals the most, with undergraduates and academics a little less (Pullinger & Baldwin, 2002).


The primary goal of many log or usage data studies is to find out about use rather than users. In terms of usage studies, previous log studies have led to different conclusions about the success or otherwise of Big Deal and consortium subscriptions to journals. Davis (2002) challenged the composition of geographically based consortia. He recommended libraries create consortia based on homogeneous membership. Obst (2003) also saw ''no future'' for package deals on the basis of the results of his comparative study. On the other hand, Gargiulo (2003) analysed logs of an Italian consortium and strongly recommended Big Deal subscriptions.

Most of the studies that employ log analysis do not provide details about the log analysis process or the software involved, and further investigation is required to determine this. However, in the case of Gargiulo (2003), an ''intelligent parser'' and commercial statistical software were used for extracting and analyzing the download statistics from log files: SAS (Statistical Analysis Software) was used to deal with raw log files, extract statistics, and create reports. Davis (2002) was provided with data by Academic Press (San Diego, CA) as summary statistics – by journal, by institution, by month – and used SPSS for the statistical analysis. Davis in his most recent research (2004a) used Microsoft Excel and SPSS to analyze referral URLs in transaction logs of the American Chemical Society journals to find out about the information seeking behaviour of chemists at Cornell University (Ithaca, New York). Ke et al. (2002) studied Elsevier ScienceDirect logs in Taiwan and used the C programming language for processing log files. They paid most attention to searching behaviour (e.g., use of search facilities, browsing, keywords used, and operators). Problems with floating IP addresses and proxies meant that they could not investigate as deeply as they would have liked. In the SuperJournal project, researchers used a program written in C++ to transform original log files into SPSS format. They emphasized that ''for most SuperJournal tasks SPSS was efficient and adequate'' (Yu & Apps, 2000).

4. Methods

All digital information platforms have a facility by which logs are generated, providing an automatic and real-time record of use by everyone who accesses information services on those platforms. Logs represent the digital information footprints of the users, and by analyzing them you can track and map their information seeking behaviour; when enhanced, they can tell us something about the kinds of people that use the services. The attraction of logs is that they provide abundant and robust evidence of use. With log analysis, it is possible to monitor the use of a system by millions of people, around the country or the world. Logs record use by everyone who happens to engage with the system; there is no need to take a sample. The great advantages of the logs are not simply their size and reach, although the dividend here is indeed a rich and unparalleled one. Most important, they are a direct and immediately available record of what people have done: not what they say they might or would do; not what they were prompted to say; not what they thought they did. The data are unfiltered, speak for themselves, and provide a reality check that both represents the users and complements important contextual data obtained by engaging with real users and exploring their experiences and concerns.

Publishers usually contract out a lot of their log analysis to third parties (e.g., CatchWord/Ingenta, Atypon) or rely on proprietary software, like WebTrends, Netracker, etc. Notwithstanding the undoubted technical expertise of the third parties and the software suppliers, the analyses performed are very limited, and the dangers inherent in this are that publishers (and their clients, the libraries to whom they provide data) are ''once removed'', and typically find themselves in an information dust storm kicked up by the log data. Clearly, to obtain rich and accurate data from log files that really inform, it is necessary to go beyond proprietary logging software, mine the raw data in a more sophisticated way, and triangulate the data with other datasets or data collection methods. In other words, adopt deep log analysis (DLA) techniques. Deep log analysis is best viewed as a four-step process. First, the assumptions about how the data are defined and recorded (e.g., who is a user, what is a hit, what represents success or satisfaction?) are questioned and realigned, and their statistical significance assessed. This is important, as skewed data are a real problem, and this step ensures that incorrect, overinflated readings that give a false sense of achievement and progress are avoided. Second, the raw data are re-engineered to provide more metrics and powerful combined metrics, to ensure that data gathering is better aligned to organizational goals and policies. The third step is to enrich the usage data by adding user demographic data (e.g., occupation, subject specialty), either with data obtained from a subscriber database (ideal) or online questionnaires (not so ideal, as user data cannot be mapped so closely onto usage data). Of course, logs and user databases enable us to map the digital environment more accurately but provide only a little in the way of


explanation, satisfaction, and impacts. They do, however, raise the questions that really need to be asked in questionnaires, interviews, or by observation, i.e., to explain information seeking behaviour – the fourth step in our analyses. The research reported here has progressed only to the third step; the fourth step needs further planning and the results might be published later. The main advantage of DLA over the usual log analysis undertaken by proprietary software is that use data are enriched with data about the users, and this leads to better knowledge of user behaviour. Moreover, DLA is powerful in generating some kinds of metrics that are not achievable with proprietary software; returnees and site penetration are the two most powerful metrics of DLA, and both are demonstrated here.
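The first two DLA steps – questioning what the raw log actually records and re-engineering it – can be illustrated with a small sketch. Both the Apache-style log line format and the 30-second double-click window below are assumptions made for the example, not the actual formats or thresholds used with the Emerald and Blackwell logs.

```python
import re
from datetime import datetime, timedelta

# Assumed Apache-style common log format; the real server log layout
# would need to be checked first.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d+)'
)

def parse_line(line):
    """Turn one raw log line into a record, or None if it does not parse."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    return {
        "ip": m.group("ip"),
        "ts": datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": m.group("url"),
        "status": int(m.group("status")),
    }

def deduplicate(hits, window=timedelta(seconds=30)):
    """Step 1 in miniature: collapse rapid repeat requests (double-clicks,
    reloads) for the same URL from the same IP, which off-the-shelf
    packages may count as separate 'uses'."""
    last_seen = {}
    kept = []
    for h in sorted(hits, key=lambda h: h["ts"]):
        key = (h["ip"], h["url"])
        if key in last_seen and h["ts"] - last_seen[key] < window:
            continue  # treat as the same view, not a new one
        last_seen[key] = h["ts"]
        kept.append(h)
    return kept
```

Re-engineered records of this kind are then the raw material for the combined metrics (step 2) and for enrichment with user data (step 3).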

4.1. Data collection and definitions

Our analyses are based mainly upon two sets of raw server transaction logs obtained from the Emerald and Blackwell journal libraries. The datasets were:

1. One year (January–December 2002) of Emerald's digital library logs. A year was required to pick up return visits, a key deep log analysis metric. Raw logs are enormous in size, and the fact that the Emerald database is relatively small enabled us to take such a long period.

2. Two month’s worth of logs for Blackwell Synergy (February–March 2003), in which, among other things,usage data was related to user data. Site penetration and type of item viewed analyses were undertaken onlyon February’s data. In addition, one day’s logs (September 17, 2003) were analyzed a little over half a mil-lion user transactions in all, which constituted a test-bed for analyses.

Of course, the fact that the two datasets cover different periods means that any comparisons between the two publisher platforms have to be treated with caution.

In all cases the raw logs were obtained and subjected to standard deep log techniques, parsed, and then processed by SPSS. Standard usage analyses (e.g., type of items viewed) and deep log analyses (site penetration and returnee) were generated. For full details of the methods used, see Nicholas et al. (2000). The size of the datasets was enormous; nevertheless, in the case of Blackwell, we are only commenting on a month or two of data, and our results should be looked at in this light. The working definitions for the metrics employed by the project are as follows:

• User. In the case of Emerald, user identification was based on the ''Urn'' number – the unique identification number used by the server to write and read cookies. A user is effectively a computer; sometimes that computer represents an individual (e.g., a professor in his office), in other cases a number of people (e.g., students in the library). For Blackwell, user identification was based on a combination of IP number and browser details. Again, a user was effectively a computer; sometimes that computer represents an individual and in other cases a number of people.
• Sessions. Sessions are identified in the logs by a session identification number. Both Emerald and Blackwell had session identification numbers. Logs include a session-beginning tag and a session-ending tag, which enables us to make time calculations as well.

• Items viewed/requests made. A ''complete'' item returned by the server to the client in response to a user action. Typically, this might include an abstract, an article, or a table of contents. A complete item might be all the pages, charts, etc. from an article, and this is recorded as a single item; hence, the digital library logs are quite different from traditional server log files, which record pictures and text documents separately. The Blackwell logs also recorded views to the home page and a returned search screen.
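The ''items viewed'' definition above implies a mapping from requested URLs to item types. A sketch of such a classifier is shown below; the URL path conventions are invented for the example, since the real platforms' URL schemes would have to be inspected before writing the rules.

```python
# Hypothetical URL-to-item-type rules. Real rules depend entirely on
# each platform's URL scheme; these paths are illustrative only.
def classify_item(url: str) -> str:
    if "/toc/" in url:
        return "toc"            # table of contents of a journal issue
    if "/abstract/" in url:
        return "abstract"
    if url.endswith(".pdf"):
        return "fulltext_pdf"
    if "/full/" in url:
        return "fulltext_html"
    return "other"              # home page, search screens, etc.
```

A classifier like this is what lets raw hits be rolled up into the per-item-type usage figures reported in the results section.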

For both digital libraries, we embellished and supplemented the usage data with data about the user and/or their organization. In the case of Emerald, this was limited to data obtained on the country and the type of organization they belonged to. However, in Blackwell's case, the data collection was much richer. User background data (on occupation, organizational affiliation, and geographical location) held on a registered-user database was related, via an identification number, to the usage logs generated in February 2003 by registered users. The user database contained records of over 500,000 registered users. The database was not a complete record of subscribers entering the site. This was because there were a number of ways that subscribers


entering the site were recognized: for example, they could be identified as coming via a trusted proxy server, as a society member, via a location such as a university, or as users at a given IP address, and so on. The number of subscribers entering the site via their user name and password was relatively small – about 10%. The log files of these subscribers were extracted and supplemented with information extracted from the form that users fill in; this gives information on the users' occupation, place of work, and how they first heard about the Blackwell online library. Much of the form is free-text entry; hence, it was not possible to place all users within the user categories we employ in our analysis.

4.2. Websites’ interfaces7

The interfaces of EmeraldInsight and Blackwell Synergy have some common features. They both provide users with the options of simple and advanced searches for articles. Users can browse the list of journals by subject or in alphabetical order of titles, and they can also limit the journals to just those that they subscribe to. The main difference between their interfaces is the way users can access the full text of articles. On a table of contents page of a journal issue on EmeraldInsight, users can opt to view the article. By clicking on the option 'View' they are taken to another page which includes the abstract of the article as well as options to view the full-text PDF, the full-text HTML (if available), or download the PDF file. This means users have to visit the abstract before viewing the full text. But on a table of contents page of a journal issue on Blackwell Synergy, users have the option to view either the full text of the article or the abstract.

5. Results

For each of the key usage metrics – number of items viewed in a session and return visits – the data are broken down by a range of user characteristics. These two metrics offer solid platforms for characterizing and comparing the information seeking behaviour of subgroups of users. We need to do this because generalizations based upon millions of users, while sounding impressive, can prove very misleading indeed, possibly camouflaging big differences between individual user groups, such as that between students and professors. We will demonstrate this by defining users by:

• occupation (academic status);
• place of work;
• type of subscriber (Big Deal, non-subscriber, etc.);
• geographical location of user;
• type of university (old and new);
• referrer link used;
• number of items viewed in a session.

We have not provided the same user analyses for both digital libraries because of the essential differences between the content and logs of the two publishers, and the particular emphases of the individual investigations that have been combined for the purpose of this paper. This should not prove a problem, as the aim of the paper is to show what deep logging can offer; the comparison between the two libraries is of secondary interest only.

Table 1 provides a summary of the usage data collected for the two digital libraries. Nearly 3 million users, viewing over 34 million items, are represented in our analyses.

5.1. Type of item viewed

When online to a digital library, users can view different kinds of pages and perform a number of different transactions, and by mapping them we can obtain an idea of what users obtain from the site and how they use it. We identified the following types of view as being particularly significant: views to the list of journal issues, individual journal tables of contents (ToCs), abstracts, and full-text articles (Table 2). On Synergy, full-text articles proved to be the most viewed item, which suggests three possibilities. Users: (1) wanted to go directly to the source itself to form their own opinions as to its relevance – we call such people ‘end-user checkers’ (Nicholas, Huntington, Williams, & Dobrowolski, 2004); (2) simply did not understand the navigational qualities of abstracts – a supposition partly supported by a separate study micro-tracking two users, which showed that out of the 16 sessions undertaken only one featured an abstract view; (3) used A&I services, like PubMed, for the first trawl, which then took them directly to the article they required – and we shall see evidence later that this also provides part of the explanation. Articles accounted for nearly one-third (31%) of all views. About two-thirds of these views were undertaken in PDF format and one-third in HTML. A quarter of views concerned journal title content lists, 23% individual journal issue tables of contents (ToCs) and 20% abstracts.

[7] This description is based on the situation of the sites in 2002–2003, when the log data were collected.

The picture for Emerald is quite different, with abstracts being viewed most – nearly half (49%) of all views were to abstracts, 17% of views were to content pages, 8% to issue lists and 26% to articles. Clearly, site structure plays a role here. In the case of Emerald, when you choose an article in the ToC you have to see the abstract in order to choose whether you want the full text and in which format, but in the case of the Blackwell ToC you have options to go directly to the PDF, HTML, references or abstract. The distribution of item views is likely to be biased, as the logs only record documents sent and will not record repeated views of locally cached table of contents or issue pages stored on the user’s machine. Therefore, the relatively low use of content and issue documents may reflect the caching of these pages to the user’s local machine. However, this does not explain the big differences between the two digital libraries.
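An analysis like Table 2 rests on typing each logged request from its URL. The sketch below shows the shape of such a classifier; the path patterns are invented for illustration, since each publisher’s real URL scheme would need its own rules.

```python
# Illustrative sketch: classify page views by type from request paths,
# as needed to build a Table 2 style breakdown. The URL patterns here
# are assumptions, not the publishers' actual schemes.
from collections import Counter

RULES = [
    # (label, substrings that mark this view type) - checked in order,
    # most specific first
    ("full-text article", ("/pdf/", "/full/")),
    ("abstract", ("/abstract/",)),
    ("table of contents", ("/toc/",)),
    ("issue list", ("/loi/",)),
]

def view_type(path):
    for label, markers in RULES:
        if any(m in path for m in markers):
            return label
    return "other"

requests = ["/loi/jdoc", "/toc/jdoc/58/4", "/abstract/10.1234/x",
            "/pdf/10.1234/x", "/pdf/10.1234/y"]
counts = Counter(view_type(p) for p in requests)
print(counts)  # tallies per view type
```

As the caching caveat above implies, whatever these rules count is still only the documents the server actually sent.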

The following user analyses by type of subscriber and referrer link are just examples to show how deep log analysis can burrow deeper into the type-of-item-viewed data to seek further explanation and clarification. Further analyses of type of item viewed have been published before in Nicholas, Huntington, and Watkinson (2005).

5.1.1. Users as defined by type of subscriber (Emerald)

In the case of the Emerald logs, users were classified according to whether they were subscribers or not. Subscribers can be categorised into two types – Big Deal or non-Big Deal. The difference between the two is essentially their download rights – the former can download full-text articles from virtually any journal on the database, the latter typically from half a dozen or so. Non-subscribers have to use their credit card if they want a full-text article, unless it is one of the journals featured as Journal of the Week, in which case they can download it for free. Trialists are a sub-group of non-subscribers who have signed up to a free one-month trial during which they can download five full-text articles. Fig. 1 gives the distribution of type of document viewed by type of subscriber. Surprisingly perhaps, it was not the user group with the most generous downloading rights – the Big Deal users – who viewed the most full-text articles, but the trialists. For trialists, articles made up 29% of their views, whereas for Big Deal subscribers the figure was 24%. This can be put down to a kind of digital sales mentality. Non-subscribers (45% of views to abstracts) were plainly using abstracts as a substitute for the real thing (the article). Non-subscribers obviously used the digital library to check/identify material: 56% of views were to lists of various sorts, with 36% being to tables of contents.

Table 1
Keynote statistics

Database                    Number of items viewed   Number of sessions conducted   Number of users
Blackwell two month study   10,573,353               2,783,727                      820,230
Emerald one year study      23,564,578               4,789,140                      2,013,827

Table 2
Type of item viewed – comparison between Blackwell and Emerald

Type of item viewed            Blackwell Synergy (%)   EmeraldInsight (%)
Issue lists                    25                      8
Table of contents              23                      17
Abstracts                      20                      49
Full-text articles, of which   31                      26
  a. % in PDF                  (66)                    (56)
  b. % in HTML                 (34)                    (44)

5.1.2. Users defined by referrer link used (Blackwell)

The referrer link details the site previously visited by the user before arriving at the Synergy site. Many sites block this information, and additionally it is difficult to categorise sites, as there is no standard convention for doing so. For example, picking out academic library sites involves searching through the dataset and picking out all referrer links with the word ‘library’ in the link reference name; however, many libraries will not necessarily include this in their name. Referrer links were crudely classified into six categories: other, library portals, journal links, via Blackwell Publishing (the parent site), via Blackwell Synergy (believed to be internal links) and Google. For example, the category journal links was based only on users coming to the site via the Journal of Nursing and the Journal of Addiction, as these two were easily identifiable from the logs. However, not all links were so easy to identify, and consequently we have not identified all sessions coming in via journal links. Hence the following does not give a true estimated distribution over referrer categories but is given for illustration only, as the kind of analysis that can be done with the assistance of further fieldwork.
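The crude keyword matching described above can be sketched as a small classifier. The matching strings are illustrative assumptions, and, as noted, such rules will miss libraries whose URLs do not contain the word ‘library’.

```python
# Sketch of keyword-based referrer classification into the six crude
# categories used in the text. The matched substrings are illustrative
# assumptions about what the referrer URLs might look like.

def classify_referrer(referrer):
    r = (referrer or "").lower()
    if not r:
        return "other"  # many sites block referrer information entirely
    if "google." in r:
        return "Google"
    if "blackwell-synergy" in r:
        return "via Blackwell Synergy"      # believed to be internal links
    if "blackwellpublishing" in r:
        return "via Blackwell Publishing"   # the parent site
    if "library" in r:
        return "library portal"             # misses libraries without the word
    if "journalofnursing" in r or "journalofaddiction" in r:
        return "journal link"               # only the two identifiable journals
    return "other"

print(classify_referrer("http://www.library.ucl.ac.uk/ejournals"))
print(classify_referrer("http://www.google.com/search?q=nursing"))
```

The rule order matters: the publisher’s own domains are checked before the generic ‘library’ keyword so that internal links are not swallowed by the broader category.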

The route by which a person reaches the Synergy site possibly says something about them, and we investigated this possibility. Fig. 2 examines the distribution of the types of item viewed by referrer link. Those coming in via Blackwell Synergy (the internal link) were, unsurprisingly, the most likely to view articles: 36% did so. Those arriving via a library link or Blackwell Publishing (the host site) were more likely to view content pages/issue lists (66% and 64%, respectively, did so) and were less likely to view abstracts (between 8% and 9% viewed abstracts).

5.2. Site penetration

A more powerful and illustrative way of examining the number of items viewed is to categorise search sessions by the number of items viewed. We call such an analysis ‘site penetration’. Research we have conducted elsewhere in health and media (Nicholas et al., 2004) showed that many web users do not dwell; they examine just a few items/pages before they leave – sometimes satisfied or, if not, going on to search for information elsewhere. In some cases only a home page or introductory page is visited, and in these cases no substantial content is consumed, although knowledge might have been gained. We call these people ‘bouncers’ or ‘end-user checkers’. The question we sought to answer was whether there was anything about a digital journal library that would make it different, in site penetration terms, from other consumer websites. Thus, in a way, you might expect a high level of penetration as a result of: (a) the bibliographic and full-text mix, which gives a natural movement as a result of the toing and froing; (b) the massive choice of data on offer (hundreds of full-text journals); (c) the investigative nature of some information seeking; (d) the presence of an embedded search engine and other retrieval aids. But as our data show, this does not seem to have made much of a difference; what we see instead is classic web consumer searching (shopping) behaviour that results from massive choice.

Fig. 1. Type of item viewed by type of subscriber.

Thus Table 3 shows that well over two-thirds of Blackwell and Emerald users viewed between 1 and 3 items, and in the case of Emerald 42% of users viewed just one item. The similar figures for the two Blackwell datasets suggest that the metric is quite stable. How deeply a person penetrates or investigates a site is clearly an interesting metric, showing variously interest, satisfaction and ‘busyness’. It might also tell us something about searching style, digital visibility, and the structure and nature of the website. A number of hypotheses may be postulated to explain this distribution. Users might access the site just to see what is there but return later to pick up their material. Alternatively, users (students more likely) may be given the exact Internet reference of an item in a bibliography, or a link to an A&I service like PubMed, and thus go directly to view the item without investigating other pages (and indeed, as we shall see later, this happens quite frequently). A further possibility relates to the nature of the Internet itself. In many cases users will use a search engine to find the site, and these engines return a number of clickable links that the user will cycle through: clicking on the first link, viewing maybe a page or two to see what is there, and then going on, if their search has not been satisfied or only partially satisfied, to the next link – hence the term end-user checkers. There are other possible explanations as well. For example, users might access a site, look at one article, determine that it meets their information need, and end the session. Users may be diverted to a more urgent task and prematurely terminate a session. If a fee is required, users may terminate the session in favour of using a ‘free’ resource until they have narrowed down or better understand the scope of the search topic before returning to a fee-based resource. However, these are all hypotheses, and follow-up qualitative research is needed to find the reasons and rationale behind these kinds of behaviour.

Fig. 2. Type of item viewed by referrer link.

Table 3
User classification by number of items viewed in a session

Type of user/session   Number of items viewed   Emerald (January–December 2002)   Blackwell (17th September 2004)   Blackwell (February 2004)
Bouncer/checker        1–3                      70                                68                                67
Moderately engaged     4–10                     20                                24                                26
Engaged                11–20                    6                                 5                                 5
Seriously engaged      Over 21                  4                                 3                                 2
Total                                           100                               100                               100

The number of views made in a session provides an idea of the degree of penetration of a site, but the metric says little about the quality or substance of the content retrieved. For example, a session featuring 1 to 3 views suggests limited or checking use; however, this would be truer if these pages were what we might term menu pages (issue lists and ToCs) rather than article (or abstract) views. Clearly, what the user is viewing in a session impacts on the site penetration metric.
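The site-penetration banding of Table 3 can be expressed as a small function over per-session view counts. The band edges come from the table; the sample session sizes are invented.

```python
# Table 3's site-penetration bands as a function of items viewed in a
# session. Band edges follow the table; sample sessions are invented.

def penetration_band(items_viewed):
    if items_viewed <= 3:
        return "bouncer/checker"     # 1-3 items
    if items_viewed <= 10:
        return "moderately engaged"  # 4-10 items
    if items_viewed <= 20:
        return "engaged"             # 11-20 items
    return "seriously engaged"       # the table's 'over 21' band

session_sizes = [1, 2, 5, 14, 30]
for n in session_sizes:
    print(n, penetration_band(n))
```

Running each session’s view count through this banding and tallying the results, then normalising to percentages, yields the columns of Table 3.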

5.2.1. Users defined by occupation (Blackwell)

Postgraduates turned out to penetrate the site least, with well over one-third viewing three items or fewer in a session (Fig. 3). Undergraduates, perhaps contrary to expectation, penetrated the site most, with 19% viewing 11 or more items in a session. This may be due to their unfamiliarity with a research topic, which requires them to view more items compared with postgraduate students, who can better specify their information needs and are more knowledgeable about a topic, and so are able to disregard (i.e., filter out) irrelevant content and view fewer items. These were registered users, and the fact that the bouncer/checker proportions were about half those of the total population of users (as shown in Table 3) probably reflects the commitment and loyalty shown by people who had bothered to register.

5.2.2. Users defined by place of work (Blackwell)

Interestingly, the user’s place of work is not a statistically significant [8] variable (Fig. 4), and there are no real differences in the number of requests in a session by place of work.

Fig. 3. Number of views in a session by occupation. Adapted from Nicholas et al. (2005, p. 274) (with permission from Emerald).

[8] There was insufficient evidence to reject the null hypothesis (chi-squared) at the 5% significance level. Hence, for example, the difference of 29% vs 24% is due to sampling and does not reflect an actual difference between place of work and requests in a session.
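In outline, the footnote’s test computes a Pearson chi-squared statistic from the observed cross-tabulation (place of work by requests per session) and compares it with the critical value at the 5% level. The counts below are invented purely to exercise the arithmetic; they are not the study’s data.

```python
# Pearson chi-squared test sketch for a contingency table, as used in
# footnote 8. The observed counts are invented for illustration.

def chi_squared(table):
    """Pearson chi-squared statistic for a 2-D contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

observed = [[52, 48],   # hypothetical workplace A: 1-3 items vs 4+ items
            [48, 52]]   # hypothetical workplace B
stat = chi_squared(observed)
CRITICAL_5PCT_1DF = 3.841  # chi-squared critical value at 5%, 1 df
print(stat, stat > CRITICAL_5PCT_1DF)  # 0.32 False: not significant
```

With these toy counts the statistic (0.32) falls well below the critical value, so, as in the footnote, the apparent percentage difference would be put down to sampling.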


5.2.3. Users defined by type of subscriber (Emerald)

For this analysis, subscriber groups have been divided further: trialists have been divided into two types – those joining online or offline – and new groups have been identified: users who searched via Ingenta, and users who took advantage of Journal of the Week promotions (Fig. 5). Non-subscribers recorded the highest percentage of sessions where only one item was viewed: three-quarters of non-subscribers (75%) viewed only one page. Around six times as many non-subscribers viewed a single item as Big Deal subscribers did. Clearly these users see what is there and leave without fully exploring the site, perhaps with the view of returning at a later date or simply going somewhere better. We call these people bouncers, and they are a big feature of most Web sites, even academic ones.

Fig. 4. Place of work by requests in a session.

Fig. 5. Items viewed in a session by type of subscriber (chi-square = 862,931, 24 df, p < .001).

Off-line trialists were the least likely to view just one item in a session; these people were plainly giving the site a real test – 55% of them viewed 4 or more items in a session. Journal of the Week users also made good use of the site, as nearly two-thirds (64%) viewed 4 or more items in a session. On-line trialists were also among the least likely to conduct a session where only a single item was viewed, which does suggest that this metric is one that measures interest.

5.2.4. Users defined by geographical location of the user (Emerald)

Fig. 6 shows that UK users were the most active when online – 53% viewed 4 or more items in a session – and Western European users the least active – 35% viewed 4 or more items in a session. These data were obtained from an analysis of IP addresses and are as a result less robust (UK users may register with a USA service provider).

5.2.5. Users defined by type of university (Emerald)

Universities were classified according to whether they were one of the ‘new’ or ‘old’ UK universities. Old universities tend to be the most research active, and we wanted to see whether this had an impact on digital information seeking. We also subdivided them according to whether they subscribed to Emerald’s Big Deal, as this was clearly an important variable (Fig. 7). Old universities penetrated the site more deeply, and having a Big Deal did not really make much difference. Big Deals made a big difference in the case of new universities, and it was non-deal new universities that were more likely to have ‘bouncer’ sessions: 15% did, as compared to an expected value of about 9%.

Fig. 6. Items viewed in a session by geographical location of user.


5.3. Return visitors

The number of times someone returns to a site to search is plainly a key metric, which tells us something about site loyalty and satisfaction. Coming back to a site constitutes conscious and directed use. The industry calls it site stickiness, and everyone wants their site to be sticky. However, in our previous research we have found that not only do people view very little of a site’s contents, but they also do not come back very often. We put this down to an information promiscuity that has arisen out of massive digital choice. In theory, how frequently they return should depend on the nature of the site – a newspaper site, for instance, might be expected to obtain more return visits. It is not clear what would constitute a natural frequency for a journal site. However, almost by definition, in the case of academics, one would have thought that subscribers would naturally develop a repeat behaviour in order to fulfil their current awareness needs. Table of Contents email alert services are one factor that can prompt returnees to revisit sites; both Blackwell Synergy and EmeraldInsight offer this service for their journals.

Table 4 (Column 2), which shows the number of times Emerald users returned to the site during 2002, however, tells us otherwise. It shows that the large majority of users (69%) visited the site once during the 12-month period. Just under a quarter of the users visited the site between 2 and 5 times, about 5% of users visited the site between 6 and 15 times, and just one and a half per cent of users visited over 15 times. Given that, in some cases, a logged ‘user’ is in fact several people, the number of individuals returning is probably an overestimate.

Fig. 7. Items requested in a session: old vs new universities (chi-square = 867.7, 12 df, p < .001).

Table 4
Users grouped by number of visits made during survey period

Number of visits   Emerald (January–December 2002)   Blackwell (February–March 2003)
1                  69%                               63%
2 to 5             24%                               28%
6 to 15            5%                                6%
Over 15            2%                                3%
Total              100                               100

Interestingly, the Blackwell data (Column 3), despite being collected over a much shorter period (two months), show higher levels of return visits, although even here just under two-thirds of users did not revisit within the survey period.
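The banding behind Table 4 amounts to counting sessions per user over the survey period and grouping the counts. A sketch, with invented session records (only the band edges come from the table):

```python
# Sketch of the return-visit analysis behind Table 4: count visits per
# user over the survey period, then band the counts. Session data are
# invented; the bands follow the table.
from collections import Counter

def visit_bands(session_user_ids):
    visits = Counter(session_user_ids)  # number of visits per user
    bands = Counter()
    for n in visits.values():
        if n == 1:
            bands["1"] += 1
        elif n <= 5:
            bands["2 to 5"] += 1
        elif n <= 15:
            bands["6 to 15"] += 1
        else:
            bands["over 15"] += 1
    return bands

# one entry per session, identified by (hypothetical) user ID
sessions = ["u1", "u2", "u2", "u3", "u3", "u3", "u3", "u3", "u3", "u4"]
print(visit_bands(sessions))
```

Note that this inherits the overestimation problem described above: if one logged ID is shared by several people, their visits are merged, and if one person appears under several IDs, their visits are split across the ‘1’ band.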

5.3.1. Users defined by occupation (Blackwell)

Fig. 8 should go some way towards removing the worries that educational policymakers might have regarding the current awareness activities of academics – current awareness is, after all, an important performance metric. Professors and teachers were the most likely to return to the site over the one-month period, 48% doing so, while undergraduates were the least likely; only 32% returned.

5.3.2. Users defined by place of work (Blackwell)

Interestingly, the user’s place of work was not a statistically significant variable (Fig. 9), and there are no differences in the number of visits by place of work.

5.3.3. Users defined by type of subscriber (Emerald)

Unsurprisingly, non-subscribers were the most likely to visit once in the survey period; 87% of them did so (Fig. 10). The real question here for publishers and librarians is to determine why that should be so. Was it because they: (a) accidentally arrived at the site; (b) were not pleased with what they saw; (c) saw something better elsewhere?

On-line trialists were the most likely to return to the site. Just under three-quarters (71%) visited two or more times; this is really fascinating consumer behaviour, because all that these people get over and above ordinary non-subscribers is the right to download five full-text articles during the one-month trial period – sufficient bait, it would seem, for the on-line trialists. Nearly half (49%) of these users visited between 2 and 5 times, 18% visited 6–15 times and 4% visited over 15 times. With the notable exception of trialists, Big Deal subscribers were the most likely to return to the site: 56% of deal users visited just once, one-third (33%) visited 2–5 times, 9% visited 6–15 times and 3% visited over 15 times. It would appear that the Big Deal does engender loyalty or repeat behaviour – choice and a greater opportunity to download proved to be an attraction.

Fig. 8. Number of visits by occupation.


5.3.4. Users defined by geographical location (Emerald)

Fig. 11 examines repeat behaviour by the country in which the user was resident (based upon subscriber details). UK residents came back to the site most often, and Western Europeans least frequently. This may reflect a nationalistic information trait among users: the web may be a wholly international environment, but users do not always share this trait. Alternatively, a language problem (in the case of Western Europe) might also offer an explanation.

Fig. 9. Place of work by number of visits.

Fig. 10. Number of visits in a year by type of subscriber (chi-square = 298,268, 18 df, p < .001).

5.3.5. Users defined by type of university (Emerald)

There were differences between old and new universities in the UK. Old universities were the more likely to visit frequently: the proportion visiting more than once was 42% for old non-Deal universities, compared to 31% for new universities (Fig. 12). In both cases Deal universities visited more frequently, and old universities with Big Deals visited most frequently – 53% visited more than once.

5.3.6. Users defined by level of site penetration (Emerald)

Fig. 13 shows a strong link between the number of visits made and the number of items viewed. Those people making the most visits were also the people who viewed the most items. Well over half (56%) of those who made more than fifteen visits a year viewed more than 4 items in a session, whereas the same figure for people who visited once was 26%.
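A Fig. 13 style analysis links two per-session metrics: each user’s visit frequency and the depth of their individual sessions. A toy cross-tabulation, with invented session data and an invented two-visit threshold, shows the shape of the computation:

```python
# Illustrative cross-tabulation in the spirit of Fig. 13: share of deep
# sessions (more than 4 items) by how often the user visited. All data
# and the 'frequent' threshold are invented for illustration.
from collections import defaultdict

sessions = [  # (hypothetical user ID, items viewed in that session)
    ("u1", 2), ("u2", 1), ("u2", 6),
    ("u3", 5), ("u3", 8), ("u3", 9), ("u3", 12),
]

# first pass: visits per user
visits = defaultdict(int)
for user, _ in sessions:
    visits[user] += 1

# second pass: tally deep sessions vs all sessions per visit-frequency group
groups = defaultdict(lambda: [0, 0])  # group -> [deep sessions, all sessions]
for user, items in sessions:
    group = "frequent" if visits[user] > 2 else "infrequent"
    groups[group][1] += 1
    if items > 4:
        groups[group][0] += 1

for group, (deep, total) in sorted(groups.items()):
    print(group, round(100 * deep / total))  # frequent 100, infrequent 33
```

The real analysis does the same at scale, with the visit bands of Table 4 and the penetration bands of Table 3 in place of this toy threshold.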

6. Limitations

Standard transaction log analysis has a number of limitations, such as caching, which under-reports use; problems with user identification, which is normally based on IP authentication; and problems with differentiating user performance from system performance (Jamali, Nicholas, & Huntington, 2005). Deep log analysis (DLA) methods try to minimise these limitations by enriching the log data and obtaining more robust data. This enrichment procedure can include linking demographic data to log data and categorising users into smaller groups rather than looking at a broad picture of the usage. However, even DLA provides little in the way of explanation, satisfaction and impacts; what it really does is raise the questions that really need to be asked, in interview and questionnaire. DLA is clearly useful for certain kinds of analyses, like shedding light on the format of the articles scientists read (PDF or HTML), the age of the articles (obsolescence), and the way scientists navigate to the required material (searching and browsing behaviour). But log analysis is not very helpful in discovering the value and use of the articles retrieved, or the rationales behind expressed information seeking behaviour.

Fig. 11. Returnees (grouped) by country.

Fig. 12. Number of visits in a year – UK old and new universities (chi-square = 1669.0, 9 df, p < .001).

Fig. 13. Number of visits by number of requests in a session (chi-square = 266,556, 12 df, p < .001).

7. Conclusion

We have reported on a large-scale deep log analysis that has provided usage data for two digital journal libraries, in order to demonstrate the types of analyses that are possible using such techniques. In so doing we have also provided comprehensive and detailed insights into the nature of information seeking behaviour in the digital scholarly journal environment.

In regard to the type of items viewed, the picture for the two digital libraries was quite different, largely a function of the heavy use of abstracts by Emerald users. This might be a result of the site structure, as users have to view an abstract if they want to view the full text, or it might be due to greater use of the Emerald site by non-subscribers (itself a function of easier access), people for whom the abstract was a substitute for the full-text article. In terms of individual user groups it was particularly noteworthy that there was a digital sales mentality in the case of Emerald, where trialists made greater use of articles (29% of views) than paid-up subscribers (24%), who had much greater download choice.

The results for the two digital libraries, in regard to the number of items viewed in a session, were very similar, with well over two-thirds of Blackwell and Emerald users viewing between 1 and 3 items in a session. This supports our previously argued proposition (Nicholas et al., 2004) that web users do not dwell; they examine just a few items/pages before they leave. The key user features were:

• Non-subscribers were more likely to view a single item in a session than subscribers.
• Old university users penetrated the site more deeply, and whether they were part of a Big Deal did not make much difference. Big Deals made a difference in the case of new universities, and it was non-deal new universities that were more likely to have ‘‘bouncer’’ sessions (viewing 1–3 items): 15% did, as compared to an expected value of about 9%.

However, a follow-up study is needed to explain this trend, especially in terms of user satisfaction. For example, users who penetrate the site more may be doing so because they cannot find exactly what they want or need, while users who view just a few items and leave might do so because they find exactly what they are looking for and then leave the site.

For both digital libraries, around two-thirds of visitors did not return within the survey period, and we largely put this down to the information promiscuity that has arisen out of massive digital choice (Nicholas et al., 2004). The higher percentage returning to the Blackwell site is thought to be due to the more pressing current awareness needs of scientists.

Other features of the returnee analysis were:

• Professors and teachers were the most likely to return, 48% did so, while undergraduates were the least likely to; only 32% returned.
• Non-subscribers were more likely to visit once than subscribers.
• UK users were more likely to return.
• Big Deal universities visited more frequently compared to non-deal universities. Old universities with Big Deals visited most frequently – 53% visited more than once.

By using the kind of analysis we have outlined here we can profile key user groups, as the following example of the occupational user group shows. Professors/lecturers proportionally conducted more sessions in which 4–10 items were viewed, and were the most likely to revisit the site. Undergraduate students conducted the highest proportion of sessions viewing 11 or more items and were the least likely group to revisit. Postgraduates’ search sessions were characterised by the low number of items viewed. This may be because undergraduate students are less familiar with the topics for which they are searching compared to postgraduates; hence they need to check more items to find what they want. This is of course just a hypothesis, and the issue is yet to be explained by a qualitative follow-up study.

Based on these findings, the direction of our future research will be twofold:

1. To investigate the possibility of relating use data to user demographic and perception data by means of a questionnaire filled in by subscribers to the site. This would provide us with the means to explain use and attribute it to various behaviours. This is in fact now taking place in a study of ScienceDirect (2005–2006).

2. To conduct follow-up survey work with users to obtain answers to the questions raised by the logs. This is being undertaken with OhioLINK users (2005–2008).

Acknowledgements

The authors acknowledge the following organizations, who helped fund the research reported in the article: The Ingenta Institute, Blackwell, and Emerald.

References

Bishop, A. P. et al. (2000). Digital libraries: Situating use in changing information infrastructure. Journal of the American Society forInformation Science, 51(4), 394–413.

Bonthron, K. et al. (2003). Trends in use of electronic journals in higher education in the UK – Views of academic sta! and students.D-Lib Magazine, 9(6). Available from http://www.dlib.org/dlib/june03/urquhart/06urquhart.html.

Borghuis et al. (1996). As cited in Bishop, A. P. et al. (2000). Digital libraries: Situating use in changing information infrastructure. Journalof the American Society for Information Science, 51(4), 394–413.

Boyce, P., King, D. W., Montgomery, C., & Tenopir, C. (2004). How electronic journals are changing patterns of use. The Serials Librarian, 46(1–2), 121–141.

Davis, P. M. (2002). Patterns in electronic journal usage: Challenging the composition of geographic consortia. College and Research Libraries, 63(6), 484–497 (and E-mail to the author, 01/05/2004).

Davis, P. M. (2004a). Information-seeking behaviour of chemists: A transaction log analysis of referral URLs. Journal of the American Society for Information Science and Technology, 55(4), 326–332 (E-mail to the author, 02/06/2004).

Davis, P. M. (2004b). For electronic journals, total download can predict number of users. Portal: Libraries and the Academy, 4(3), 379–392.

Davis, P., & Solla, L. (2003). An IP-level analysis of usage statistics for electronic journals in chemistry: Making inferences about user behaviour. Journal of the American Society for Information Science and Technology, 54(11), 1062–1068.

Eason, K., Richardson, S., & Yu, L. (2000). Patterns of use of electronic journals. Journal of Documentation, 56(4), 477–504.

Eason, K., Yu, L., & Harker, S. (2000). The use and usefulness of functions in electronic journals: The experience of the SuperJournal Project. Program, 34(1), 1–28.

Entlich, R. et al. (1996). As cited in Bishop, A. P. et al. (2000). Digital libraries: Situating use in changing information infrastructure. Journal of the American Society for Information Science, 51(4), 394–413.

Finholt, T. A., & Brooks, J. (1999). Analysis of JSTOR: The impact on scholarly practice of access to on-line journal archives. In R. Ekman & R. E. Quandt (Eds.), Technology and scholarly communication (pp. 177–194). Berkeley: University of California Press.

Gargiulo, P. (2003). Electronic journals and users: The CIBER experience in Italy. Serials, 16(3), 293–298 (E-mail to the author, 10/05/2004).

Institute for the Future (2002a). E-Journal user study: Report of Web log data mining. <http://ejust.stanford.edu/logdata.html> Accessed 24.04.2000.

Institute for the Future (2002b). E-Journal user study: Research findings. <http://ejust.stanford.edu/research_findings.html> Accessed 24.04.2000.

Jamali, H. R., Nicholas, D., & Huntington, P. (2005). The use and users of scholarly e-journals: A review of log analysis studies. Aslib Proceedings, 57(6), 554–571.

Ke, H.-R., Kwakkelaar, R., Tai, Y., & Chen, L. (2002). Exploring behaviour of E-journal users in science and technology: Transaction log analysis of Elsevier's ScienceDirect OnSite in Taiwan. Library and Information Science Research, 24(3), 265–291 (E-mail to the author, 25/05/2004).

Liew, C. L., Foo, S., & Chennupati, K. R. (2000). A study of graduate student and end-users' use and perception of electronic journals. Online Information Review, 24(4), 302–315.

Monopoli, M., & Nicholas, D. (2001). A user evaluation of subject based information gateways: Case study ADAM. Aslib Proceedings, 53(1), 39–52.

Monopoli, M., Nicholas, D., Georgiou, P., & Korfiati, M. (2002). A user-oriented evaluation of digital libraries: Case study the ‘‘electronic journals’’ service of the library and information service of the University of Patras, Greece. Aslib Proceedings, 54(2), 103–117.

1364 D. Nicholas et al. / Information Processing and Management 42 (2006) 1345–1365

Nelson, D. (2001). The uptake of electronic journals by academics in the UK, their attitudes towards them and their potential impact on scholarly communication. Information Services & Use, 21(3–4), 205–214.

Nicholas, D., Huntington, P., Lievesley, N., & Wasti, A. (2000). Evaluating consumer Web site logs: Case study The Times/Sunday Times Web site. Journal of Information Science, 26(6), 399–411.

Nicholas, D., Huntington, P., Rowlands, I., Russell, B., & Cousins, J. (2004). Opening the digital box: What deep log analysis can tell us about our digital journal users. In Charleston 2003 conference proceedings, Charleston, SC.

Nicholas, D., Huntington, P., & Watkinson, A. (2003). Digital journals, big deals and online searching behaviour: A pilot study. Aslib Proceedings, 55(1–2), 84–109.

Nicholas, D., Huntington, P., & Watkinson, A. (2005). Scholarly journal usage: The results of deep log analysis. Journal of Documentation, 60(2), 248–280.

Nicholas, D., Huntington, P., & Williams, P. (2004). Digital consumer health information and advisory services in the UK: A user evaluation and sourcebook. London: City University/DoH. Available from http://ciber.soi.city.ac.uk/dhrgreports.php.

Nicholas, D., Huntington, P., Williams, P., & Dobrowolski, T. (2004). Re-appraising information seeking behaviour in a digital environment: Bouncers, checkers, returnees and the like. Journal of Documentation, 60(1), 24–39.

Obst, O. (2003). Patterns and costs of printed and online journal usage. Health Information and Libraries Journal, 20(1), 22–32.

Pullinger, D., & Baldwin, C. (2002). Electronic journals and user behaviour: Learning for the future from the SuperJournal Project. Cambridge: Deedot Press.

Rusch-Feja, D., & Siebeky, U. (1999). Evaluation of usage and acceptance of electronic journals. D-Lib Magazine, 5(10). Available from http://www.dlib.org/dlib/october99/rusch-feja/10rusch-feja-full-report.html.

Salisbury, L., & Noguera, E. (2003). Usability of e-journals and preference for the virtual periodicals room: A survey of mathematics faculty and graduate students. Electronic Journal of Academic and Special Librarianship, 4(2–3). Available from http://southernlibrarianship.icaap.org/content/v04n03/Salisbury_l01.htm.

Sathe, N. A., Grady, J. L., & Giuse, N. B. (2002). Print versus electronic journals: A preliminary investigation into the effect of journal format on research processes. Journal of the Medical Library Association, 90(2), 235–243.

Smith, E. T. (2003). Changes in faculty reading behaviours: The impact of electronic journals on the University of Georgia. The Journal of Academic Librarianship, 29(3), 162–168.

Talja, S., & Maula, H. (2003). Reasons for the use and non-use of electronic journals and databases: A domain analytical study in four scholarly disciplines. Journal of Documentation, 59(6), 673–691.

Tenner, E., & Ye, Z. (1999). End-user acceptance of electronic journals: A case study from a major academic research library. Technical Services Quarterly, 17(2), 1–14.

Tenopir, C. (2002). Online serials heat up. Library Journal, 127(October), 37–38.

Tenopir, C. (2003). Use and users of electronic library resources: An overview and analysis of recent research studies. Report for the Council on Library and Information Resources, August 2003. Available from http://www.clir.org/pubs/reports/pub120/pub120.pdf.

Tenopir, C., & King, D. (2001). Electronic journals: How user behaviour is changing. In Online information 2001. Proceedings of the international online information meeting, London, 4–6 December 2001 (pp. 175–181). Oxford: Learned Information Europe Ltd.

Teskey, P., & Urquhart, E. (2001). The acceptance of electronic journals in UK higher education. Information Services & Use, 21(3–4), 243–248.

Tomney, H., & Burton, P. F. (1998). Electronic journals: A study of usage and attitudes among academics. Journal of Information Science, 24(6), 419–429.

TULIP Final Report (1996). Elsevier Science, Amsterdam. Available from http://www.elsevier.com/wps/find/librarians.librarians/tulipfr.

Yu, L., & Apps, A. (2000). Studying e-journal user behaviour using log files: The experience of SuperJournal. Library and Information Science Research, 22(3), 311–338.

Zhang, Z. (1999). Evaluating electronic journal services and monitoring their usage by means of WWW server log file analysis. Vine, 111, 37–42.