

Annals of Library Science and Documentation 1980, 27(1-4), 45-51.
Vol 27 Nos 1-4 (Mar - Dec) 1980

ALTERED KEYWORD IN CONTEXT (AKWIC) INDEXING

H.Y. MAHAKUTESHWAR
National Information Centre for Drugs and Pharmaceuticals (NICDAP)
Central Drug Research Institute
Lucknow-226001
Uttar Pradesh

Indexing has been defined and the different processes involved in indexing have been enumerated. Automatic indexing and the various conditions that are to be met have been discussed. The definition of KWIC index, its origin and the method of its production have been delineated. Structure of KWIC index has been illustrated with the analogy of a horizontal wheel and a camera. It is stressed that continuity of the context at the extreme edges should be maintained whenever the beginning or the end of a title forms part of a KWIC entry. A modification in the output format of the KWIC index, which can be called the 'Altered KWIC' or AKWIC index, has been suggested. Advantages of AKWIC over KWIC have been mentioned. Some of the limitations of KWIC and AKWIC have been pointed out. Permutation on titles and permutation on subject headings for the production of KWIC index have been compared. It is suggested that adoption of KWIC index by Indian periodicals and organisations should be encouraged.

1. INTRODUCTION

Indexing has come to be accepted as a well-defined branch of the discipline of information science. It is recognised that indexes are indispensable tools for the dissemination and retrieval of information.

1.1 Definition of an Index

A good definition of an index could be quoted from the American National Standards Institute (ANSI) standard on indexing (1): "An index is a systematic guide to items contained in or concepts derived from a collection. These items or derived concepts are represented by entries arranged in a known or stated searchable order such as alphabetical, chronological or numerical". In short, an index is an indicator of content and location.

1.2 Process of Indexing

The process of creating the entries in an index is known as 'indexing'. The key operations in the indexing process are:

i. Scanning the collection
ii. Subject or content analysis
iii. Selection of identifiers
iv. Addition of locations

1.3 Mental Processes in Indexing

Now let us examine the mental processes involved in 'indexing'. The main question is how an indexer goes about selecting the indexing words. Some indexers, particularly the subject specialists, can read a text and 'understand' it, their understanding being shown, perhaps, by their ability to formulate its subject in words other than those used by the author. Very little is known about the mental processes involved in such human understanding. Other indexers cannot 'understand' a text in this full sense, but they are adept at picking out from it words, phrases and sentences that the author has emphasized as important: the title and section headings, introduction, conclusions, summary and so on. Some indexing systems, like the present one under consideration, deliberately restrict the selected phrases to the title words, to minimise the skill and effort needed at the stage of input to the system.

2. AUTOMATIC INDEXING

The phrase automatic indexing is usually used to indicate the use of a machine for any or all of the stages in the process of indexing.

2.1 Conditions to be met

Automatic indexing in this sense requires (a) that the text of documents be in a form that can be 'read' by a machine, (b) that the machine can 'recognise' individual words, and (c) that the machine be provided with a set of rules for selecting certain words and phrases as indexing entries.

2.2 The First Two Conditions

The first two conditions are met easily with the advent of computer technology. The text can be made available in machine-readable form through codes on punched cards, punched paper tape, magnetic tape, disk, etc. Such coded text can be 'read' and manipulated by digital computers.

2.3 Selection of Keywords

Selection of index words can be achieved by two alternative types of rules. In the first, a human analyst compiles a list of keywords of potential interest to future users of the index. The computer compares this list with each word of the text of a document; if a match occurs, the word is selected, and these selected words make up the index entries for the document.

The second type of machine indexing works on exactly the opposite principle: a human analyst compiles a list of words that are not to be selected for indexing. These include all the common words such as 'the, of, an, on, a, that, are, not, to, be, for' and so on, and also 'general' words that have little specific meaning (e.g. in scientific texts, the words 'report', 'theory', 'conclusions', etc.). There still remain in each text hundreds or thousands of other words that cannot be used as index entries. These are controlled by a method known as 'statistical indexing'. In this system, the frequency of each non-general word in the text is determined, and then the most frequent words from this group are selected as the ones representing the subject matter (8). Since automatic indexing is by itself a subject of specialization, let us not enter into its intricacies, as the above discussion provides enough general background to the topic of our present interest.

2.4 Mechanised Indexing based on Titles

The use of title words for indexing has long been common because it requires the least skill and effort in creating an index. It is also claimed that preparing indexes based only on titles is a method more widely practised than many indexers would be willing to admit. The use of only title words can be successful if titles do effectively describe a document's content. Because this is not always the case, they may be supplemented to produce enriched title indexing. An important example is Biological Abstracts, which examines the title of each article and, if it is found inadequate, adds extra words to the title.

3. KWIC INDEX

During the last two decades there have been applications of machines for the production of indexes using titles, which are intended for normal manual use. The best known variety of such indexes is that developed by H.P. Luhn (6) and known as the permuted or "Key-Word-In-Context" (KWIC) index.

3.1 Definition

KWIC is a multiple entry natural language indexing system, which is produced by feeding to a computer the titles of the documents to be indexed and having the computer make one entry for each word in the title. The computer moves the title laterally, retaining the relative order of the words, but brings the entry word to the centre of the page, with the rest of the terms 'wrapped round' it, to keep the entry word in context (see figure 1).

3.2 Origin

It should be noted that KWIC indexing is new only in the method of its production, i.e. by computer, and in its physical layout. The indexing principle involved must be the oldest that exists, in that it consists simply of using terms in a title to serve as index terms. The principles were written by Crestadoro (2) as far back as 1856, and catalogues probably had each term cited separately at the head of the entries to which it referred, in the manner of what is now known as the Key Word Out of Context (KWOC) index. Here the computer first prints the significant term for each entry and cites the title below it. Though the principles are very old, there is no doubt that the cheapness and speed with which such indexes can now be produced makes them a very attractive proposition for some purposes.
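The KWOC arrangement just described (significant term first, title cited below it) can be sketched in a few lines of Python. This is only an illustration: the stop list, the function names `kwoc_entries` and `print_kwoc`, and the upper-casing of the heading word are assumptions of this sketch, not details given in the paper.

```python
# Hypothetical stop list, used only for this illustration.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}

def kwoc_entries(titles):
    """Return (keyword, title) pairs, one per significant title word, sorted by keyword."""
    pairs = []
    for title in titles:
        for word in title.split():
            if word.lower() not in STOP_WORDS:
                pairs.append((word.upper(), title))
    return sorted(pairs)

def print_kwoc(titles):
    """Print each keyword on its own line with the full title cited below it."""
    for keyword, title in kwoc_entries(titles):
        print(keyword)
        print("    " + title)

print_kwoc(["Histamine receptors in brain"])
```

Because the keyword is printed out of context, no shifting or wrapping of the title is needed; only the output layout differs from the KWIC case.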


Fig.1. Sample KWIC index (produced by the author)

KWIC and KWOC indexes are often loosely referred to as "permuted" or 'permutation' indexes, but Sharp (7) points out that this term is incorrect, since it implies arrangement in alternative orders, which does not happen in this type of index. He prefers the term "cycling", and D.J. Foskett (3) uses the expression "rotated" indexes.

3.3 Illustrations of the Structure

For the purpose of illustrating the structure of the KWIC index, let us take the following title as an example:

    Histamine receptors in brain.

Let us now construct the KWIC entries for this title as the computer does it. The title is read word by word. The first word, viz. 'Histamine', is read. It is checked whether it is a keyword or not. Since it is a keyword, it is pushed to the central index column and the remaining words are printed as they occur in the title. Next, the second word, viz. 'receptors', is read and, since it is a keyword, the first word is pushed to the left hand side of the central column, the second keyword is brought to the central index column position and the rest of the title is continued after that. Again, the third word, viz. 'in', is taken and is ignored without constructing any index entry, since 'in' is not a keyword. Similarly, the word 'brain' is taken and an index entry is constructed after pushing the rest of the title to the left hand side of the central index column. For the above title, KWIC index entries for the words histamine, receptors and brain are constructed, while the word 'in' is ignored, as follows:

                           Central index column
                           HISTAMINE RECEPTORS IN BRAIN
                 HISTAMINE RECEPTORS IN BRAIN
    HISTAMINE RECEPTORS IN BRAIN
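The word-by-word construction above (without any wrap-round) can be sketched in Python as follows. The stop list, the function name `simple_kwic` and the column position 23 are assumptions chosen for this sketch; the paper does not specify them.

```python
# Hypothetical stop list; the paper only names 'in' as a non-keyword in this example.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the"}

def simple_kwic(title, centre=23):
    """Return one line per keyword, shifted so the keyword starts at column `centre`.

    No wrap-round is done: text that does not fit on the left is simply cut off.
    """
    words = title.upper().split()
    lines = []
    for i, word in enumerate(words):
        if word.lower() in STOP_WORDS:
            continue  # non-keywords such as 'in' produce no entry
        left = " ".join(words[:i])
        if left:
            left += " "  # blank between the left context and the keyword
        left_field = left[-centre:].rjust(centre)
        lines.append(left_field + " ".join(words[i:]))
    return lines

for line in simple_kwic("Histamine receptors in brain"):
    print(line)
```

Each printed line is the same title moved laterally; only the amount of left padding changes, which is what places successive keywords in the central index column.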


It is to be noted here that in the above illustration the 'wrapping round' has not been shown. In case the title is longer than the space on the right hand side of the central index column, then, to make use of the full potential of the space available, the rest of the title is printed on the left hand side on the same line, as shown below:

    IN BRAIN/        HISTAMINE RECEPTORS
           HISTAMINE RECEPTORS IN BRAIN/
    INE RECEPTORS IN BRAIN/ HISTAM

Similarly, in the third entry above, the part of the first word that spills over on the left hand side is brought to the right hand side to make use of the empty space available, providing the maximum possible information to supply the context.

3.4 Analogy of a Wheel and a Camera

The process can be explained in a vivid manner by imagining a horizontal wheel of 1 or 2 cm thickness, on whose curved surface the title is written, as shown in figure 2.

Fig.2. Illustration of the KWIC entry formation

It should be visualized that the axis of the lens system of the camera passes through the centre of the wheel and the horizontal plane of the circumference of the wheel.

As soon as the keyword comes in front of an imaginary camera, the part of the title available on the front side is photographed, and one gets a KWIC entry. The KWIC entries of Biological Abstracts are rotated in this fashion by joining the beginning and end of the title in the form of a circular loop whose diameter is equal to the width of the KWIC index. In such a process there will be some amount of discontinuity in the context, as the last word of the title sometimes may not have a direct conceptual link with the first word of the title.

In the opinion of the author, the following way of creating entries is more helpful than the one explained above. That is, the imaginary wheel will not contain anything on the opposite side not facing the camera. The title after the right hand edge will immediately continue at the left hand edge, as shown in figure 3.

Fig.3. KWIC entry with continuity of context at the edges.

This means that continuity in the context is always maintained between the edges, except when neither the beginning of the title nor the end of the title is facing the camera. In the latter case it is like a slit, instead of a wheel, through which the title is being passed, as shown in figure 4. Of course the width of the slit is equal to the width of the index.

The slit phenomenon starts operating particularly when the title is longer than the width of the KWIC index. After the KWIC entries have been created for all the input titles, the entries are sorted in the alphabetical order of the keywords at the central column.


Fig.4. Illustration of the slit phenomenon.
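The wrap-round at the edges, the slit behaviour for long titles, and the final alphabetical sort on the central column can be put together in a short Python sketch. It is one possible reading of the wheel analogy, not the author's program: the field widths (17 and 20 characters), the stop list, the one-blank gap kept between wrapped text and the main text, and the names `kwic_entry` and `kwic_index` are all assumptions made for illustration.

```python
STOP_WORDS = {"IN", "OF"}  # hypothetical stop list for the example titles

def kwic_entry(title, key_index, left_w=17, right_w=20, end_mark="/"):
    """Build one fixed-width KWIC line; the keyword starts at column left_w."""
    words = title.upper().split()
    before = " ".join(words[:key_index])
    if before:
        before += " "                       # blank before the keyword
    after = " ".join(words[key_index:]) + end_mark

    left = before[-left_w:]                 # what fits left of the central column
    spill_left = before[:-left_w].strip()   # start of title pushed past the left edge
    right = after[:right_w]                 # what fits from the central column onwards
    spill_right = after[right_w:].strip()   # end of title pushed past the right edge

    # Wrap each spill to the opposite edge, keeping at least one blank as a gap.
    room_left = left_w - len(left) - 1
    room_right = right_w - len(right) - 1
    wrap_l = spill_right[:room_left] if room_left > 0 else ""
    wrap_r = spill_left[-room_right:] if room_right > 0 else ""

    left_field = wrap_l.ljust(left_w - len(left)) + left
    right_field = right + wrap_r.rjust(right_w - len(right))
    return left_field + right_field

def kwic_index(titles, left_w=17, right_w=20):
    """Create an entry for every non-stop word, sorted on the central column."""
    lines = []
    for title in titles:
        for i, word in enumerate(title.upper().split()):
            if word not in STOP_WORDS:
                lines.append(kwic_entry(title, i, left_w, right_w))
    return sorted(lines, key=lambda line: line[left_w:])

for line in kwic_index(["Histamine receptors in brain"]):
    print(line)
```

When both sides of a title overflow their fields there is no free room on either side, so nothing is wrapped and the line behaves like the slit described above. A location indicator could be appended to each line before printing; it is left out here because the paper does not describe its format.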

If these entries are printed with the corresponding location against each of them, then the resultant product is the actual KWIC index, ready for use.

3.5 Method of Production

KWIC entries may be prepared in two ways:

(a) The title may previously be scanned by an editor and each significant word tagged. An entry will be constructed by the computer for each tagged word.

(b) The computer may be supplied with a list of stopwords, known as a 'stop list', which contains insignificant words such as articles, prepositions, conjunctions, pronouns and auxiliary verbs, together with such general words as 'aspect', 'different', 'method', 'problem' and 'very', which are believed to be of little value in retrieval. An entry will be constructed by the computer for each title word not on the stop list.

Whichever method is used, the entries are then sorted into alphabetical order of the entry word and printed out, against each entry being its appropriate location indicator. The construction of entries by use of a 'stop list' is much more common than by tagging significant words, since less intellectual effort is required at the input stage. Some stop lists are quite short, and others extend to several thousand words. The longer the list, the fewer the index entries one would expect (8). Helbich (5) attempted to find some relation between the number of stopwords and the keywords per title. It was found that there was no simple dependence between the number of stopwords and the entries in a KWIC index. He also concludes that selection and tagging of title words provides a briefer and better index than the use of the stop list.
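The two ways of choosing entry words can be contrasted with a small Python sketch. The tagging convention (an asterisk placed before each significant word), the contents of `STOP_LIST` and both function names are hypothetical; the paper does not say how tags were coded or give a full stop list.

```python
# Illustrative stop list: a few function words plus the general words named in the text.
STOP_LIST = {"a", "an", "the", "in", "of", "on", "and", "for",
             "aspect", "different", "method", "problem", "very"}

def keywords_by_tagging(tagged_title, tag="*"):
    """Method (a): entry words are those an editor has marked with a leading tag."""
    return [w[len(tag):].strip(".,;:()").upper()
            for w in tagged_title.split()
            if w.startswith(tag) and len(w) > len(tag)]

def keywords_by_stop_list(title, stop_list=STOP_LIST):
    """Method (b): entry words are all title words that are not on the stop list."""
    cleaned = [w.strip(".,;:()") for w in title.split()]
    return [w.upper() for w in cleaned if w and w.lower() not in stop_list]

print(keywords_by_tagging("*Histamine *receptors in brain"))
print(keywords_by_stop_list("Histamine receptors in brain"))
```

With tagging the editor decides, title by title, which words deserve an entry; with a stop list every remaining word gets one, so the intellectual effort moves from each title to the one-time compilation of the list.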

3.6 Alteration in the KWIC Format

The author has produced an annual cumulative KWIC index for one of the current awareness bulletins published by his information centre (Fig.1), the details of which are being reported elsewhere.

When the annual KWIC index, along with an introduction and instructions on how to refer to it, was sent for the users' comments, the following observations were made. While some of the intelligent users could immediately grasp the utility of the KWIC index, a number of users could not understand why the alphabetical arrangement is at the central column instead of at the beginning, i.e. the first column, of the index. The continuity of the context at the edges was very inconvenient for them, since they were exposed to such a type of index for the first time. They immediately suggested that the index should not be alphabetized at the central column but in the first column, at the beginning of the index. They also suggested that the title be printed continuously along the full width of the index in a single column, like a conventional type of index, instead of the double column format of a KWIC index.


This meant changing the total structure and format of the KWIC index as accepted everywhere. After serious thought, it occurred to the author: why not combine the advantages of the conventional index with those of the KWIC index and give the users a newly formatted KWIC index which they might find more convenient? Even though the whole concept is in the idea plane, it is presented here with a view to getting comments or suggestions.

The altered KWIC (i.e. AKWIC, if one prefers to call it so) format is as follows. The width of the index remains the same. Now, instead of the camera being placed at the middle of the wheel, it is placed at the extreme left edge of the wheel, as shown in figure 5. Here again the axis of the lens system of the camera is tangential at the extreme left edge of the wheel and also passes through the horizontal plane of the circumference of the wheel.

The AKWIC entries for our previous examples now get rearranged as shown below in figure 6.

    BRAIN/ HISTAMINE RECEPTORS IN
    HISTAMINE RECEPTORS IN BRAIN/
    MALE RAT/TION OF PROLACTIN REPORTS IN
    MODULATION OF PROLACTIN REPORTS IN MALE
    PROLACTIN REPORTS IN MALE RAT/TION OF
    RAT/TION OF PROLACTIN REPORTS IN MALE
    RECEPTORS IN BRAIN/ HISTAMINE
    REPORTS IN MALE RAT/TION OF PROLACTIN

Fig.6. Sample AKWIC index without location part.

When the AKWIC entries are examined carefully, they are exactly similar to the KWIC entries except that now the alphabetical arrangement is like that of a conventional index, and the continuity of context at both edges, as in a KWIC index, is still maintained. The advantage of utilizing the total available space for providing the maximum information, as in the KWIC index, has also been made use of. AKWIC now has better readability and still maintains the keyword in context. While in KWIC the context is immediately on both sides of the keyword, in AKWIC the left hand part of the context goes to the extreme right edge, as shown in figure 6.
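A minimal Python sketch of the AKWIC format, under stated assumptions, is given below. The keyword is placed at the extreme left, the rest of the title follows up to the end mark, and the part of the title that preceded the keyword is pushed against the right edge. The width of 38 characters is an assumption, picked because it reproduces the wrapped entries of figure 6; the stop list, the one-blank gap after the end mark and the names `akwic_entry` and `akwic_index` are likewise illustrative, not taken from the paper.

```python
STOP_WORDS = {"IN", "OF"}  # hypothetical stop list for the two example titles

def akwic_entry(title, key_index, width=38, end_mark="/"):
    """Build one AKWIC line: keyword at the left edge, earlier words at the right edge."""
    words = title.upper().split()
    after = (" ".join(words[key_index:]) + end_mark)[:width]  # cut off by the 'slit'
    before = " ".join(words[:key_index])
    room = width - len(after) - 1            # keep one blank after the end mark
    tail = before[-room:] if room > 0 else ""
    return after + tail.rjust(width - len(after))

def akwic_index(titles, width=38):
    """One entry per non-stop word; a plain sort gives the alphabetical first column."""
    lines = []
    for title in titles:
        for i, word in enumerate(title.upper().split()):
            if word not in STOP_WORDS:
                lines.append(akwic_entry(title, i, width))
    return sorted(lines)

titles = ["Histamine receptors in brain",
          "Modulation of prolactin reports in male rat"]
for line in akwic_index(titles):
    print(line.rstrip())
```

Because the keyword already sits in the first column, ordinary string sorting of whole lines is enough; no special sort key on a central column is needed, which is the practical simplification the altered format offers.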

4. DRAWBACKS OF KWIC INDEX

The main drawback of indexes of this kind is that it is possible to search for only one index term at a time, and there is no possibility of co-ordinating terms except by scanning the block of entries under one of the chosen search terms. If the number of entries under a given term is small enough, this process of scanning causes no difficulty, but with indexes of the KWIC or AKWIC type the number of such entries is likely to be very high for large collections of documents.

Let us take some examples from a hypothetical index and examine the nature of the problem for different conditions. Let us call $D_n$ the number of documents in a collection and $T_n$ the average number of terms assigned to a document. Now, if the repetitions of the terms are taken into account, the total number of terms is the product $D_n \times T_n$. In practice, repetitions generally do occur. If each term is counted only once, then the total number of terms $V_n$ in the vocabulary is less than or equal to the product $D_n \times T_n$, i.e.

$$V_n \le D_n \times T_n.$$

Hence $A_n$, the average number of entries under one term in a KWIC index, can be represented by the formula

$$A_n = \frac{D_n \times T_n}{V_n}.$$

If the value of $A_n$ in a given situation is low, then searching the index will cause no problems. If it is high, then KWIC may be unacceptable and resort may have to be made to a post-co-ordinate indexing system. This can be explained using the following illustration.

4.1 Illustration

With a collection of 50,000 documents, a vocabulary of 1,000 terms and an average of 10 terms used for each document in indexing, the average number of entries to be scanned in a search works out to be 500 (50,000 × 10 / 1,000). This is much too high a figure to be acceptable.

If, however, we have a collection of 25,000 documents, a vocabulary size of 25,000 terms and an average of 5 terms used per document, then only 5 entries (25,000 × 5 / 25,000) would have to be scanned in a search on an average.
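As a check on the arithmetic, the formula of section 4 (documents multiplied by average terms per document, divided by vocabulary size) can be evaluated directly. The function name below is an assumption of this sketch. With the inputs exactly as stated in the two cases above, the formula evaluates to 500 and 5 entries per term respectively.

```python
def average_entries_per_term(documents, terms_per_document, vocabulary_size):
    """Average number of entries under one term: (documents * terms) / vocabulary."""
    if vocabulary_size <= 0:
        raise ValueError("vocabulary_size must be positive")
    return documents * terms_per_document / vocabulary_size

# First hypothetical collection: large collection, small vocabulary.
print(average_entries_per_term(50_000, 10, 1_000))
# Second hypothetical collection: vocabulary as large as the collection.
print(average_entries_per_term(25_000, 5, 25_000))
```

The comparison shows that the scanning burden depends on the ratio of total term assignments to vocabulary size rather than on collection size alone.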

This does not take into account the fact that it may be desirable to be able to perform logical product, sum and difference searches, though where the number of entries to be scanned is low it is not difficult to examine each to determine whether it meets the search requirements. However, titles tend to indicate the content of documents to a rather shallow degree, and if detailed indexing is carried out at all, the KWIC principle is unlikely to be acceptable.

In whatever way KWIC or KWOC indexes are prepared, they create two problems in search. First, although each entry word is accompanied by the rest of the title 'in' or 'out' of context, scanning the index to co-ordinate two or more search words is not an easy task. Second, words are used as they are provided by authors in titles, and to ensure adequate recall the searcher must think of alternative spellings (or even mis-spellings), singulars and plurals, synonyms and quasi-synonyms; in short, the search terms must be considerably expanded, multiplying the scanning effort. Savings at the input may be more than balanced by the work to be done at output.

5. PERMUTATION ON SUBJECT HEADINGS

A byproduct of a computerised catalogue can be a KWIC index. KWIC indexes, as already stated, are usually permuted on titles, but the possibility of permuting on subject headings has been seriously considered. The University Library Information Systems Project of the University of Illinois compared permuting on titles and permuting on subject headings and decided that the latter would be more useful in the general library situation, where titles are less specific. Permutation on subject headings is not practical for an entire collection, but is a promising means of dealing with limited special areas of the collection (4).

6. CONCLUSIONS

(i) The KWIC index has been much more popular abroad than in India. While a major abstracting service like Biological Abstracts has adopted it as a regular feature, several other periodicals and organisations abroad have employed it to provide a better information retrieval tool than the conventional index. In spite of the fact that a number of computers are available in India, there has hardly been any attempt either to produce a KWIC index or to adopt it as a regular feature by any periodical or organisation in India.

Even an organisation like the Publications and Information Directorate, which publishes a number of scientific and technical periodicals, is satisfied with the Keyword Out of Context (KWOC) index and the conventional type of indexes. Even the method of generating KWOC is totally different. It appears as if it is not completely automated. The actual methodology is nothing but exactly the KWIC index methodology, but for the output format of the index.


The reason for KWIC not being popular in India might be that it has not been adopted by many. The reason why KWIC has not been adopted by many in India might be that the KWIC index may look complex to users in the beginning, and also the somewhat complex programming involved. But once the users become familiar with the KWIC index, they are likely to find it quite useful. Therefore adoption of the KWIC index in India must be encouraged.

(ii) Experiments can also be done on the utility and convenience of the AKWIC index suggested in this paper. If AKWIC is more acceptable than KWIC, because of its lesser complexity, then attempts can be made to adopt it on a large scale.

(iii) The generation of a KWIC index need not be an independent attempt. When so much is being talked about creating India's own databases, the generation of KWIC indexes can be done as a byproduct of the computerised catalogues or the databases that would be created.

7. REFERENCES

1. American National Standards Institute: Standard basic criteria for indexes Z-39.4, New York, 1968.
2. Crestadoro A: The art of making catalogues.
3. Foskett D J: Classification and indexing in the social sciences. London: Butterworths, 1963.
4. Heiliger, Edward M and Henderson, Paul B Jr.: Library automation: Experience, methodology and technology of the library as an information system.
5. Helbich J: Direct selection of key-words from the KWIC index. Paper presented to an Anglo-Czech Documentation Symposium, London, 1967.
6. Luhn H P: Key-Word-in-Context index for technical literature (KWIC Index). American Documentation 1960, 11(4), 288-95.
7. Sharp, John R: Some fundamentals of information retrieval. London: Andre Deutsch, 1965.
8. Vickery B C: Techniques of information retrieval, 1971.