1964 - Problems in Automatic Abstracting


Volume 7 / Number 4 / April, 1964

H. R. KOLLER, Editor

Problems in Automatic Abstracting

H. P. EDMUNDSON
The Bunker-Ramo Corporation, Canoga Park, California

A variety of problems concerning the design and operation of an automatic abstracting system are discussed. The purpose is to present a general view of several major problem areas. No attempt is made to discuss details or to indicate preferences among alternative solutions.

1. Introduction

Since automatic abstracting is in its infancy it is felt that a paper covering the subject as a whole is apt to be more helpful than one which pleads for a single course of research. In many ways the present situation in automatic abstracting in the United States is analogous to the early days of automatic translation. For example, only two or three research teams, totaling 12 people, are presently working on the problem of automatic abstracting, while a dozen teams with a total staff of some 100 researchers are now studying automatic translation. Moreover, various United States government agencies have invested several millions of dollars in automatic translation since 1953, while only several hundred thousand dollars have been made available for research on automatic abstracting since 1958.

In this exposition the problems of automatic abstracting are grouped into the following major classes: (1) conceptual problems, (2) input problems, (3) computer problems, (4) output problems, and (5) evaluation problems.

Present systems of automatic abstracting are capable of producing nothing more than extracts of documents, i.e. a selection of certain sentences of a document. This is not to say, however, that future automatic abstracting systems cannot be conceived in which the computer generates its own sentences by means of a suitable generative grammar program. Theoretically there is no linguistic or mechanical reason why such a system could not be designed and operated. The total system would then consist of a program which operates on the original document so as to produce an extract which in turn is fed into the generative grammar portion that then generates its own sentences using certain of the original sentences as grist. This system is depicted in Figure 1. Such a system, however, is apt to be costly both in time and money.

Since the creation of a suitable generative grammar program lags somewhat behind that of abstracting programs, attention here is confined to automatic abstracting systems that involve only extracting.

This research was supported in part by the United States Air Force with funds from Contract No. AF 30(602)-2223, monitored by the Rome Air Development Center, Griffiss Air Force Base, N.Y.

Communications of the ACM 259

[FIG. 1. Document -> Automatic Extracting -> Extract -> Generative Grammar]

2. Conceptual Problems

DEFINITION OF AN ABSTRACT. Assume that an extract of a document (i.e. a selection of certain sentences of the document) can serve as an abstract. In defining such an abstract of a document we must specify the following three aspects: content, form and length. The problem of content in an automatic abstract is that of selecting or rejecting sentences of the original document so as to form an acceptable extract or abstract. The problem of form is that of deciding how the selected sentences are presented to the reader in relation to the formatting of the title, authors, headings and subheadings, graphics, footnotes and references. The problem of length is that of deciding how many words or sentences will constitute the final output according to fixed rules, variable rules and thresholds of compactness.

An interesting way to view the length of an abstract is to compare it with its sister categories: document, title and index term. If these four categories are ranked in increasing length, in terms of either words or bits of information, the order becomes: index term, title, abstract, document. Moreover, considering the lengths of these four categories to within an order of magnitude, one observes the geometric progression $1, 10, 10^2, 10^3$. In other words, an abstract is approximately 10 times the length of the title and approximately 1/10 the length of the document. Seen as a whole this geometric progression represents the increasing degree of condensation of information, ranging from the document, through abstract and title, to the index term.

It is currently believed that the notion of the abstract of a document is simple and generally understood, i.e. that to every document there corresponds one abstract. To put it mathematically, the abstract $A$ is a function of the document $D$, i.e. $A = f(D)$. Moreover, since an abstract is here an extract, $A$ is a subset of $D$, i.e. $A \subset D$.

However, on closer examination it may be seen that a document can and does have many abstracts which differ from one another not only in content, length and format, but also in their intended use. Hence, the act of abstracting is goal-oriented. With the realization that it is misleading to conceive of "the" abstract, we must now speak of "an" abstract of a document. Thus, an abstract is a function of the two quantities, the document $D$ and the use $U$, i.e. $A = f(D, U)$.

Despite the fact that the preceding observation is simple and intuitively acceptable, its consequences are neither of these. In fact, it provides the foundation for a solution to the problem of defining an automatic abstract. Because of various alternative uses, it is necessary to define "abstract content" explicitly in terms that are use oriented. This definition must be expressed by machine criteria. To do this requires detailed specification far beyond what might initially have been expected. Thus, we seek to eliminate arguments over what is an abstract by replacing useless generalities with specific operational criteria.
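The dependence of an extract on both document and use, $A = f(D, U)$ with $A \subset D$, can be illustrated by a minimal Python sketch. The function name `make_extract`, the modeling of a use as a predicate on sentences, and the sample sentences are all illustrative assumptions and are not part of any system described in this paper.

```python
from typing import Callable, List

Sentence = str
Use = Callable[[Sentence], bool]  # assumption: a use U is modeled as a sentence predicate


def make_extract(document: List[Sentence], use: Use) -> List[Sentence]:
    """Return A = f(D, U): the sentences of D accepted by the use U, in original order."""
    return [s for s in document if use(s)]


document = [
    "We measured the reaction rate at three temperatures.",
    "The apparatus cost was modest.",
    "The rate doubled for each ten-degree rise.",
]

# Two hypothetical uses of the same document yield two different extracts.
for_chemist = make_extract(document, lambda s: "rate" in s)
for_budget_officer = make_extract(document, lambda s: "cost" in s)
```

Because each result is built only from sentences of the document, the subset property holds by construction, while the two uses produce different extracts of the same text.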

DEFINITION OF A GOOD ABSTRACT. This problem is closely related to the section devoted to evaluation of the quality of abstracts. It involves questions of the existence of a completely general definition of an abstract versus that of many specific definitions.

This leads to the concept of a tailor-made abstract, in the sense that an individual will be able to specify in future automatic systems more accurately what he wants in an abstract. Moreover, this feature distinguishes automatic abstracting from automatic translation. It is widely accepted that, aside from minor stylistic variations, there is only one translation of a document. On the other hand it has been shown that a document can have several different abstracts. This difference is fundamental to the problem of evaluating the quality of automatic abstracts and supports the general feeling that the problem of evaluating translations is considerably easier than that of evaluating automatic abstracts.

RESEARCH METHODS. Problems here concern the set of techniques that are used to guide the research effort in automatic abstracting. For example, such problems are encountered as how to improve intermediate products by iterative techniques, how to specify or describe linguistic and statistical clues of textual behavior, and what general principles are to be followed as guidelines. Among the several principles, we stress one that seems dominant.

Principle 1. Employ a method that detects and uses all abstracting clues (e.g. of meaning, significance, organization, etc.) provided by the author, the editor and the printer.

This principle focuses on capturing automatically as many clues as possible that are, either consciously or unconsciously, provided by the creators of the document. For example, the skilled author selects an appropriate title, organizes his thoughts in distinct sections with appropriate subtitles, condenses much information in the captions of graphs and tables, and uses footnotes and references in revealing ways.

It is instructive to regard the problem of automatic abstracting in the light of several other principles.

Principle 2. Employ mechanizable criteria of selection, i.e. a system of rewards for desired sentences.
Principle 3. Employ mechanizable criteria of rejection, i.e. a system of penalties for undesired sentences.
Principle 4. Employ a system of parameters that can be adjusted in order to permit tailor-made abstracts.
Principle 5. Employ a system which is a function of several distinct factors, such as statistical, semantic, syntactic, locational, etc.
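Principles 2 through 4 can be made concrete with a small Python sketch of rewards, penalties and adjustable parameters. The `Parameters` class, the `score_sentence` function, the word lists and the numeric values are hypothetical choices made only for illustration; the paper does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Parameters:
    # Hypothetical adjustable parameters (Principle 4); the defaults are arbitrary.
    reward: float = 1.0
    penalty: float = 1.0


def score_sentence(words: List[str], rewarded: Set[str], penalized: Set[str],
                   p: Parameters) -> float:
    """Add a reward per desired clue and subtract a penalty per undesired clue."""
    gained = sum(p.reward for w in words if w in rewarded)    # Principle 2: selection
    lost = sum(p.penalty for w in words if w in penalized)    # Principle 3: rejection
    return gained - lost


rewarded = {"significant", "conclude"}   # assumed reward words
penalized = {"perhaps", "hardly"}        # assumed penalty words
sentence = "we conclude that the effect is perhaps significant".split()

default_score = score_sentence(sentence, rewarded, penalized, Parameters())
strict_score = score_sentence(sentence, rewarded, penalized, Parameters(penalty=3.0))
```

Raising the penalty parameter turns an acceptable sentence into a rejected one without changing the program, which is the sense in which adjustable parameters permit tailor-made abstracts.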


3. Input Problems

THE CORPUS. Documents taken from a particular corpus or body of text may have important similarities among one another and important dissimilarities with documents taken from a different corpus. Thus, one of the first problems in conducting research in automatic abstracting is that of choosing an appropriate corpus. For example, problems arise due to the subject matter (e.g. sociology vs. mathematics), the publishing medium (e.g. newspaper vs. text books), editors' rules regarding acceptability for publication (e.g. research papers vs. expository works), and the author's style and compactness of presentation.

PRE-EDITING. The above remarks place difficulties in the path of the pre-editing step since at the present time one must resort to keypunching the original document. Moreover, even when print readers are available they may not be equal to the task. Hence, the text must be manually pre-edited according to a set of pre-editing instructions. The creation of these instructions is not trivial because it is precisely at this step that a choice may be made to retain or ignore those critical format clues which, once lost, can never be restored by any programming tricks. The pre-editing instructions must cover problems of formatting, graphics, special symbols, special alphabets, footnotes and references.

KEYPUNCHING. Despite the fact that keypunch operators quickly adapt to new problems, it is necessary to prepare a set of keypunch instructions. These instructions are based upon the pre-edit instructions and are subject to the boundary conditions imposed by available input and output hardware. They must contain rules of sufficient generality to cover a wide variety of textual situations and should also be supported by appropriate examples. The purpose of these keypunch instructions is to minimize decision making by the keypunch operator.

[FIG. 2. Flowchart of the automatic abstracting system: pre-edit entire text; keypunch pre-edited text; store document in machine memory; assign coordinates; strip title, authors, subtitles, captions, footnotes and bibliography from the body of text; frequency-count words and consult the stored dictionary; compute word scores; compute sentence scores; apply truncation rule; reorder sentences in natural sequence; merge; print abstract.]

4. Computer Problems

SYSTEM ASPECTS. By "system aspects" we refer to the functional specification of each of the steps in the automatic abstracting system. In general these steps are: pre-editing the textual input, assigning proper sequence numbers to successive elements of the text, weighting the textual factors according to some scheme, scoring the text sentences by combining these weights, ranking the sentences in decreasing magnitude, truncating this decreasing sequence at some threshold and outputting the set of sentences (Fig. 2).
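The sequence of steps listed above (numbering, scoring, ranking, truncating, and outputting in natural order) can be sketched in Python as follows. The function `extract` and the placeholder scoring rule `word_count_score` are assumptions introduced for illustration; any real scoring scheme would replace the word count used here.

```python
from typing import Callable, List, Tuple


def extract(sentences: List[str], score: Callable[[str], float],
            threshold: float) -> List[str]:
    """Score, rank, truncate at a threshold, and restore the natural sequence."""
    # Assign sequence numbers to successive sentences.
    numbered: List[Tuple[int, str]] = list(enumerate(sentences))
    # Score each sentence and rank in decreasing magnitude.
    ranked = sorted(numbered, key=lambda pair: score(pair[1]), reverse=True)
    # Truncate the decreasing sequence at the threshold.
    kept = [pair for pair in ranked if score(pair[1]) >= threshold]
    # Output the surviving sentences in their original order.
    return [text for _, text in sorted(kept)]


def word_count_score(sentence: str) -> float:
    # Placeholder scoring rule (an assumption): longer sentences score higher.
    return float(len(sentence.split()))


text = ["Short one.", "This sentence has five words.", "A longer sentence with six words."]
abstract = extract(text, word_count_score, threshold=5.0)
```

Keeping the sequence number with each sentence is what allows the final step to reorder the selected sentences as they appeared in the document.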

In accordance with Principle 2, instead of using the loaded words "topic" or "significant," as has often been done, to refer to the sentences chosen out of the original article for an abstract, a neutral name might be chosen, such as "A-sentences" for those that are considered acceptable. The use of this notation for the chosen sentences lends itself very nicely to further operations. For instance, think of the set $A_i$ as being those sentences chosen by factor $S_i$. Similarly, having another group of sentences chosen by factor $S_j$, this second set of sentences is denoted by $A_j$. It would then be possible to consider sentences selected by factor ($S_i$ or $S_j$) or by factor ($S_i$ and $S_j$). An extension of this notion would be to consider the set of sentences $A_S$, where $S$ is a vector of selection factors. If $T$ is another vector of different selection factors, all the sentences chosen by $S$ and by $T$ could then be compared.
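The A-notation maps directly onto set operations, as the following Python sketch suggests. The helper `chosen_by`, the two keyword-based factors, and the sample sentences are hypothetical and serve only to show the union and intersection of $A_i$ and $A_j$.

```python
from typing import Callable, List, Set


def chosen_by(sentences: List[str], factor: Callable[[str], bool]) -> Set[int]:
    """Return the set of sentence numbers selected by one factor."""
    return {i for i, s in enumerate(sentences) if factor(s)}


sentences = [
    "Results are given in Table 1.",
    "We thank the referee.",
    "The results confirm the model.",
    "See Table 2 for details.",
]

# Two hypothetical selection factors S_i and S_j.
A_i = chosen_by(sentences, lambda s: "result" in s.lower())
A_j = chosen_by(sentences, lambda s: "table" in s.lower())

either = A_i | A_j  # sentences selected by (S_i or S_j)
both = A_i & A_j    # sentences selected by (S_i and S_j)
```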

Another use of the A-notation would be to denote the body of sentences chosen by different stages in the selection process, assuming that it is desired to break the selection process up into stages. If this were done, then $A_0$ could be considered to be the entire original article, $A_1$ to be those sentences chosen by the first stage of the selection process, $A_2$ to be those sentences chosen by the second stage, and so on. Thus, $A_1$ is the result of applying the transformation $T_1$ to $A_0$, i.e. $A_1 = T_1(A_0)$, $A_2$ is the result of applying transformation $T_2$ to the set of sentences $A_1$, i.e. $A_2 = T_2(A_1)$, etc. Similar uses of this notion will probably suggest themselves.

It is possible to apply various kinds of selection criteria. It might be desired, for instance, to select by the first stage selection process (producing the set $A_1$ of sentences) all the sentences which were not rejected by some particular rejection criterion. (Note the use of rejection criteria here as opposed to the acceptance criteria customarily used.) Another candidate for first-stage selection might be the use of only nonstatistical criteria for the first stage of selection, followed by only statistical criteria for the second stage, or the reversal of the order of these two steps.
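A staged selection process of this kind, with $A_k = T_k(A_{k-1})$, might be sketched in Python as below. The function `run_stages` and the two stages (a nonstatistical rejection rule followed by a crude statistical rule) are assumptions chosen for illustration; the order of the stages can be reversed by reordering the list.

```python
from typing import Callable, List

Stage = Callable[[List[str]], List[str]]


def run_stages(a0: List[str], stages: List[Stage]) -> List[List[str]]:
    """Return [A_0, A_1, ..., A_n], where A_k = T_k(A_{k-1})."""
    results = [a0]
    for stage in stages:
        results.append(stage(results[-1]))
    return results


def reject_acknowledgments(sents: List[str]) -> List[str]:
    # Hypothetical nonstatistical rejection criterion.
    return [s for s in sents if "thank" not in s.lower()]


def keep_long(sents: List[str]) -> List[str]:
    # Hypothetical stand-in for a statistical criterion: at least five words.
    return [s for s in sents if len(s.split()) >= 5]


A0 = ["We thank the sponsor.", "The method selects sentences by score.", "It works."]
A0_out, A1, A2 = run_stages(A0, [reject_acknowledgments, keep_long])
```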


Certain sentences might be chosen as being among those which must be included in the abstract. In fact, the title of the article may be one of these. Such sentences could be selected and then set aside in an untouchable body of sentences so that they could not be rejected by any further processing. The selection process could consist of repetitions of the same kind of transformations on the body of sentences, and the process would end when $A_{n+1} = A_n$; that is, when the sequence of reductions converged to a minimal set of sentences. It would be necessary, of course, for such a set of reduction processes to insure that not all sentences were eliminated!
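The repeated reduction with an untouchable body of sentences, ending when $A_{n+1} = A_n$, can be sketched as a fixed-point loop in Python. The function `reduce_until_stable`, the score table, and the mean-based transformation are hypothetical; the guard against an empty result reflects the caution stated above.

```python
from typing import Callable, Set


def reduce_until_stable(
    sentences: Set[str],
    protected: Set[str],
    transform: Callable[[Set[str]], Set[str]],
    max_rounds: int = 100,
) -> Set[str]:
    """Apply `transform` until A_{n+1} == A_n, never dropping protected sentences."""
    current = set(sentences)
    for _ in range(max_rounds):
        # Protected sentences are set aside so that no round can reject them.
        reduced = (transform(current) & current) | (protected & current)
        # Stop at a fixed point, and refuse a reduction that eliminates everything.
        if reduced == current or not reduced:
            return current
        current = reduced
    return current


# Hypothetical scores; the transformation keeps sentences at or above the current mean.
scores = {"title": 0.0, "a": 1.0, "b": 4.0, "c": 9.0}


def keep_above_mean(current: Set[str]) -> Set[str]:
    mean = sum(scores[s] for s in current) / len(current)
    return {s for s in current if scores[s] >= mean}


final = reduce_until_stable(set(scores), protected={"title"}, transform=keep_above_mean)
```

In this example the set shrinks over two rounds and then stops changing, leaving the protected title and the single highest-scoring sentence.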

In accordance with Principles 3 and 4, one can view, in a statistical framework, the problem of selection of sentences for an abstract as the problem of selecting the right answers versus wrong answers. By "right answers" we mean those sentences which one would want in an abstract, and by "wrong answers" those which one would not want. Clearly, in any article there are sentences which should be included in every abstract and there are sentences which should not be included in any abstract. The fact that there might be a large number of indeterminate answers is not the issue at the moment. The problem of selecting sentences for an abstract is that of holding the number of false answers to a minimum while selecting as many as possible of the right answers. In other words, this is the familiar statistical problem of trying to place the level of acceptance at such a point that the desired number of right answers is chosen and, at the same time, as many as possible of the wrong answers are rejected.
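The effect of moving the level of acceptance can be seen in a short Python sketch that counts right and wrong answers accepted at several thresholds. The function `tally`, the scores, and the set of right answers are invented for illustration, and the sketch assumes every sentence is either right or wrong, setting aside the indeterminate ones.

```python
from typing import Dict, Set, Tuple


def tally(scores: Dict[str, float], right: Set[str], threshold: float) -> Tuple[int, int]:
    """Count the right and wrong answers accepted at a given acceptance level."""
    accepted = {s for s, value in scores.items() if value >= threshold}
    right_accepted = len(accepted & right)
    wrong_accepted = len(accepted - right)
    return right_accepted, wrong_accepted


# Hypothetical sentence scores and a hypothetical set of right answers.
scores = {"s1": 0.9, "s2": 0.7, "s3": 0.6, "s4": 0.4, "s5": 0.2}
right = {"s1", "s2", "s4"}

high = tally(scores, right, threshold=0.8)
middle = tally(scores, right, threshold=0.5)
low = tally(scores, right, threshold=0.3)
```

Lowering the threshold admits more right answers but also lets a wrong answer through, which is the trade-off described in the paragraph above.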

In accordance with Principle 5, one way of selecting sentences for an abstract is to measure various factors and combine them to form a single factor $T$. Suppose the $S_i$'s denote semantic factors, syntactic factors, locational factors, editorial factors and so on. To each $S_i$ associate a weight $w_i$, and form the linear combination $T = \sum_i w_i S_i$ of the products $w_i S_i$. The parameters $w_i$ then can be adjusted to reflect the relative importance of the factors $S_i$.
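The linear combination $T = \sum_i w_i S_i$ is simple to compute, as in the Python sketch below. The function name `combined_score`, the ordering of the factors, and the numeric values are assumptions made for the example only.

```python
from typing import Sequence


def combined_score(factor_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Return T = sum_i w_i * S_i for one sentence."""
    if len(factor_scores) != len(weights):
        raise ValueError("one weight is required per factor")
    return sum(w * s for w, s in zip(weights, factor_scores))


# Hypothetical factor values for one sentence, in the assumed order:
# semantic, syntactic, locational, editorial.
S = [2.0, 0.0, 1.0, 1.0]
w = [1.0, 0.5, 2.0, 0.5]

T = combined_score(S, w)
```

Doubling the locational weight, for instance, raises the score of sentences favored by position without touching the other factors.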

An extension of this idea is that different sets of weights, i.e. different vectors $w$ of weights $w_i$, could be formed, with a different column vector of weights for different journals. Abstracting an article from a given journal then would have as one of its steps the selection of the proper set of weights $w$ for use in an otherwise general program. A possibility deriving from this approach is that row averages could be taken of the components of all these column vectors, and the vector of row averages could be used as a reasonable weighting scheme for abstracting an unfamiliar journal.
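The row-average scheme for an unfamiliar journal can be sketched in Python as follows. The function `average_weights`, the journal names, and the weight values are hypothetical; each list plays the role of one journal's column vector of weights.

```python
from typing import Dict, List


def average_weights(journal_weights: Dict[str, List[float]]) -> List[float]:
    """Average the weight vectors component by component (the row averages)."""
    vectors = list(journal_weights.values())
    size = len(vectors[0])
    if any(len(v) != size for v in vectors):
        raise ValueError("all weight vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(size)]


# Hypothetical weight vectors, one per familiar journal.
journal_weights = {
    "Journal A": [1.0, 2.0, 3.0],
    "Journal B": [3.0, 2.0, 1.0],
}

default_weights = average_weights(journal_weights)
```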

PROGRAMMING. Problems here concern the nature of individual routines and subroutines. For example, it is useful to separate the total system into three major operating programs: edit program, dictionary program, and abstracting program. In addition to these operating programs various research programs must be written. Based upon the theoretical model or structure underlying the abstracting system, decisions must be made as to the best method of using a mixture of computing routines and table-lookup routines. The abstracting system should provide for the readjustment or modification of the numerous parameters that are incorporated in the programs or that are stored in the tables. This allows discoveries made during periods of research to be easily transformed into improvements in the operating programs.

TABLES. The success of an automatic abstracting system depends materially upon two different aspects. The first aspect concerns the general system or method of abstracting as given by the sequence of programming operations. The second concerns the specific entries of the several stored tables. An example of a stored table is a glossary or dictionary of several thousand words that act either as cue words that signal the importance of a sentence, or as stigma words that signal the unimportance of a sentence for purposes of abstracting. Such a table may include, in addition to the word, a code indicating its grammatical or semantic function, its importance weight, etc. Another kind of table may be set aside to retain the title, author, section headings, footnotes and references awaiting use during the final output step of the program. A third possibility is the inclusion of a table of synonyms and antonyms which will handle some semantic problems via the thesaurus method. In any case the programmer is presented with the sizable problem of juggling sections of the internal memory in order to accommodate the input text, the program and the tables.

5. Output Problems

HARDWARE. As in the case of input, we are confronted with problems imposed by output hardware. Despite the fact that high speed printers are available, the most serious difficulty is that of an over-restricted number of type fonts. This forces a replacement of strings of unusual symbols (e.g. mathematical and chemical) by the few conventional symbols available at the output printer. Moreover, important segments of textual symbols are also forced to be replaced by only one or two such conventional output symbols.

A second problem here is that of composing. Present output hardware provides little leeway in the composition of the textual output.

FORMAT. The format of the classical, human-prepared abstract comprises title, author and a paragraph of connected text. However, since present automatic abstracts are in fact nothing more than automatic extracts, it is desirable to correct the generally disjointed sequence of selected sentences by other devices. This problem can be partially solved by capturing in an automatic abstract those informative features of structure found in section headings and subheadings, together with footnotes and references.

DISSEMINATION. Despite the fact that the problem of dissemination of automatic abstracts has received little attention in the literature, it nevertheless will play an important part in the general acceptability and utility of automatic abstracts. Both theoretical and practical studies must be made to ascertain how the requestor communicates with the abstracting system, how the system collates requests, and how the system produces and distributes multiple copies of the abstracts through a medium and communication channel.

6. Evaluation Problems

ACCEPTABILITY. The first problem of evaluation concerns the acceptability or utility of the final product. This customarily requires that some qualitative or quantitative comparison be made between an automatic abstract and an "ideal" human abstract. However, it is of interest to note that repeated experiments conducted among human abstractors have revealed that the linear coefficient of correlation among humans varies from 0.2 to 0.4, even when they have operated under moderately well-defined abstracting rules. This disappointing result, although not totally unexpected, is due in part to the fact that the correlation coefficient is not the best measure. For example, if two individuals happen to select different, but cointensional sentences, then the correlation coefficient will naturally be low. The problem of what sentences of a document are cointensional is solvable only by further semantic research which, unfortunately, has yet to be done. The generally poor interhuman agreement tends to force us in the direction of arbitrarily, but uniformly, defining what an abstract is and then mechanizing these properties.
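The way cointensional choices depress the correlation coefficient can be shown with a small Python sketch. It assumes, purely for illustration, that each abstractor's selection is coded as a 0/1 vector over the sentences of a document and that the ordinary product-moment (Pearson) coefficient is used; the paper does not state how the reported coefficients were computed.

```python
from math import sqrt
from typing import Sequence


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation of two equal-length vectors (undefined if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


# Hypothetical selections over six sentences (1 = selected, 0 = not selected).
# Sentences 2 and 3 are assumed to say the same thing; each abstractor picked one of them.
abstractor_1 = [1, 1, 0, 0, 1, 0]
abstractor_2 = [1, 0, 1, 0, 1, 0]

r = correlation(abstractor_1, abstractor_2)
```

Although the two abstractors agree in substance on every sentence, the coefficient is far below 1 because the measure treats the two equivalent sentences as unrelated.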

COST. The second problem of evaluation is that of system cost in dollars and in time. At present, insufficient concrete data have been collected to permit reliable estimates of cost per document and estimates of bounds on the error for an operating system. However, such information does exist for research systems that do not claim operational perfection.

7. Remarks

In spite of the problems highlighted above it is felt that automatic abstracts can be defined, programmed, and produced in an operational system so as to compete with present human abstracting. The basis for this optimism is the fact that several automatic abstracting systems are presently producing abstracts, regardless of how unsophisticated they may be. That the future automatic abstracts will be different both in content and appearance from classical ones seems clear. However, it is not expected that users will be materially inconvenienced by having to adapt to a new format. Further research needs to be performed in this area of linguistic data processing, but the true nature of this problem is being seen clearly for the first time.

RECEIVED JULY, 1963; REVISED SEPTEMBER, 1963.

REFERENCES
1. LUHN, H. P. The automatic creation of literature abstracts. IBM J. Res. Develop. 2, 2 (April 1958).
2. Final report on the study of automatic abstracting. C107-1U12. Thompson Ramo Wooldridge Inc., Canoga Park, Calif., Sept. 1961.
3. EDMUNDSON, H. P., AND WYLLYS, R. E. Automatic abstracting and indexing survey and recommendations. Comm. ACM 4, 5 (1961), 226-234.
