A study of the discourse-analytical and other textual criteria people use to select words when they are highlighting a text for others.
Information Highlighting
Coping with the deluge of data.
Technology developed during the cold war is helping organisations to cope with the information explosion
Analysts at the Meta Group estimate that the amount of private information stored globally doubles every 12-14 months. This information explosion is the result of the proliferation of personal computers which has allowed individuals and workgroups to create documents and manage information to meet their own needs. As a result, there is a massive amount of information which needs to be retrieved and shared internally.
“There is now a greater volume of information than can be searched manually,” says Mr Philippe Courtot, chairman and chief executive of Verity (http://www.verity.com), a leading provider of search and retrieval applications in enterprise computing.
“Surfing the Internet is impractical with so much data, so you need a different metaphor. Users need information presented to them in a way which is personal.” Verity was formed as a spin-off from Advanced Decision Systems (ADS), a US government-funded project to automate the process of finding information.
ADS created a software technology which reads documents, allowing users to find stored information in response to a specific query. It can also monitor incoming documents to find anything which is of interest to individual users. Because the entire document is read, the results are always accurate and are delivered in order of relevance to the user. The commercial product that Verity has come up with is “Topic”.
Vast electronic archives
The original project was launched because the US Central Intelligence Agency was interested in using technology to help it find information in its vast electronic archives.
“Topic was soon used in the White House and by the National Security Council,” says Mr Courtot. “It was a natural move to the US government and to world security agencies. From there, it soon moved to large corporations.”
Verity was ideally poised to help users cope with the enormous volume of information on the Internet. “We knew it was important, so we sought out Netscape as a partner,” says Mr Courtot.
“If you know what you are looking for, you can describe it in words and Topic will find it quickly,” explains Mr Courtot. “The problem is if you don’t know what information you want, because you don’t have the words to describe it.
“Information used to be found by asking corporate librarians to gather it together. They did this on an iterative basis, as they searched through catalogues and indexes, refining their queries as they worked to improve the quality of the result. This doesn’t work any more because there is too much information.”
Organisations now have to give direct access to information to users because they can no longer afford to get specialist corporate librarians to search. Software vendors such as Verity give end-users tools to navigate through the information available, without reading it, and guide them down a path by giving them choices, such as: “Do you want Europe or North America?”
Mr Courtot is keen to point out that Topic is far more effective than the popular Internet search engines because it reads each document and therefore returns a more accurate answer to queries. “The search engines on the Web tend to return thousands of irrelevant answers,” he explains. “If you type in ‘President of the United States’, Lycos and Yahoo! will give you 10,000 answers and the first few may not even mention Bill Clinton.”
The technological challenge which Verity faces is considerable. Users need to be able to search, retrieve and filter information in the enterprise, in online databases or across the Internet. The technology also needs to cope with dissimilar document types, incompatible information sources and geographically dispersed datastores.
Mr Courtot has already approached these problems by introducing a large partnering programme. More than 100 applications are indexing their information with the Verity format, including Documentum, Informix, Lotus Notes, Netscape, PC Docs and Sybase.
Verity has partnership agreements with IT vendors, such as AT&T, Compaq, Microsoft, Object Design, SAP, SCO and Tandem, as well as with information providers, including Knight Ridder, Time Warner Pathfinder and FT Profile, a sister company to the Financial Times.
The accelerating growth in the amount of information is going to create problems. 90 per cent of inventions have been in the last 50 years and 90 per cent will be made in the next 25, predicts Mr Courtot. “The answer is for the system to categorise information by understanding the nature of the document. Computers will never be perfect for categorising, so you must ask the publisher to categorise it. You need the system to automatically create categories and an abstract, which the author can then edit. We have to minimise the iterative process which the CIA were using.”
The concept of searching for information, rather than reading through it, raises some important issues. Scanning a newspaper or other document may expose a reader to new ideas and stimulate innovation, a process which may be lost if we use computerised tools, such as Topic. “Innovation is in the individual, not the information,” says Mr Courtot. “Users will get more knowledge as they browse information structures by discovering a new category.”
Expanding intelligence
“The computer age expands the opportunities for humans to innovate,” he adds. “Part of man’s evolution has been the ability to use tools and learn new ways to apply them. Today, Microsoft’s Encarta encyclopaedia is a good start in presenting knowledge.
“Eventually, with virtual reality and human gene mapping, we will extend our lives. When we can decode the genes, virtual reality will give us a major new tool to shape the brain. Human intelligence hasn’t grown very much, but I believe it will.”
With ever increasing volumes of information, users face a danger of not having all the information relevant to an important decision, so products such as Topic are going to be increasingly important.
“With information becoming available at an accelerating rate, the challenge is to find the right type of information with minimal effort,” concludes Mr Courtot. “If we don’t, decision-making will become stifled by the demands of finding and managing the information needed.”
Philippe Courtot, a Basque born and raised Frenchman, earned degrees in electrical engineering and physics at the University of Paris.
A former chief executive of Thomson-CGR Medical corporation, now a division of GE Medical, his personal achievements include the Benjamin Franklin Award from the Saturday Evening Post for his role in promoting a national awareness campaign in reaching more than 75m people in the promotion of the life-saving benefits of mammography screening.
His association with Verity began during his tenure as president and chief executive of cc:Mail.
FT 7 May 1997
Tim Ostler
Cognitive Architecture
Anaphora Ltd
[email protected]@cogarch.com
InfoVis’99
London 16 July 1999
Summary
1 Highlighters
2 Highlighting as information visualisation
3 Past studies of visual cueing
4 User study
5 Heuristics
6 Identifying discourse markers
7 “Given” and “new” information
8 Future directions
1 Highlighters
1 Origins
2 Cognitive function
3 Highlighting for others
Highlighters 1/3 Origins
1960s: use of yellow fibre or felt pens to highlight text begins in the USA
1971: Schwan-Stabilo of West Germany launches first fluorescent highlighter pen
Highlighters 2/3 Cognitive function
Highlighting feels as though it helps revising, perhaps by encoding or priming material for incorporation into long-term memory
Partly confirmed by research: Hult et al. (1984) found that note-taking does involve semantic encoding
Highlighters 3/3 Highlighting for others
Also used to mark up a text for selective attention of another person
This function chosen for study, because of clear application to information overload
Conducted user study to define suitable heuristics for text selection
2 Highlighting as information visualisation
1 Syntax highlighting
2 SeeSoft
3 TextLight
4 Readers vs. authors
Highlighting as info visualisation 1/4 Syntax highlighting
Highlighting can be seen as a means of visualising the logical or conceptual structure of a text
– Enhances understanding of text
– Guides eye to most important passages
Principle is widely demonstrated by the syntax highlighting in text editors for programmers
– Useful: need to visualize logical structure acute
– Easy: programming languages offer finite and precise set of cues for editors to detect and colour
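The "finite and precise set of cues" point can be illustrated with a minimal Python sketch. It is not taken from the talk: the function name highlight_keywords and the [[ ]] marker pair are hypothetical stand-ins for whatever colouring a real editor applies, and the cue set is simply Python's own reserved-word list.

```python
import keyword
import re

# Hypothetical marker pair standing in for a colour or text style.
MARK_OPEN, MARK_CLOSE = "[[", "]]"


def highlight_keywords(line: str) -> str:
    """Wrap every Python reserved word found in the line with the marker pair."""

    def mark(match: re.Match) -> str:
        word = match.group(0)
        # The cue set is finite and exact: a word is either reserved or not.
        if keyword.iskeyword(word):
            return f"{MARK_OPEN}{word}{MARK_CLOSE}"
        return word

    # Visit each identifier-like token and decide whether to mark it.
    return re.sub(r"[A-Za-z_]\w*", mark, line)


print(highlight_keywords("for x in items: return x"))
```

This sketch deliberately ignores context, so reserved words inside strings or comments would also be marked. Natural-language text offers no comparably closed list of cues, which is the contrast the slide draws.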
Highlighting as info visualisation 2/4 SeeSoft
One of a suite of text structure visualisation tools from team led by Stephen Eick at Lucent (formerly Bell) Laboratories
Each line of code reduced to a line of single pixel thickness, coloured according to a range of user-specified criteria
Thousands of lines of code can be displayed on the screen at once
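The reduction described for SeeSoft can be sketched roughly as follows. This is an illustration of the idea only, not SeeSoft's implementation: the function seesoft_rows, the tuple layout and the example colour criterion are all assumptions made for the sketch.

```python
def seesoft_rows(lines, colour_for):
    """Reduce each source line to one thin row: (indent, length, colour).

    colour_for is a user-supplied criterion mapping a line to a colour name.
    """
    rows = []
    for line in lines:
        stripped = line.lstrip()
        indent = len(line) - len(stripped)   # where the row starts
        length = len(stripped.rstrip())      # how long the row is
        rows.append((indent, length, colour_for(line)))
    return rows


# Hypothetical criterion: comments in green, everything else in grey.
def comment_colour(line):
    return "green" if line.strip().startswith("#") else "grey"


source = ["def f():", "    # note", "    return 1"]
rows = seesoft_rows(source, comment_colour)
```

Because each row keeps only position, length and colour, a renderer could draw one row per pixel line, which is how thousands of lines fit on a single screen.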
Highlighting as info visualisation 3/4 TextLight
TextLight
– Conceived as a tool to
  – Detect certain attributes of a text’s cognitive structure
  – Encode them in visual, non-lexical form
  – Superimpose them in place on the corresponding text
– Like a GIS, can reveal attributes of its data set that would otherwise be obscured, throwing the underlying structure into high relief
Highlighting as info visualisation 4/4 Readers vs. authors
For readers, no benefits from using different colours for different categories of "new" information
But for authors and text analysts, extending TextLight to identify text attributes is as valuable as colouring different CAD layers is to architects
Revealing the pattern of distribution of attributes such as readability or levels of completion: like a knowledge discovery system for authors
3 Past studies on visual cueing
1 Judging importance
2 Choosing words 1
3 Choosing words 2
4 Core content
5 How many words?
6 Large variance
Past studies 1/6 Judging importance
Hubert Dreyfus: is the ability to tell the important from the unimportant a fundamentally human cognitive operation?
Perhaps, but in some genres widespread agreement on signals for different stages in a discourse
So while we can’t tell what seems important for every person, we can assess what is being presented as important
Past studies 2/6 Choosing words 1
Weakness of all research: no formal rules on which text to cue
– Foster (1979): 26 students and lecturers given 3400-word text and asked to underline sentences containing key ideas author trying to put over
– Half subjects told not to underline more than 16 sentences, half not more than 8
– First case: 213 selections spanned 80 sentences, with only 9 sentences selected by 6 or more
– Second case: 102 selections distributed over 52 sentences, with only 2 selected by 6 or more
Foster’s conclusion: difficult to identify sections for cueing
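As a rough worked check on why Foster's figures suggest low agreement, the reported counts can be turned into two simple ratios. The helper agreement_summary and its field names are hypothetical; only the counts (213, 80, 9 and 102, 52, 2) come from the slide.

```python
def agreement_summary(selections, sentences_chosen, widely_chosen):
    """Summarise how concentrated the subjects' underlining was.

    selections       -- total sentence selections made by the group
    sentences_chosen -- distinct sentences selected at least once
    widely_chosen    -- sentences selected by 6 or more subjects
    """
    return {
        # Average number of subjects picking each sentence that was picked at all.
        "mean_selections_per_sentence": selections / sentences_chosen,
        # Fraction of picked sentences on which 6 or more subjects agreed.
        "share_widely_chosen": widely_chosen / sentences_chosen,
    }


first_case = agreement_summary(213, 80, 9)    # limit of 16 sentences
second_case = agreement_summary(102, 52, 2)   # limit of 8 sentences
```

On these counts each chosen sentence was picked by fewer than three subjects on average in the first case and about two in the second, and only around 11% and 4% of the chosen sentences attracted six or more selections.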
Past studies 3/6 Choosing words 2
Other experiments
– Klare et al. (1955) cued single words
– Dearborn et al. (1949) emphasised word carrying the "peak stress" in a sentence (did not describe how word selected)
– Crouse & Idstein (1972) cued statements or sentences
Past studies 4/6 “Core” content
Most specific suggestions by Hershberger & Terry (1965)
– “Core” content made up 1/3 of total text length:
  – New key words
  – Familiar key words
  – Key statements
  – Basic core statements
  – Key examples
  – Rephrasing of key statements
Past studies 5/6 How many words?
Crouse & Idstein (1972)
– Density of cued material influences its effect
Foster (1979)
– Optimal proportion of text to be highlighted still not established
Past studies 6/6 Large variance
Fowler & Barker (1974)
– Pointed to the large variance (4% to 32%) observed in the proportion of text highlighted by members of the test group who were asked to highlight for themselves
Rickards & August (1975)
– Asked to highlight passages of structural importance, test subjects all chose passages that Rickards & August considered relatively unimportant
4 User study
1 Experimental procedure
2 Analytical procedure
3 Analysis of results
4 Observations
11 subjects provided with an 1111-word article from the Financial Times IT supplement, with instructions to imagine they were corporate librarians identifying the key points in an article for a board member
Questionnaire sought:
– Subjects’ past experience of highlighting
– Criteria for text selection
– At what points they made their selection
– Other comments
User study 1/4 Experimental procedure
Article entered down the left axis of a spreadsheet, spanning 1111 rows (one word per row)
Attributes for each word entered along the top of the spreadsheet (36 categories)
For each word, the probability of lying in a highlighted passage given as a decimal figure between 0 and 1
All other parameters rebased to fall between 0 and 1
This gave the correlation of any given parameter with the probability that a word fell within a highlighted group of words
User study 2/4 Analytical procedure
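The analytical procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the spreadsheet actually used in the study: the function names, the toy selections and the choice of word length as the example attribute are all hypothetical, "rebased" is read here as min-max rescaling, and the correlation is assumed to be Pearson's r (the slides do not name the coefficient).

```python
from math import sqrt

def highlight_probability(selections):
    """Per-word probability of being highlighted.

    selections: one list of 0/1 flags per subject, one flag per word.
    """
    n_subjects = len(selections)
    return [sum(column) / n_subjects for column in zip(*selections)]

def rebase(values):
    """Min-max rescale a parameter to the range 0..1 (assumed meaning of 'rebased')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: 3 subjects, 5 words (the study had 11 subjects, 1111 words).
selections = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]
word_lengths = [6, 7, 2, 3, 5]  # one example attribute column

prob = highlight_probability(selections)
r = pearson(rebase(word_lengths), prob)
print(prob, round(r, 2))
```

Rebasing does not change a Pearson correlation, so its role in this sketch is only to put every attribute column on the same 0 to 1 scale as the probability column.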
Results show wide variance in number of words highlighted
– Minimum of 50 (4.5%)
– Maximum of 396 (35.64%)
– (Fowler & Barker 1974: 4-32%)
Marked difference between male and female subjects
– Males averaging 15%
– Females 25.5%
Little correlation between part of speech/syntactic role and probability of highlighting
Noticeable association with longer words
User study 3/4 Analysis of results
None of the subjects made highlighting decisions before having read at least one paragraph
Large majority (70%) delayed highlighting until the whole passage had been read
Conclusion: decisions made at a discourse-analytical and not a strictly linguistic level
User study 4/4 Observations
5 Heuristics
1 Correlation with average choice
2 Key correlations
3 Best heuristics
4 Highlighting by humans 1
5 Highlighting by humans 2
6 Highlighting by best heuristics
7 Performance of best heuristics
Average correlation between any one person’s highlighting decisions and the scores for probability of given words being highlighted was 0.44
For any individual word, probability varied between 0 and 0.83, offering clear guidelines for assessing any trial selection criteria
Heuristics 1/7 Correlation with average choice
[Bar chart of key correlations, axis from 0 to 0.6. Bars: Combination of best criteria; First statement in discourse segment; Proximity to start of sentence; Solution stage; First statement in quote; Present tense; List status; Proximity to start of paragraph]
Heuristics 2/7 Key correlations
Most successful heuristics:
1 Word should be part of first statement in a discourse segment
2 Word should be part of first statement in any quote not an immediate continuation of a previous quote
3 Word should be part of a list
4 Word should be part of “solution” stage
Heuristics 3/7 Best heuristics
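One way to apply the four heuristics is to record them as per-word flags and combine them into a single score. The slides do not say how the heuristics were combined, so the equal weighting and the threshold below are assumptions made purely for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WordFeatures:
    """Per-word flags for the four heuristics (field names are illustrative)."""
    text: str
    first_statement_in_segment: bool = False
    first_statement_in_new_quote: bool = False  # quote that does not continue a previous quote
    in_list: bool = False
    in_solution_stage: bool = False

def heuristic_score(word):
    """Fraction of the four heuristics that apply (equal weights: an assumption)."""
    flags = (
        word.first_statement_in_segment,
        word.first_statement_in_new_quote,
        word.in_list,
        word.in_solution_stage,
    )
    return sum(flags) / len(flags)

def select(words, threshold=0.25):
    """Return the text of words whose score reaches the (assumed) threshold."""
    return [w.text for w in words if heuristic_score(w) >= threshold]

words = [
    WordFeatures("Verity", first_statement_in_segment=True),
    WordFeatures("says"),
    WordFeatures("Topic", first_statement_in_new_quote=True, in_solution_stage=True),
]
selected = select(words)
print(selected)
```

With a threshold of 0.25, a word is selected as soon as any one heuristic applies; raising the threshold would demand agreement between several heuristics.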
Heuristics 4/7 Highlighting by humans 1
[Figure: article excerpt, from “Verity was ideally poised to help users cope” to “the first few may not even mention Bill Clinton.”, marked to show areas where probability of highlighting is greater than 0.4]
Heuristics 5/7 Highlighting by humans 2
[Figure: article excerpt, from “Verity was ideally poised to help users cope” to “the first few may not even mention Bill Clinton.”, marked to show areas where probability of highlighting is greater than 0.33]
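The two "highlighting by humans" views differ only in the cut-off applied to the per-word probabilities (0.4 and 0.33). A minimal sketch of that thresholding step follows; the function name and the sample probabilities are hypothetical, and contiguous words above the cut-off are grouped into spans for display.

```python
def spans_above(prob, threshold):
    """Return (start, end) index pairs of contiguous words with probability > threshold.

    `end` is exclusive, so prob[start:end] is the highlighted run.
    """
    spans = []
    start = None
    for i, p in enumerate(prob):
        if p > threshold and start is None:
            start = i                      # a highlighted run begins
        elif p <= threshold and start is not None:
            spans.append((start, i))       # the run ended at the previous word
            start = None
    if start is not None:                  # run continues to the last word
        spans.append((start, len(prob)))
    return spans

# Hypothetical per-word probabilities for a seven-word stretch of text.
prob = [0.1, 0.45, 0.5, 0.36, 0.2, 0.34, 0.9]
strict = spans_above(prob, 0.4)
loose = spans_above(prob, 0.33)
print(strict, loose)
```

Lowering the cut-off from 0.4 to 0.33 can only extend existing spans or add new ones, which is why the second view shows at least as much highlighted text as the first.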
Heuristics 6/7 Highlighting by best heuristics
[Figure: article excerpt, from “Verity was ideally poised to help users cope” to “the first few may not even mention Bill Clinton.”, marked up according to the best heuristics]
KEY
First statement in a quote
“Solution” stage
First statement in a discourse segment
Best combination of heuristics produced a correlation of 0.56 with actual highlighting probability (average of 0.43 for test subjects)
In other words, selecting text according to the specified criteria achieved a correlation greater than that of all but one of the test subjects, and considerably higher than the average
BUT: the challenge is to identify the markers denoting relevant features in a discourse
Heuristics 7/7 Performance of best heuristics
6 Identifying discourse markers
Coping with thedeluge of data.
Technology developed during the cold war is helping organisations to cope with the information explosion
Analysts at the Meta Group estimate that the amount of private information stored globallydoubles every 12-14 months. This information explosion is the result of the proliferation ofpersonal computers which has allowed individuals and workgroups to create documentsand manage information to meet their own needs. As a result, there is a massive amount ofinformation which needs to be retrieved and shared internally.
“There is now a greater volume of information than can be searched manually,” saysMr Philippe Courtot, chairman and chief executive of Verity (http.//www.verity.com), aleading provider of search and retrieval applications in enterprise computing.
“Surfing the Internet is impractical with so much data, so you need a differentmetaphor. Users need information presented to them ina way which is personal.” Verity wasformed as a spin-off from Advanced Decision Systems (ADS), a US government-fundedproject to automate the process of finding information.
ADS created a software technology which reads documents, allowing users to findstored information in response to a specific query. It can also monitor incoming documentsto find anything which is of interest to individual users. Because the entire document isread, the results are always accurate and are delivered in order of relevance to the user.The commercial product that Verity has come up with is “Topic” .
Vast electronic archivesThe original project was launched because the US Central Intelligence Agency was
interested in using technology to help it find information in its vast electronic archives.“Topic was soon used in the White House and by the National Security Council,” says
Mr Courtot. “It was a natural move to the US government and to world security agencies.From there, it soon moved to large corporations.”
Verity was ideally poised to help users cope with the enormous volume of informationon the Internet. “We knew it was important, so we sought out Netscape as a partner,” saysMr Courtot.
“ If you know what you are looking for, you can describe it in words and Topic will find itquickly,” explains Mr Courtot. “The problem is if you don’t know what information you want,because you don’t have the words to describe it.
“ Information used to be found by asking corporate librarians to gather it together. Theydid this on an iterative basis, as they searched through catalogues and indexes, refiningtheir queries as they worked to improve the quality of the result. This doesn’t work any morebecause there is too much information.”
Organisations now have to give direct access to information to users because theycan no longer afford to get specialist corporate librarians to search. Software vendors suchas Verity give end-users tools to navigate through the information available, without readingit, and guide them down a path by giving them choices. such as. “Do you want Europe orNorth America?"
Mr Courtot is keen to point out that Topic is far more effective than the popular Internetsearch engines because it reads each document and therefore returns a more accurateanswer to queries. “The search engines on the Web tend to return thousands of irrelevantanswers,” he explains. “ If you type in “President of the United States” , Lycos and Yahoo!will give you 10,000 answers and the first few may not even mention Bill Clinton.”
The technological challenge which Verity faces is considerable. Users need to be ableto search, retrieve and filter information in the enterprise, in online databases or across the
Internet. The technology also needs to cope with dissimilar document types, incompatibleinformation sources and geographically dispersed datastores.
Mr Courtot has already approached these problems by introducing a large partneringprogramme. More than 100 applications are indexing their information with the Verityformat, including Documentum, Informix, Lotus Notes, Netscape, PC Docs and Sybase.
Verity has partnership agreements with IT vendors, such as AT&T, Compaq,Microsoft, Object Design, SAP, SCO and Tandem, as well as with information providers,including Knight Ridder, Time Warner Pathfinder and FT Profile, a sister company to theFinancial Times.
The accelerating growth in the amount of information is going to create problems. 90per cent of inventions have been in the last 50 years and 90 per cent will be made in thenext 25, predicts Mr Courtot. “The answer is for the system to categorise information byunderstanding the nature of the document. Computers will never be perfect for categorising,so you must ask the publisher to categorise it. You need the system to automatically createcategories and an abstract, which the author can then edit. We have to minimise theiterative process which the CIA were using.
The concept of searching for information, rather than reading through it, raises someimportant issues. Scanning a newspaper or other document may expose a reader to newideas and stimulate innovation, a process which may be lost if we use computerised tools,such as Topic. “ Innovation is in the individual, not the information,” says Mr Courtot. “Userswill get more knowledge as they browse information structures by discovering a newcategory.”
Expanding intelligenceThe computer age expands the opportunities for humans to innovate,” he adds. “Part
of man’s evolution has been the ability to use tools and learn new ways to apply them.Today, Microsoft’s Encarta encyclopaedia is a good start in presenting knowledge.
“Eventually, with virtual reality and human gene mapping, we will extend our lives.When we can decode the genes, virtual reality. will give us a major new tool to shape thebrain. Human intelligence hasn’t grown very much, but I believe it will.”
With ever increasing volumes of information, users face a danger of not having all theinformation relevant to an important decision, so products such as Topic are going to beincreasingly important.
“With information becoming available at an accelerating rate, the challenge is to findthe right type of information with minimal effort,” concludes Mr Courtot. “ If we don’t,decision-making will become stifled by the demands of finding and managing theinformation needed.”
Philippe Courtot, a Basque born and raised Frenchman, earned degrees in electrical engineering and physics at the University of Paris.
A former chief executive of Thomson-CGR Medical corporation (now a division of GE Medical), his personal achievements include the Benjamin Franklin Award from the Saturday Evening Post for his role in a national awareness campaign that reached more than 75m people in promoting the life-saving benefits of mammography screening.
His association with Verity began during his tenure as president and chief executive of cc:Mail.
FT 7 May 1997
1 Segments
2 Statements
3 Solution stages
4 Stage labels
5 Cue words as signals
6 “Solution” signals
Identifying discourse markers 1/6 Segments
Different means of discourse segmentation beyond the scope of this paper
Segments most often coincide with beginning of paragraphs, and normally begin with a proposition or assertion
Most effective technique found: select opening statement in its simplest form
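The opening-statement heuristic above can be illustrated with a minimal Python sketch. It assumes that paragraphs are separated by blank lines and that a sentence ends at ".", "!" or "?" followed by whitespace; neither assumption comes from the study, and the function name opening_statements is hypothetical. It does not attempt to reduce the statement to "its simplest form".

```python
import re

def opening_statements(text):
    """Return the first sentence of each blank-line-separated paragraph."""
    statements = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())  # collapse line breaks and runs of spaces
        if not paragraph:
            continue
        # Naive sentence boundary (an assumption): ., ! or ? followed by whitespace.
        first = re.split(r"(?<=[.!?])\s+", paragraph, maxsplit=1)[0]
        statements.append(first)
    return statements

sample = (
    "Cars are a common way of getting from A to B. However, congestion is a problem.\n\n"
    "Dogs make great pets. However they can get fleas."
)
for statement in opening_statements(sample):
    print(statement)
```

A real segmenter would need to handle abbreviations and quoted speech, which this sketch ignores.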
Identifying discourse markers 2/6 Statements
Sometimes preceded or followed by a coherence relation: a question or other linguistic feature that makes the proposition’s relevance to the preceding text clear
Following text tends to fill out details and/or provide supporting evidence for the assertion
Identifying discourse markers 3/6 Solution stage
“Situation-problem-solution-evaluation” structure
– Narrative structures: Boy meets girl – boy loses girl – boy regains girl – boy & girl live happily ever after
– Feature articles: Dogs make great pets – however they can get fleas – Winalot have now launched a new anti-flea dog food – owners have declared it a success
Identifying discourse markers 4/6 Stage signals
Hoey (1994): elements of structure often signalled by characteristic words
Stage signals as the most basic level
– “Cars are a common way of getting from A to B. However, the congestion that they cause is a problem. The solution is to get people to use public transport. In this way everyone can get to work quickly.”
Identifying discourse markers 5/6 Cue words as signals
Hoey (ibid.): Discourse structure essentially evaluative
– e.g. “If thyristors are used to control the motor of an electric car, the vehicle moves smoothly but with poor efficiency at low speeds”
– “Problem” stage signalled by negative evaluation “poor”
So stages can be identified by spotting cue words or phrases
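Cue-word spotting of this kind can be sketched in a few lines of Python. The lexicon CUES and the function label_stage below are hypothetical and purely illustrative: the cue lists are drawn from the example sentences on these slides, not from TextLight's actual dictionary.

```python
import re

# Hypothetical cue lexicon, for illustration only.
CUES = {
    "problem": ["however", "problem", "poor"],
    "solution": ["the solution is"],
    "evaluation": ["in this way", "success"],
}

def label_stage(sentence):
    """Return the first stage whose cue word or phrase occurs in the sentence, else None."""
    lowered = sentence.lower()
    for stage, cues in CUES.items():
        for cue in cues:
            # Whole-word match, so that "poor" does not fire on "poorly".
            if re.search(r"\b" + re.escape(cue) + r"\b", lowered):
                return stage
    return None

text = [
    "Cars are a common way of getting from A to B.",
    "However, the congestion that they cause is a problem.",
    "The solution is to get people to use public transport.",
    "In this way everyone can get to work quickly.",
]
for sentence in text:
    print(label_stage(sentence), "|", sentence)
```

Stages are tested in dictionary order, so a sentence containing cues for two stages is labelled with the first one; a fuller treatment would need some way of weighing competing cues.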
Identifying discourse markers 6/6 “Solution” signals
TextLight need only be concerned with “solution” signals
Two examples of such signals
– Words to do with “solving”, “developing” or “inventing”
– Change of verb form into the present perfect tense, as in “have -ed”. Tense then reverts to simple present to denote that a new situation exists as a result of the solution
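As a rough illustration of the two signals, the following Python sketch uses regular expressions. The patterns and the function name solution_signals are assumptions made for this example, not TextLight's implementation: the word pattern matches only the stems "solv", "develop" and "invent", and the tense pattern recognises only regular "-ed" participles after "has" or "have", optionally separated by one word such as "now".

```python
import re

# Crude stem match for words to do with solving, developing or inventing.
SOLUTION_WORDS = re.compile(r"\b(?:solv|develop|invent)\w*", re.IGNORECASE)
# "has/have" + optional single word + regular "-ed" participle.
PRESENT_PERFECT = re.compile(r"\b(?:has|have)\s+(?:\w+\s+)?\w+ed\b", re.IGNORECASE)

def solution_signals(sentence):
    """Return the names of the solution signals found in the sentence."""
    signals = []
    if SOLUTION_WORDS.search(sentence):
        signals.append("solution word")
    if PRESENT_PERFECT.search(sentence):
        signals.append("present perfect")
    return signals

print(solution_signals("Winalot have now launched a new anti-flea dog food"))
print(solution_signals("Engineers have developed a new controller."))
print(solution_signals("Dogs make great pets."))
```

The sketch misses irregular participles ("have built") and does not check that the tense later reverts to the simple present. It also fires on any present perfect, including one in an evaluation stage such as "owners have declared it a success".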
Summary
1 Highlighters
2 Highlighting as information visualisation
3 Past studies of visual cueing
4 User study
5 Heuristics
6 Identifying discourse markers
7 “Given” and “new” information
8 Future directions
7 “Given” and “new” information
1 Highlighting the new
2 Narrative stages
3 Importance
4 Intonation
5 First statement
6 Lists
7 “Solution” as “new”
8 Quasi-revision
9 Levels of “new”
“Given” and “new” information 1/9 Highlighting the new
Why were best heuristics more effective than others?
Prague school (1930s): information is composed of a mixture of “given” and “new” information
Proposition: the essential factor behind the choice of text to highlight is that the best heuristics are all ways in which “new” information is signalled at the discourse level
“Given” and “new” information 2/9 Narrative stages
Theory supported by the fact that 80% of subjects stated that they were highlighting words that “marked significant stages in the narrative.”
This implies information that is new in the context of preceding text
“Given” and “new” information 3/9 Importance
We can argue that an idea’s perceived importance is judged according to the extent to which it:
– Is new as opposed to given
– Matches a perceived gap in the structure of the reader’s domain knowledge
When highlighting on behalf of others, we have to make an informed judgement on how the ultimate reader will define importance
“Given” and “new” information 4/9 Intonation
Halliday (1970): in spoken discourse, intonation is used to signal to the listener what the speaker understands to be new information
Could highlighting perform equivalent function?
“Given” and “new” information 5/9 First statement
First statement in a paragraph can be considered as supporting structure for the statement at the beginning of the discourse segment that contains it
Operates as one of primary statements containing most of the “new” information in document
“Given” and “new” information 6/9 Lists
Lists typically act as systematic tabulation of what the author believes to be important (i.e. “new” and relevant) information
Often used for predictive purposes within a discourse, or for enumerating significant points
People therefore tend to identify lists as concentrated sources of meaning, and as such eligible for highlighting
Speaker might very well emphasise this by counting the points off using the fingers of his hand
“Given” and “new” information 7/9 “Solution” as “new”
Solution stages comprise “new” information: a climactic point of novelty in schema, justifying status as “highlightable” text
If article modelled as histogram with columns depicting sentences plotted against new information content, highlighting like slicing across the graph using a threshold value
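The histogram-and-threshold model can be made concrete with a small Python sketch. The slide does not say how "new information content" would be measured, so the score used here is an assumption chosen only for illustration: the fraction of a sentence's distinct words that have not appeared in any earlier sentence. The names novelty_scores and highlight are likewise hypothetical.

```python
import re

def novelty_scores(sentences):
    """Score each sentence by the share of its distinct words not seen earlier (assumed measure)."""
    seen = set()
    scores = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        new_words = words - seen
        scores.append(len(new_words) / len(words) if words else 0.0)
        seen |= words
    return scores

def highlight(sentences, threshold):
    """Slice across the histogram: keep sentences whose score reaches the threshold."""
    scores = novelty_scores(sentences)
    return [s for s, score in zip(sentences, scores) if score >= threshold]

article = [
    "Dogs make great pets.",
    "Dogs make great pets too.",
    "Fleas are a problem.",
]
print(novelty_scores(article))
print(highlight(article, threshold=0.5))
```

Under this measure the first sentence always scores 1.0, since nothing precedes it. Raising or lowering the threshold corresponds to highlighting less or more of the text.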
“Given” and “new” information 8/9 Quasi-revision
Criteria and procedure would have been different for quasi-revision
– Shorter range
– More spontaneously applied
Reader has more detailed knowledge of what is “new” info for him/herself
Highlighting can be done
– In real time
– With greater precision
“Given” and “new” information 9/9 Levels of “newness”
Information can also be perceived as “new” at several levels:
– Within a sentence, particular words can be seen as new
– Within a paragraph, some sentences can be interpreted as new and others as contextual or supporting information
– Within a discourse segment or discourse, still longer passages may be perceived as containing “new” information
Summary
1 Highlighters
2 Highlighting as information visualisation
3 Past studies of visual cueing
4 User study
5 Heuristics
6 Identifying discourse markers
7 “Given” and “new” information
8 Future directions
8 Future directions
1 Highlighting long neglected
2 Virtues of highlighting
3 TextLight: to do
Future directions 1/3 Highlighting long neglected
The study of the selection of words for highlighting previously neglected
Potential of automatic highlighting as a tool to handle information overload also neglected
Future directions 2/3 Virtues of highlighters
Output familiar to users
Highlighting shown to be helpful in content recall
Addresses issue of confidence
– Highlighting acts not as a censor but as a guide: non-selected text (and therefore the context) always in view
Suitable as a plug-in module for other programs
Future directions 3/3 TextLight: to do
Incorporate discourse segmentation algorithms
Complete lexical dictionary for cue recognition
Port from Prolog to Java for greater portability
TextLight URLs
http://www.cogarch.demon.co.uk/textlight.html