15
Information Visualization (2008) 7, 118 -- 132 ©2008 PalgraveMacmillan Ltd.All rightsreserved 1473-8716 $30.00 www.palgrave-journals.com/ivs Jigsaw: supporting investigative analysis through interactive visualization John Stasko 1 Carsten Görg 1 Zhicheng Liu 1 1 School of Interactive Computing & GVU Center, Georgia Institute of Technology, Atlanta, GA, U.S.A. Correspondence: John Stasko, School of Interactive Computing & GVU Center, Georgia Institute of Technology, 85 5th St., NW, Atlanta GA 30332-0760, U.S.A. Tel.: +1 404 894 5617; Fax: +1 404 894 3146; E-mail: [email protected] Received: 3 March 2008 Revised: 26 March 2008 Accepted: 26 March 2008 Abstract Investigative analysts who work with collections of text documents connect embedded threads of evidence in order to formulate hypotheses about plans and activities of potential interest. As the number of documents and the corre- sponding number of concepts and entities within the documents grow larger, sense-making processes become more and more difficult for the analysts. We have developed a visual analytic system called Jigsaw that represents documents and their entities visually in order to help analysts examine them more efficiently and develop theories about potential actions more quickly. Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually illustrating connections between entities across the different documents. Information Visualization (2008) 7, 118 -- 132. doi:10.1057/palgrave.ivs.9500180 Keywords: Visual analytics; investigative analysis; intelligence analysis; information visualization; sense-making; multiple views Introduction Investigative analysts seek to make discoveries and uncover hidden truths from large collections of data and information. Scientists follow this process when they read research papers to learn about related efforts; news reporters perform such analyses when they investigate new stories; law enforcement and intelligence analysts carry out these kinds of investiga- tions when they review case reports. One common element of all these analytic activities is that they are cognitively very challenging, frequently involving large collections of data and text that tax a person’s memory, deduction, reasoning, and general analytic capabilities. In the latter example above, law enforcement and intelligence, an inves- tigative analyst gathers individual chunks of evidence that may range from incident reports filed by operatives in the field to open source news reports such as articles gathered via web searches. Often, the investiga- tive process involves analysts pouring over large collections of evidence, reading and reviewing the documents to make connections between seem- ingly disparate facts, essentially ‘reading between the lines’ to link sepa- rate activities into a larger plot or narrative. Sometimes the connections can be clear, for instance, a particular individual mentioned in different reports. Alternatively, the connections can be more difficult to discern, for instance, approximate overlaps in time or location. While reading documents and digesting the information therein, analysts gradually form internal mental models of the people, places, and events discussed in the documents. As the number of documents grows larger, however, it becomes increasingly difficult for an investigator to track the connections between data and make sense of it all. The sheer number of entities involved may make it very difficult for a person to form a

Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Information Visualization (2008) 7, 118 -- 132© 2008 Palgrave Macmillan Ltd. All r ights r eser ved 1473-8716 $30.00

www.palgrave-journals.com/ivs

Jigsaw: supporting investigative analysisthrough interactive visualization

John Stasko1

Carsten Görg1

Zhicheng Liu1

1School of Interactive Computing & GVUCenter, Georgia Institute of Technology,Atlanta, GA, U.S.A.

Correspondence:John Stasko, School of InteractiveComputing & GVU Center, GeorgiaInstitute of Technology, 85 5th St.,NW, Atlanta GA 30332-0760, U.S.A.Tel.: +1 404 894 5617;Fax: +1 404 894 3146;E-mail: [email protected]

Received: 3 March 2008Revised: 26 March 2008Accepted: 26 March 2008

AbstractInvestigative analysts who work with collections of text documents connectembedded threads of evidence in order to formulate hypotheses about plansand activities of potential interest. As the number of documents and the corre-sponding number of concepts and entities within the documents grow larger,sense-making processes become more and more difficult for the analysts.We have developed a visual analytic system called Jigsaw that representsdocuments and their entities visually in order to help analysts examine themmore efficiently and develop theories about potential actions more quickly.Jigsaw provides multiple coordinated views of document entities with aspecial emphasis on visually illustrating connections between entities acrossthe different documents.Information Visualization (2008) 7, 118--132. doi:10.1057/palgrave.ivs.9500180

Keywords: Visual analytics; investigative analysis; intelligence analysis; informationvisualization; sense-making; multiple views

IntroductionInvestigative analysts seek to make discoveries and uncover hidden truthsfrom large collections of data and information. Scientists follow thisprocess when they read research papers to learn about related efforts; newsreporters perform such analyses when they investigate new stories; lawenforcement and intelligence analysts carry out these kinds of investiga-tions when they review case reports. One common element of all theseanalytic activities is that they are cognitively very challenging, frequentlyinvolving large collections of data and text that tax a person’s memory,deduction, reasoning, and general analytic capabilities.

In the latter example above, law enforcement and intelligence, an inves-tigative analyst gathers individual chunks of evidence that may rangefrom incident reports filed by operatives in the field to open source newsreports such as articles gathered via web searches. Often, the investiga-tive process involves analysts pouring over large collections of evidence,reading and reviewing the documents to make connections between seem-ingly disparate facts, essentially ‘reading between the lines’ to link sepa-rate activities into a larger plot or narrative. Sometimes the connectionscan be clear, for instance, a particular individual mentioned in differentreports. Alternatively, the connections can be more difficult to discern, forinstance, approximate overlaps in time or location.

While reading documents and digesting the information therein,analysts gradually form internal mental models of the people, places, andevents discussed in the documents. As the number of documents growslarger, however, it becomes increasingly difficult for an investigator to trackthe connections between data and make sense of it all. The sheer numberof entities involved may make it very difficult for a person to form a

Page 2: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al119

clear understanding of the underlying concepts andrelationships in the document collection.

Investigative analysis activities are a major aspect ofthe broader field of sense-making.1 Sense-making hasbeen defined as ‘A motivated, continuous effort to under-stand connections (which can be among people, places,and events) in order to anticipate their trajectories andact effectively’.2 Sense-making activities certainly includethe types of investigations on which we focus, but theyinclude other activities as well, for instance, decidingwhich digital camera to buy or which hotel in London tostay at during a vacation.

At the core of our research is a belief, shared by manyothers, that visual representations of data and documentscan help people better examine, analyze, and understandthem. For example, Norman3 has described how visualrepresentations can help augment people’s thinking andanalysis processes. Card et al.4 refer to visuals used in thismanner as ‘external cognition aids.’

The objective of our research is to develop visual repre-sentations of the information within textual documentand report collections in order to help analysts search,review, and understand the documents better. We seekto create interactive visualizations that will highlight andidentify connections between entities in the documentswhere entities may be people, places, dates and organi-zations, for instance. Fundamentally, we want to buildvisual representations of the documents that help analystsbrowse and explore them, making sense of all the factsand information contained therein.

Our goal is not to replace the documents, however.We firmly believe that analysts must carefully read docu-ments to best understand them. What we seek to provideis a type of interactive visual index onto the documents,a visual analytic system5,6 that connects and links enti-ties discussed therein and thus guides analysts towardthe documents to read next. Furthermore, the interactivevisualizations should provide representations that assistanalysts in building accurate and informative concep-tual models of the underlying themes, plots, and storiesembedded in the document collection. In our experience,investigative analysts have expressed a reluctance to use‘black-box’ systems; they instead prefer a more trans-parent explorative method. Our approach is decidedlyhuman-centered but is augmented with analytic support;we want to design an easy-to-use system that puts theanalyst in charge.

Pirolli and Card7 performed a cognitive task analysisof intelligence analysts and their work that resulted in anotional model of the intelligence analysis process. Theirmodel is organized around two major activity loops,foraging and sense-making. Our work touches on bothloops, helping analysts to choose useful documents toexamine next and also to develop schema and hypothesesthat fit the available evidence. Pirolli and Card iden-tify several leverage/pain points particularly in need ofassistance within analytic processes. Two, in particular,that our work addresses involve (1) the cost structure of

scanning and selecting items for further attention and (2)analysts’ span of attention for evidence and hypotheses.They comment on the two leverage points, respectively:

‘Our analysts spent considerable time scanning data seekingrelevant entities (names, numbers, locations, etc.). Theassessment of whether or not an item is relevant also takestime. Techniques for highlighting important informationwith pre-attentive codings, or re-representing documents(e.g., by summaries) appropriate to the task can improvethese costs.’

‘Techniques aimed at expanding the working memorycapacity of analysts by offloading information patternsonto external memory (e.g., visual displays) may amelioratethese problems.’

To address such objectives, we have designed a suite ofinteractive visualizations and built a prototype systemcalled Jigsaw that implements the visualizations as sepa-rate views onto a (text) document collection. The views areconnected so that actions within one view can be reflectedin the others. We named the system Jigsaw because wethink of all the different entities and facts in a documentcollection as the pieces of a puzzle. The Jigsaw systemshould help an analyst ‘put the pieces together.’

Our research adopts Pirolli and Card’s7 conceptualmodel of intelligence analysis. We also draw upon thework of Heuer8 and Johnston9 to help understand theprocesses and practices of analysts. Furthermore, ourown earlier research on task analysis for informationvisualization10,11 and on the importance and use of inter-action in information visualization12 directly influencesthe design of Jigsaw.

This article is an expanded and updated version of apaper13 presented at the VAST 2007 Symposium. In thenext section, we provide more details about the types ofdocuments that are the focus of analysis in Jigsaw. Wealso describe the importance of identifying the differenttypes of entities in a document since these entities serveas the primary basis for the visualizations. The SystemDescription section reviews the Jigsaw system in detail,its underlying data structures, system architecture, eventmessaging, and each of the different views. In the subse-quent section we provide a short example scenario ofuse to better help the reader understand how the systemfunctions. A video that demonstrates the scenario actionsis available on the Jigsaw website.14 In the VAST 2007Contest section we describe our experience about partic-ipating in the VAST 2007 Contest. In the next sectionwe outline other, more general sense-making domains,Jigsaw could be applied in. The paper concludes with adiscussion of related work and a list of ongoing and futureefforts planned for the system.

Analyzing documentsThe target artifact of our study is a textual documentdescribing a set of facts or observations from the domain

Information Visualization

Page 3: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al120

of interest. We assume that the documents will be ina natural language, loose narrative format, and will beabout 1–6 paragraphs. While our visualization techniquesstill operate for larger documents, pragmatically the tech-niques become less useful as the document size increasesbecause the higher number of entities per documentswamps the display. Our intended target is a smallerdocument with a few nuggets of information containedtherein. News stories and case reports are good examplesof the type of document we focus on, and this type ofdocument is often the focus of analysts in law enforce-ment or intelligence.5,15

Analysis can draw on data from varied and distributedsources. The distributed nature of information leads toheterogeneity across the documents in terms of topic,authors, content, style, date, and so on. Furthermore,different reports will contain information that is unclear,confusing, or even contradictory. Organizational toolshave to consider both the complexity of the informationand the analysis task.

Below is an example document, taken from the VAST2007 Conference Contest,16 that provides a flavor of thetypes of documents on which we are focusing. A largenumber of different events, items, themes, and stories canbe embedded throughout a collection of thousands oreven just hundreds of such reports.

Report 20040510-4_16: May 14, 2004

VANCOUVER, British Columbia -- A Canadianimmigration panel is considering whetheraccused environmental saboteur Tre Arrow canapply for refugee status in Canada.

Arrow, 30, who is wanted for fire bombinglogging and cement trucks in Oregon, asked theCanadian authorities to remain in Canada as apolitical refugee at a hearing in Vancouver onTuesday.

A key issue will be whether Arrow is affiliatedwith a terrorist group, which would immediatelydisqualify him from receiving refugee status inCanada, authorities said.

The Immigration and Refugee Board is scheduledto decide by May 31 whether Arrow is affiliatedwith the Earth Liberation Front, a group theFBI considers a terrorist organization respon-sible for scores of attacks on property overthe past dozen years.

Each document in the collection includes a number offacts and refers to people, places, objects, dates, actions,and so on. Individual documents are relatively focused,however, and typically describe a particular event oroccurrence. So, one may think of each document aspresenting a small bit of evidence. Thus, investigativeanalysis in our context is the process of connecting a

series of individual bits of evidence to construct a larger,broader story or narrative. It is about understanding howthe individual events and entities referred to in the docu-ments relate to each other and, when composed together,reveal a larger plot.

For instance, consider police investigators who have alarge collection of case reports. Suppose that one reportdescribes a crime in which a blue Ford car was seenfleeing from the scene. Investigators may wonder if othercase reports mention a blue Ford, and if so, what relatedactivities are mentioned in those reports. Suppose thatanother such report exists and it describes a known crim-inal who used a blue Ford in a recent theft. Investigatorsmight extend their search to examine activities of thisindividual, places he has been, known accomplices, andso on, keeping a careful watch for unexpected, serendip-itous connections and coincidences that arise (and arenot initially uncovered by a simple ‘blue Ford’ search).Ultimately, investigators are likely seeking to uncoverlarger criminal plans and threats in order to either thwartplanned crimes before they occur or to apprehend thecriminals being sought.

While other well-known systems such as IN-SPIRE17

focus mainly on themes or concepts across documentcollections, the primary unit of analysis from documentsfor Jigsaw is an entity. Within any document one canidentify a set of entities. Example entity types includeperson, place, date, organization, time, and money.Jigsaw allows any type of entity to be included in theanalysis.

The goal of the Jigsaw system is to highlight andcommunicate connections and relationships betweenentities across a document collection. We believe thatthese connections, when assimilated, help to provide theanalyst with a better global understanding of the broaderthemes and plans hinted at by the particular events andfacts documented in the reports.

Obviously, an initial requirement for Jigsaw is to iden-tify and extract the entities from each document and,ideally, store them in a format that allows easier analysisand manipulation. Named entity identification, however,is not the focus of our work, but it is a process that hasbeen studied extensively.18,19 We presently incorporatesoftware from the ANNIE package of the GATE system20

into Jigsaw to perform entity identification and we allowthe analyst to manually identify or correct the identifi-cation of entities too. Alternative free and commercialentity identification systems also exist, such as Balie,21

FreeLing,22 Connexor,23 and Cicero.24 We chose GATEbecause it is open source and built in the Java program-ming language as is Jigsaw.

Further automated analysis of the document text couldbe performed as well. For instance, algorithms deter-mining the content of documents with respect to frameanalysis and social movement theory exist,25 which tellsus how people try to influence target audiences and howthose audiences respond. Other computational linguisticsanalysis techniques exist too, but presently Jigsaw does

Information Visualization

Page 4: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al121

not incorporate this type of analysis. We believe that thisis a fruitful area for future research.

We adopted analysis exercises created by Frank Hughesof the Joint Military Intelligence College26 to serve asinitial sample test data for Jigsaw. The exercises involvecollections of fabricated reports with an embeddedmaster plot. Different reports in the collection hint atthis plot and the goal of the exercise is to discover andarticulate the plot. To make identifying the master plotmore challenging, threads of other unrelated plots aresuggested in the reports as well. Analysts in trainingperform exercises like this as part of their educationalprocess.

The 2006 and 2007 IEEE VAST Symposium Contestsprovide large synthetic document collections of this typeas well. We entered the 2007 Contest using Jigsaw asan analytic aid and won the university component of thecontest. In fact, we use documents from the 2007 VASTContest in the descriptions of Jigsaw and its views in thenext section. The sample analytic scenario described laterin the fourth section uses a report set from one of theHughes’ exercises.26 We have altered some of the namesand other entities from the already fabricated documentsof that exercise for further anonymization.

System descriptionOverviewJigsaw provides an analyst with multiple perspectives ona document collection. The system’s primary focus is ondisplaying connections between entities across the docu-ments and providing a type of visual index onto the docu-ment collection. We use a simple and easy-to-understandmodel of entity ‘connection’: two entities are connectedif they appear in one or more documents together. Other,more semantically rich models of connection could beincorporated into Jigsaw as well.Jigsaw presents information about documents and

entities through multiple distinct visualizations, calledviews. Each view provides a different perspective ontothe data. The views, which are discussed in more detailin the following subsections, include:

• a List View containing multiple reorderable lists ofentities in which connections between entities areshown by coloring related entities and drawing linksbetween them, multiple sorting options are available;

• a Graph View displaying connections between enti-ties and documents in a node–link diagram, allowinganalysts to dynamically explore the documents byshowing and hiding links and nodes;

• a Scatter Plot View highlighting pairwise relationshipsbetween any two entity types and supporting focus ona specific subset of the displayed entities using rangesliders;

• a Document View displaying the original text docu-ment with highlighted entities, showing how often a

document already has been viewed, and supportingentity modification;

• a Calendar View providing an overview of the docu-ments and the entities within them according to thepublication date of the document;

• a Document Cluster View representing all documentsin the collection and providing manual and automatedcommands to partition the documents in meaningfulclusters;

• a Shoebox supporting the analyst in the evidencemarshalling process, providing functionality to buildhypotheses and organize collected evidence.

The views implement two different perspectives onthe document collection. The Scatter Plot View, theDocument View, and the Document Cluster View aredocument based: documents are units of interaction andentities such as place or person are only shown in thecontext of a document. The List View, the Graph View, andthe Calendar View, on the other hand, explicitly presententities as well as documents as units of interaction.

While designing the Jigsaw system we focused onkeeping the user interface as easy to use as possible.The most common operations are available by singleand double mouse clicks. A single click selects an item,a double click expands an item. More specifically,expanding a document shows all its contained entitiesand expanding an entity shows all the documents inwhich it can be found. Each of the views handles thetwo events differently and provides its own style of visualfeedback. Other advanced operations, such as displayingitems in other views or removing items from a view, areavailable in a right click-initiated pop-up menu.

User interaction with one view is translated to an eventand communicated to all other views which then updatethemselves appropriately. Through such communication,different aspects of the documents can be examinedsimultaneously under different perspectives. For eachview, users can turn on and off event receipt dependingupon whether they want the view to stay synchronizedwith the most recent interactions in other views (thesatellite dish icon in the right upper corner of eachview displays its event receiving state). Users can createmultiple instances of each view type and clone existingviews as well. This capability allows a view to be frozenat an interesting state (event receipt turned off) while anew version of that view continues to receive events andupdate its state. Additionally, the user can bookmark aview which saves the current state of the view so thatit can be returned to later in the investigative process.This operation provides a first step for asynchronouscollaboration using Jigsaw: via bookmarks analysts cansave and exchange views that document important statesduring an investigation.

Owing to the large amount of screen real estaterequired to display its views, Jigsaw ideally should beused on a computer with multiple and/or high-resolutionmonitors. We run the system on a computer with four

Information Visualization

Page 5: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al122

Figure 1 Jigsaw being used on a multiple-monitor computer.

displays, shown in Figure 1. Such a configuration allowsone to minimize window flipping and reordering duringanalysis. Decreasing prices and smaller footprints of LCDmonitors have made such configurations more common.

System architecture and infrastructureJigsaw is written in Java and adopts a model–view–controller architecture that separates the data (model)and user interface (view) components. Jigsaw createsobjects for all documents and extracted entities and storesthem in a general data structure model – the entities aretagged by their respective category name and bundled perdocument. This data structure is encapsulated in a classthat provides an interface for the different view classes tocall and retrieve entity-document information in orderto build visualizations.

A controller class coordinates event communicationfrom and to the views. Messages dispatched by views firstgo to the controller which then forwards the message to allreceiving views. (Recall that each view provides a buttonto enable/disable event receiving.) A number of differentevent types exist: select, display, expand, remove, entity-modification, and add-to-Shoebox. A select event occurswhen a user selects either an entity or a document in aview – such a selection is usually performed by a mouseclick on the object. As feedback, the entity or documentchanges color or is highlighted visually. A display eventoccurs when a user explicitly indicates that an entity ora document should be displayed in all views where itis not currently visible. Users can initiate display eventsby performing a particular mouse gesture or by issuinga search query. An expand event is usually performed by adouble click on an item and the views display the itemsconnected to the one that was expanded. The remainingevents occur when a user performs the correspondingcommand in the pop-up menu and the views updatetheir state accordingly to the chosen command.

The Jigsaw Control Panel, shown in Figure 2(A),provides a variety of menu commands for use in thesystem. The active project (consisting of the documentcollection and the identified entities) and the workspace(consisting of all views that are currently open) can besaved and restored. New sources can be imported andadded to the current or to a new project. Entities areextracted from the sources during the import process.Currently, Jigsaw can import plain text files or xml fileswith already identified entities. We plan to implementmore filters to import webpages as well as pdf and Worddocuments.

The Control Panel also shows the color coding schemefor the available entity types and it provides a queryinterface for users to search for any entity as well as anystring mentioned in the documents. When a query isissued, either the matching entities and/or the documentscontaining that string return as a result, and a displayevent is dispatched to the receiving views telling them toshow the result(s). Finally, the Control Panel allows theuser to instantiate the different views.

List ViewThe List View, illustrated in Figure 2(B), shows connec-tions between sets of entities. The view consists of anumber of lists, and each list shows entities of one specifictype. The user can add and remove lists as desired – thenumber of lists is pragmatically constrained only by thehorizontal space in the view. Once a list is displayed, aselection box at the top allows the user to change theentity type shown in that list. Thus, even the same typeof entity can be placed side by side in the view; wheninvestigating a social network, for example, it is useful tohave two person lists side by side. The List View shownin Figure 2(B) contains three entity lists: places, persons,and organizations.

Selected entities are highlighted in bright yellow and allconnected entities in all lists are highlighted in a shade oforange. The brightness of the highlighting on a connectedentity indicates the strength of the connection: if the twoappear together in only one document, a light orange isused, but if the two appear together in multiple docu-ments, an increasingly dark orange is used as the numberof co-appearances rises. Furthermore, the view draws linesbetween connected entities in adjacent lists to indicate theconnection even further. In Figure 2(B), two entities areselected and highlighted in yellow: ‘saboteur Tre Arrow’ inthe person list and the ‘Immigration and Refugee Board’in the organization list.

The items in a list can be sorted either alphabet-ically, by the strength of the connection, or by thefrequency of appearance in different documents in thedocument collection. This appearance frequency for anentity is represented by a small bar at the left end ofeach item in the list. In Figure 2(B), the organizationlist is sorted by frequency, the two other lists are sortedalphabetically.

Information Visualization

Page 6: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al123

Figure 2 (A) The Control Panel. (B) The List View. Selected entities are shown in yellow and connected entities are indicated bythe joining diagonal lines and the orange shading. Darker shading represents stronger connections to the selected entities. (C) TheGraph View. Documents are white rectangles and entities are circles colored by type. Edges connect documents to the entities theycontain. (D) The Scatter Plot View. Each axis enumerates a list of entities. Diamonds in the center indicate documents containingparticular pairs of entities, one from each axis at the relative x, y position.

For some tasks it is useful to consider all entities of aspecific type, for example to find out what place has thehighest frequency in a document collection. The button‘Add all’ allows the user to add all entities of the selectedtype to a list. If a list of entities is too long for all the itemsto fit in the view, scrollbars appear to aid navigation. Inthis situation, many connected items may not be visiblein the view. Thus, the List View also provides a mode inwhich all selected and connected entities are automati-cally moved to the top of the list.

Graph ViewThe Graph View, illustrated in Figure 2(C), representsdocuments and their entities in a traditional node–link

graph/network visualization common in many othersystems. Entities are depicted as labeled circles coloredaccording to their type. Documents are representedas white rectangles. The edges from documents to theentities they contain are shown as well.

Following the query-based exploration approach ofJigsaw, the Graph View does not automatically drawall the documents and entities as one large network. Wethought that a layout of such a large network wouldbe overwhelming and difficult to understand, and thuswould not be as helpful to the analyst in our context.Instead, Jigsaw’s view is incremental. Display events(triggered from a query or another view) place documentsand entities on the display, and then mouse clicks onitems can expand or collapse their connections.

Information Visualization

Page 7: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al124

The view uses a simple layout algorithm. Both docu-ments and individual entity nodes are randomly posi-tioned in the plane when they are first shown. When adocument is being expanded all its entities are displayedas a group, and are drawn at random positions in asmall circle around the document like satellites orbitinga planet. When an entity is being expanded, all thedocuments in which it occurs are displayed in a biggercircle around the entity to create some space for furtherexpansion of those documents. If documents or entitiesare positioned outside of the visible screen area, the viewautomatically pans and zooms to make sure that all itemsare visible.

We have found that this simple layout provides reason-able drawings for Jigsaw’s needs. In addition, the usercan click on any entity or document and drag it to a newlocation. To ease this manual re-layout process, dragginga document or an entity while holding down the SHIFT-key also moves all its connected items along.

The entities-as-satellites graph visualization providesanother important connections view in Jigsaw sincethe user can see all the different entities mentioned in adocument together. Furthermore, the visualization showsan entity mentioned in multiple documents via the linesdrawn from the different documents to that entity. Wehave found the view to be useful in an interactive explo-ration mode – the analyst displays an initial document orentity, then expands the item to reveal its relations, andexpands one of those items to reveal more, and so on.This type of interaction alternately reveals documentsand connected entities.

Multiple nodes can be selected via CTRL-key clicks or byrubber-banding a rectangular or circular selection region.A small plus sign marks nodes that are not fully expanded.The system also provides an inverse selection operationthat toggles the selected/unselected state of each node.Other commands allow node(s) to be hidden (they retainthe same position if they are subsequently shown again),different types of entities to be filtered from the display,and edges or labels to be removed.

To ease navigation, zooming and panning is imple-mented through mouse wheel interactions. In addi-tion, the Graph View also provides a ‘Circular Layout’command that repositions all the visible documentsequidistant around a large circle in the view. Entitiesconnecting to only one document are drawn near thatdocument, but outside the circle. Entities connecting tomore than one document are drawn inside the circle.Thus, the set of entities easily noticeable inside the circlerepresents a more highly connected network of enti-ties that may be related in important ways and likelyshould be examined more closely. Figure 5 in the section‘Other Application Domains’ illustrates the circular layoutapproach.

Scatter Plot ViewThe Scatter Plot View, as shown in Figure 2(D), high-lights pairwise connections between entities and shows

the documents containing the coincidences through apseudo Starfield display.27 The user specifies, through aselection box on each axis, the entity type to be placedon that axis. All entity names of that type that have beendisplayed by queries or display events from other viewsare then (logically) listed along the axis – in chronologicalorder for dates and in alphabetical order for other enti-ties. If entities from each of the two axes appear togetherin a document, a diamond is drawn in the view at theconjunction of the two entity’s positions along the respec-tive axes. Since a document can contain more than oneentity of the same type, multiple visual representations(diamonds) of the same document can appear together inthe view at the same time. When moving the mouse overa document, all instances of that document in the vieware highlighted. The user can also mark a document witha specific color. This mark will be applied to the docu-ment even if the user changes entity types on the axes,thus allowing the user to follow it across other entity-typeinvestigations.

Representative entity labels are drawn in a readable fontsize at equally spaced intervals along each axis to help theviewer. However, it is likely that many more entities existin each category than can be shown this way. The viewdisplays these other labels in a tiny illegible font size toprovide a hint about the quantity of labels missing. Whenthe user moves the mouse pointer over an entity name,the scatter plot magnifies that item to be readable.

With a large set of entities displayed on the axes, thedisplay area can become cluttered with many diamondsrepresenting relevant documents. To address that problemand help the viewer focus on sets of entities, the ScatterPlot View provides range sliders on each axis so that theviewer can zoom in on a segment of the axis. The viewthen updates to only show documents containing entitiesin that smaller range. We have found this capability partic-ularly useful when dates are shown on an axis as a typeof time-series view. The viewer can narrow the display tofocus on a small interval of time.

Document ViewReading and understanding the actual text documents isa crucial part of an investigative process. To facilitate thisneed, Jigsaw includes a textual Document View as shownin Figure 3(A). Multiple documents can be stored in oneDocument View – they are organized in a list at the leftborder and the user can select a particular document tobe displayed. All the entities in the document are high-lighted in colors consistent with the color coding of theentity types. The tag cloud at the top of the view describesthe contents of the marked documents in the documentlist.

Documents that already have been viewed are coloredblue in the list of available documents and a number atthe left side of their list entry shows how often they havebeen viewed. The list of available documents can be sortedeither, alphabetically, by view-count, or it can represent

Information Visualization

Page 8: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al125

Figure 3 (A) The Document View. The document discussed in the second section is displayed with entities highlighted and coloredby type. (B) The Calendar View. Small diamonds represent documents or entities connected to the date in which they appear. (C)The Document Cluster View. Each small rectangle represents a document in the collection, and can be colored and grouped basedon entities that do or do not appear in it. (D) The Shoebox. The group, hypothesis, sentences, and link feature are used to organizeevidence.

the order in which documents have been added to theview. The find function at the bottom of the view allowsthe user to quickly find entities or text strings in the opendocument or in all documents that are currently availablein the view.

The automated entity identification process in Jigsawtypically is not perfect. Some entities may not be iden-tified at all, some may have an incorrect entity typeassigned, and some identified simply may not be entities.The Document View provides functionality to correctsuch errors. Identified entities can easily be removed ortheir assigned type changed to another existing type or

to a new entity type by right clicking on the identifiedentity and selecting the relevant command from a pop-upmenu. If an entity was missed during the identificationprocess, it can be interactively designated as an entity byselecting the text and assigning an entity type. All theseoperations to modify entities can be applied to a singledocument or across all documents in the collection.This allows the user to make local corrections (e.g. theentity ‘Paris’ is always identified as a location but in onedocument it is a name) as well as global changes (‘BMW’is not identified as an entity and should be added as anew entity type ‘car’ in all documents).

Information Visualization

Page 9: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al126

Calendar ViewThe Calendar View, shown in Figure 3(B), presentsdifferent documents and entities from the data set in thecontext of a familiar calendar showing years, months,weeks, and days. The small diamond items drawn ona particular day represent documents (colored gray) orentities (colored according to its type) in the context ofthe date(s) noted in documents in which they appear.When the number of items associated with a day is toolarge to draw them all in that region, a number is drawnindicating how many others appear on that day but arenot shown. As the user moves the mouse over that day, alarger rectangle pops up and shows all the items. Whenthe user moves the mouse pointer over a document-representation diamond drawn in the calendar, all theentities appearing in that document are shown on theleft.

The Calendar View is particularly useful to explorepatterns over time that are not visible by examiningconnections within documents. For example, if Peter isdropping off packages on Mondays (mentioned in onedocument) and Mary is picking up packages on Tuesdays(mentioned in another document) there exists a poten-tial connection between Peter and Mary even thoughthey are never mentioned together in one document. TheCalendar View can reveal these types of connections byvisualizing the time pattern.

Document Cluster ViewThe Document Cluster View, shown in Figure 3(C), repre-sents all the documents in the collection, or some subset ofthe documents, as rectangles. The user can drag individualdocuments or collections of documents to create differentclusters. In addition, all queries and display events trig-gered by other views are stored in a list of filters (the topleft tree in the view) and the user can apply those filters topartition the collection of rectangles into different clus-ters. In Figure 3(C) the user displayed different places andthen clustered the document collection according to thosedisplay events. The user can also highlight the documentsviewed or not viewed so far, or all the documents thateither include or do not include some entity (using thedocument/entity tree at the bottom left in the view).

The Document Cluster View is helpful to gain anoverview of the document collection, to organize thedocuments according to keywords, to decide what docu-ments could be relevant for a specific task and shouldbe inspected, and what documents are not relevant andshould be removed from the document collection.

ShoeboxThe Shoebox, shown in Figure 3(D), is not really adocument/entity view. Instead, it provides rudimentaryevidence marshalling support. The analyst can add rele-vant entities and documents from other views to theShoebox; they appear first in the ‘inbox-area’ on the left

side of the Shoebox. These added items can be distributedacross different layers, combined to groups or hypothesesand linked together to form sentences. It is also possibleto integrate hyperlinks to bookmarks of other views.Provided with these basic functions, the analyst can orga-nize the collected evidence. These sense-making artifactssupport the analyst’s thinking process in a visual wayand begin to provide marshalling support as exhibited byother systems.28–31

ScenarioIn this section we walk through an analysis scenariowith a fictional data set to demonstrate more specificallyhow Jigsaw supports an analyst. The Jigsaw website14

provides a video that demonstrates the scenario actions.Suppose that an analyst received information regarding

a suspicious person named Michael Jones. To learn moreabout him, the analyst starts Jigsaw, opens the data set,displays the List View, selects Person as the entity to beshown in the left list, adds all person entities to the list,and sorts the list by frequency. Michael Jones appears atthe second position and the long bar next to the nameindicates that he is mentioned in a number of docu-ments. In order to explore people associated with Jones,the analyst places Person entities in the second list aswell and moves the people associated with Jones to thetop. The color mappings imply that Martin Clark has thestrongest connection to Jones since his name is coloredin a dark shade of orange (see Figure 4(A)).

To verify this connection the analyst examines theDocument View to read the documents about MichaelJones. He is mentioned together with Martin Clark in twodocuments (FBI_11 and FBI_35) and thus the connectionseems plausible.

The analyst returns to the List View, selects both Clarkand Jones, and puts Organization entities into a third listwhich reveals that both men have connections to thesame organizations. The Revolution Now ScholarshipFund has the strongest connections of any organization,so the analyst continues the exploration on it.

The Document View shows two documents mentioningthe Scholarship Fund. Report FBI_35 mentions thatMichael Jones donated $48,000 to the fund on the stip-ulation that the donation be equally split among sixstudents, Martin Clark being one of them. The analystalso notes that the six students form three pairs – wherestudents in each pair live close to each other. This raisessuspicions that the students might be collaborating.

Proceeding, the analyst brings up the three documentsabout Martin Clark and William Brown (who both livein Virginia) by displaying them in the List View. Two ofthe documents were already encountered in this investi-gation and the third, FBI_41 (see Figure 4(B)), states that amonth ago Clark and Brown took a cruise together fromHampton to Kingston, Jamaica. Furthermore, both areagain on this cruise right now. The document also says

Information Visualization

Page 10: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al127

Figure 4 (A) The List View, (B) Document View, and (C) GraphView from the example scenario.

that two other scholarship recipients, Thomas Taylor andRobert Johnson, took a cruise together from New York Cityto Montego Bay last month and they are also currently onthis cruise again.

Now, to gain a deeper understanding of the connec-tions between the people and places, the analyst exam-ines report FBI_41 in the Graph View. After expandingthe node for the report and filtering out the date entities,the analyst expands the nodes representing Kingston andMontego Bay. The view reveals that both are connected tothree document nodes: FBI_14, CIA_10, and NSA_6. The

analyst selects and reads these documents in the Docu-ment View.

All three documents mention the person DanielHarris who works in Montego Bay. The analyst issues aquery on Harris, showing the man’s entity in the GraphView. She expands his node and connections to sevenmore documents show up (see Figure 4(C)). Upon readingthese documents, the analyst learns that Harris traveledfrom Montego Bay to Kingston on 1 December and passeda package to a person named Edward Thompson.

The analyst concludes the investigation hypothe-sizing that suspicious activities are planned involvingsome of these individuals traveling on cruise ships inthe Caribbean and with potential packages of interest. Theanalyst suggests that further investigation be conductedfocusing on Daniel Harris and related activities.

VAST 2007 contestWe participated in the VAST 2007 Contest16 usingJigsaw as an analytic aid, and we won the universitydivision of the competition.32,33 While working on ourcontest entry, we learned a great deal about how to effec-tively use the system, we fixed usability problems, andwe implemented more functionality that was useful forspecific tasks. The system matured quite a bit during thatprocess. One key finding was that it is crucial for Jigsawto have correctly identified entities since they definethe connections across the documents. If entities aremissed or wrongly identified, connections are missing ormisleading. We added the entity modification function-ality to the Document View to address that importantproblem.

We also were invited to participate in a special livecontest at the conference. We worked with a professionalanalyst on a smaller data set using Jigsaw. Feedbackfrom the analyst was very beneficial. He enjoyed workingwith Jigsaw, in particular the circular layout functionof the Graph View, since it facilitates the visual explo-ration process and shows at a glance what entities a setof documents has in common. Some other points forimproving Jigsaw’s capabilities arose as well: adding aview to visualize geographic information and providingbetter support to build a timeline in the Shoebox wouldbe helpful for an analyst.

Other application domainsSo far, we have concentrated on applying Jigsaw indomains like intelligence and law enforcement but wecan envision using it in a more general sense-makingprocess and in other domain areas as well.

We already applied Jigsaw to explore connectionsbetween people, places, and organizations in the Bible.We built a set of documents by dividing the Bible intoits chapters, imported that data set into Jigsaw, andinvestigated the connections. The Graph View in Figure 5shows the social network of ‘Cain’ and ‘Abel’ as a circular

Information Visualization

Page 11: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al128

Figure 5 The Graph View showing Cain's and Abel's socialnetwork in the Bible along with the places they are associatedwith.

layout along with places mentioned in chapters in whichthey appear.

One area that seems well-matched to Jigsaw’s capabil-ities is academic research involving journal and confer-ence papers. People new to a discipline may want to learnabout the important researchers and papers in the area.Existing tools like PaperLens34 address such scenariosalready but trails of connected articles are not so easy tofollow. In Jigsaw we can use metadata of papers such asauthors, institutions, categories, etc., as entity types andthen visualize and explore connections between theseentities. A second scenario is to define entity types rele-vant for a specific research area. For example, in biologypapers about research on genes, the different genes couldmap to a set of entities and biologists could benefit bylearning about connections of genes across all relevantpublications.

Another possibility would be to use Jigsaw to exploreonline articles such as web news reports or blogs on topicslike political debates. Using the import mechanism ofJigsaw, a lay user could build a set of loosely related docu-ments about a particular topic or person (retrieved using asearch engine), and then use Jigsaw to find connectionsbetween entities among the documents. That would helpto filter the collection to a smaller number of documentsfor careful reading.

In another paper35 we discuss in more detail how to useJigsaw in a human-centered sense-making process usingvisual exploration.

Related work

A growing number of research and commercial systemsare using visualization and visual analytic techniques tohelp support investigative analysis.

Analyst’s Notebook from i2 Inc.36 provides a semanticgraph visualization to assist analysts with investigations.Nodes in the graph are entities of semantic data types suchas person, event, organization, bank account, etc. Whilethe system can import text files and do automatic layout,its primary application appears to be helping analysts inmanually creating and refining case charts.

Oculus Info Inc. provides a suite of systems for differentaspects of investigative analysis. GeoTime37 visualizesthe spatial inter-connectedness of information over timeoverlaid onto a geographical substrate. It uses an inter-active 3D view to visualize and track events, objects,and activities both temporally and geospatially. TRIST38

allows analysts to formulate, refine, organize and executequeries over large document collections. Its user interfaceis a multi-pane view that provides different perspec-tives on search results including clustering, trend anal-ysis, comparisons, and difference. Information retrievedthrough TRIST then can be loaded into the SANDBOXsystem,31 an analytical sense-making environment thathelps to sort, organize, and analyze large amounts of data.The system’s goal is to amplify human’s insights withcomputational linguistic, analytical functions, and byencouraging the analyst to make thinking more explicit.The system offers interactive visualization techniquesincluding gestures for placing, moving, and groupinginformation, as well as templates for building visualmodels of information and visual assessment of evidence.An evaluation experiment of the SANDBOX system showedthat analysts using the system did higher quality analysisin less time than using standard tools. Jigsaw providesa different style of visual representation of documententity data to analysts; TRIST and SANDBOX provide moreauthoring and organizational infrastructure.

The COPLINK system15,39 and related suite of tools wasdeveloped to help law enforcement officials more easilyextract information from police case reports and analyzecriminal networks. The system uses data mining tech-niques to set up a concept space of entities and objectsthat can be searched to find related items. Visualizationsupport consists of a hyperbolic tree view and a spring-embedder graph layout of relevant entities. Jigsaw’sGraph View shares the exploratory sense of the hyper-bolic tree but does it for a connected graph. Beyond that,Jigsaw provides a more varied suite of visualizations ofdocuments and their entities, but it does not presentlyinclude the sophisticated mining and analytic capabilitiesof COPLINK.IN-SPIRE17 is a system for exploring textual data

in document collections and it has been used exten-sively for intelligence analysis.40 The system generates a‘topical landscape’, either through a 3D surface plot ora galaxy-style view, that supports queries, provides the

Information Visualization

Page 12: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al129

possibility to analyze trends over time, and allows analyststo discover hidden information relationships amongdocuments. Its goal is to identify and communicate thedifferent topics and themes, and then allow the analystto inspect the documents more deeply through inter-active analysis. Jigsaw differs in its focus on exploringrelationships among the entities in documents.ENTITY WORKSPACE28 is a tool to amplify the usefulness

of a traditional evidence file that is widely used by analyststo keep track of facts. It provides an explicit model ofimportant entities to help the analyst to find and re-findfacts rapidly, discover connections and identify importantdocuments and entities to continue the exploration. Thesystem is just one of a suite of tools from PARC directedat assisting sense-making.41

The two systems probably closest to our work, KANI andProSPECT, come from PNNL. KANI,29 like Jigsaw focuseson visualizing entities from textual documents. KANI hastwo main views, a document viewer that highlights enti-ties and their selected relationships and a graph view thatshows different entities connected in a node–link struc-ture. The system provides extensive filtering capabilitiesto the analyst and includes automated associate compo-nents that help with activities like hypothesis refinementand assumption testing. Jigsaw goes beyond KANI inthe variety and style of the interactive visualizationsprovided, but KANI has a more complete infrastructurefor supporting reasoning and hypothesis formulation.

Sanfillipo and colleagues at PNNL30,42 created theProSPECT system that extracts scenario informationfrom unstructured intelligence data sources. Their systemwas built to provide multiple views on multiple moni-tors as does Jigsaw. More specifically, they providethree main views or spaces: a source space for collectingdocuments and sources; a marshalling space for exam-ining evidence and reasoning; an analysis space forconstructing hypotheses. ProSPECT performs languageanalysis and uses ontologies to facilitate reasoning, andit supports analysts to put confidence levels on evidenceand hypotheses. Jigsaw provides a richer set of docu-ment and entity views, with a focus on entity connection,than does ProSPECT, but the source intake and hypoth-esis exploration facilities of ProSPECT go beyond whatJigsaw currently provides.

Our approach differs primarily from that of the systemsdescribed above in its focus on visually representingconnections and relationships between entities in docu-ment collections. We advocate a multitude of data viewswith simple, consistent user interactions for promotingfast exploration and discovery. Furthermore, Jigsawprovides a system model where user interaction is a first-class object, helping to expose the entity connections,and providing for easier extensions to new styles of views.

Discussion and future directionsWhile Jigsaw provides a number of capabilities that webelieve will be useful for investigative analysis, our work

has only begun to scratch the surface of what is possiblein this area. Numerous avenues of research and extensionsto the system are possible in future work. In fact, we havemany already underway.

Because the system has yet to be rigorously evaluated,that is an obvious next step. We have conducted initialusability assessments of the individual system viewsto improve each’s effectiveness and we are presentlyconducting studies to compare groups of people (in thiscase, graduate students) performing document analysisboth with and without Jigsaw present. Obviously, trialuse of the system and evaluation by professional analystsis a key future action to understand Jigsaw’s strengthsand weaknesses and to reflect on its utility. Our use ofthe system in the VAST 2007 Contest provides at least alittle empirical evidence that it is useful in the kinds ofinvestigative scenarios exemplified by the contest.Jigsaw currently does not ‘start cold’ well. By this, we

mean that Jigsaw does not provide an overview of thekey themes and trends across a document collection tohelp an analyst begin exploration as is done well by, forinstance, the IN-SPIRE17 system. Jigsaw is more usefulwhen an analyst has a few target entities of interest andbegins an investigation exploring them. The documentanalysis and clustering capabilities of IN-SPIRE recentlyhave been exported into web services, however, and we areplanning to incorporate these capabilities into Jigsaw.Additionally, we are exploring more ‘discount’ methods ofproviding topical overviews and key themes in the docu-ments such as the present use of tag clouds to presentimportant words used in and across the documents.

Presently, Jigsaw provides a higher level of visual-ization support than automated analytic support. In anidealized visual analytics system, these two aspects areeach deeply present and they are blended seamlessly. Asdiscussed earlier in the article, enhancing the compu-tational linguistic support of the system could providepowerful new capabilities. Furthermore, analytic tech-niques from the link analysis community could providemore sophisticated graph analytic capabilities to Jigsaw.Jigsaw has been used to examine document collec-

tions numbering in the thousands and we believe that itcould handle collections in the tens of thousands reason-ably well, but much larger document collections clearlymay be arise, particularly if web searches are performed.Scalability is always an issue in any kind of visual analyticsystem. In order to address larger collections, we haveconsidered notions such as a document ‘holding area’where documents are available for analysis but not fullyintegrated into the system views yet. Analysts shouldbe able to fluidly add and dismiss documents withoutaffecting system performance or utility.Jigsaw’s model of entity connection being calculated at

the document level is both simple and useful, but as docu-ment sizes increase, this model becomes less accurate andinformative. We plan to adopt a more fine-grained modelby classifying connections according to their locality:entities are loosely connected if they appear in the same

Information Visualization

Page 13: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al130

document, they are moderately connected if they appearin the same paragraph, and they are strongly connectedif they appear in the same sentence. Determining entityconnections on a semantic level much like that done bySanfilippo30 may be needed in order to facilitate analysison larger documents.

Our use of the system has suggested the need for a dedi-cated geospatial view. The GeoTime system37 providesa good example here. In sample analysis sessions, wehave noted the absence of explicit views supportinggeolocation-related analysis. Of course, adding even moretypes of views raises issues concerning the multiplicityof views – could an abundance of representations over-whelm analysts rather than assist them? How manydifferent kinds of views can profitably be used together?

Unstructured text documents are currently the analyticfocus of Jigsaw, but many other types of data andevidence exist. For instance, structured data (often inthe form of spreadsheets or databases) might include airtravel records, vehicle registrations, bank account trans-actions, immigration records, cargo container logs, etc.By expanding our system to also read structured data, webelieve that its analytic capabilities will be greatly ampli-fied. The key research question here is how to integratestructured data with the current document and entity(type) model of Jigsaw. How are rows and columns(objects and attributes) interpreted? Analysts shouldnot be forced to specify the mapping of each field of adatabase to add its data to an investigation.

In the views presented by Jigsaw, entity connectionsto documents and other entities are taken as facts or truth,as is commonly done in visualization.43 System viewsillustrate these connections using the straightforwardmodel of connection employed now. We have learnedthrough informal discussions with investigative analyststhat reports often may come from sources with varyinglevels of reliability or the reliability of a document simplymay not be known. Furthermore, reports may describeevents that are uncertain.

For instance, a police report may come from a sourcethat is unreliable, thus raising questions about the activ-ities described in the report. Alternatively, intelligencereports may include language such as ‘It is probablethat. . .’, ‘We believe that. . .’, or ‘There is little chancethat. . .’. Clearly, to analyze reports like these, relativelysophisticated linguistic analysis is needed and even then,determining the probability of the event is challengingand debatable.8 Such analysis is not our expertise nor isit a focus of our work, but other researchers do addressissues like these. A future challenge for us is to developrepresentations of the uncertainty and reliability of docu-ment information.

Amplifying Jigsaw’s support for collaborative anal-ysis is another potential future direction. Our experienceworking with the system has shown that two analystscan work effectively together at one local machine ifsufficient screen space is available, for example, as existsin our lab configuration shown in Figure 1. If more

than two analysts collaborate, even this configurationbecomes too restrictive as the bottleneck of only onekeyboard and mouse surfaces. Other information visual-ization researchers have proposed different approachesfor scenarios like these. One involves a multi-touch tabletop computer where multiple people can interact withthe display at the same time.44 This approach has thedisadvantage that the types of interactions presentlysupported are restricted. An alternative approach wouldbe to use a large, hi-resolution wall-style display45 withpersonal interaction devices for each analyst. Further-more, Jigsaw could support distributed synchronouscollaboration so that multiple analysts at different loca-tions could work together on an investigation. Jigsaw’ssystem architecture and event passing actually makes thistype of collaborative scenario relatively feasible withoutextensive work.

Trial exposure to and use of the system by professionalanalysts uncovered a desire for better review capabilitiesof the investigation state and history. Because investiga-tions often involve a large amount of data and can becarried out over weeks, months, or even years, it is crucialfor analysts to know what part of the data they alreadyhave or have not considered and when data first becameavailable. For instance, analysts may have elaborated ahypothesis but then new evidence is found that refutesthis hypothesis. Similarly, a particular angle of investiga-tion may not prove fruitful, so the analyst needs to backup and resume exploration from some earlier state. To beable to quickly develop an alternate model or view, it iscrucial for analysts to know what data they had alreadyconsidered, what data led to that hypothesis, what otherhypotheses were already considered and discarded, whatleads are still open, and what data has not yet been consid-ered at all.

The need for better tools to augment the process ofdocumenting and presenting the results of analysis, so-called production, presentation, and dissemination, hasbeen identified.5 Views in Jigsaw can be bookmarkedand linked in evidence marshalling, but better support forannotation to provide visual summaries of the evidenceused to reach actionable conclusions is needed.

Finally, one could use capabilities like those in Jigsawas a general-purpose means to explore search engineresults. An environment incorporating a powerful searchengine and Jigsaw’s exploration facilities could be ahelpful sense-making aid.

ConclusionEvery day investigative analysts are faced with the chal-lenging task of assessing and making sense of largebodies of information. Technological aids that promotedata exploration and augment investigators’ analyt-ical reasoning capabilities hold promise as one way ofassisting analysis activities.5,31,41 In a workshop of intel-ligence analysis professionals,46 working groups gener-ated a list of the top 10 needs for intelligence analysis tool

Information Visualization

Page 14: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al131

development. One item was ‘Dynamic Data Processingand Visualization’ that was further elaborated as follows:

‘Solutions are needed that transcend what is typicallydescribed as ‘visualization’ -- in contrast to a predominantly‘passive’ relationship between the system that displayscomplex visualizations and the analyst who still must digestand interpret them. What is needed is a much more interac-tive and dynamic relationship in which the analyst is betterable to explore the information within the visualization.’

Herein we present Jigsaw, a system designed to assistanalysts with foraging and sense-making activities acrosscollections of textual reports in just this manner. Jigsawpresents a suite of views that highlight connectionsbetween entities within the reports. Through interactiveexploration, analysts are able to browse the entities andconnections to help form mental models about the plansand activities suggested by the report data.Jigsaw is not a substitute for careful analysis of the

reports, however. Instead, it acts as a visual index thatpresents entity relations and links in forms that aremore easily perceived, thus suggesting relevant reportsto examine next. Other systems sometimes put toomuch information into a single complex view, withthe result that though information may be present, itis harder to discern and is much less flexible from theanalyst’s viewpoint. Our approach hinges on multiple,easy-to-understand views with simple, clear interactions.In creating the visualizations we leveraged well-knownrepresentations from the field of information visualiza-tion and augmented them with interactive operationsuseful for showing connections between entities.Jigsaw provides four main contributions to the

growing area of visual analytics for investigative analysis:

• A suite of interconnected, cooperating, interactive visu-alizations that present multiple perspectives on a docu-ment collection and, in particular, connections betweenentities across the collection.

• Advocacy and illustration of the value of large pixelspace approaches to investigative analysis wheremultiple computer displays enable richer portrayals ofthe documents and data being examined.

• A fundamentally simple user interaction modelin views (single-click is select and double-click isexpand/collapse) in order to make the system lesscomplex for analysts to use.

• Synthesis of all these capabilities into a working systemthat has already exhibited investigative value and thatholds potential for further use by professional analysts.

AcknowledgementsThis research has been supported in part by the NationalScience Foundation via Award IIS-0414667 and the NationalVisualization and Analytics Center (NVACTM), a U.S. Depart-ment of Homeland Security Program, under the auspices ofthe Southeast Regional Visualization and Analytics Center.

Carsten Görg also has been supported by a fellowship withinthe Postdoc-Program of the German Academic ExchangeService (DAAD). Past and present Jigsaw team membersMeekal Bajaj, Vasilios Pantazopoulos, Neel Parekh, KanupriyaSinghal, Gennadiy Stepanov, and Sarah Williams have madehelpful contributions to the implementation and evaluationof the system.

References1 Russell DM, Stefik MJ, Pirolli P, Card SK. The cost structure of

sensemaking. CHI ’93: Proceedings of the INTERACT ’93 and CHI ’93Conference on Human Factors in Computing Systems (Amsterdam,Netherlands) ACM: New York, NY, USA, 1993; 269–276.

2 Klein G, Moon B, Hoffman R. Making sense of sensemaking 1:alternative perspectives. IEEE Intelligent Systems 2006; 21: 70–73.

3 Norman D. Visual representations. In Things That Make Us Smart:Defending Human Attributes in the Age of the Machine. Addison-Wesley: Reading, MA, 1994.

4 Card SK, Mackinlay J, Shneiderman B. Readings in InformationVisualization: Using Vision to Think. Morgan Kaufmann: Los Altos,CA, 1999.

5 Thomas JJ, Cook KA. Illuminating the Path. IEEE Computer Society:Chicago, 2005.

6 Wong PC, Thomas J. Visual analytics. IEEE Computer Graphics andApplications 2004; 24: 20–21.

7 Pirolli P, Card S. Sensemaking processes of intelligence analystsand possible leverage points as identified through cognitivetask analysis. Proceedings of the 2005 International Conference onIntelligence Analysis (McLean, VA). https://analysis.mitre.org/proceedings/index.html, May 2005.

8 Heuer R. Psychology of Intelligence Analysis. Center for the Study ofIntelligence, Central Intelligence Agency: Washington, DC, 1999.

9 Johnston R. Analytic Culture in the United States IntelligenceCommunity: An Ethnographic Study. Central Intelligence AgencyWashington, DC, 2005.

10 Amar R, Eagan J, Stasko J. Low-level components of analyticactivity in information visualization. In Proceedings of IEEESymposium on Information Visualization (InfoVis ’05) (Minneapolis,MN), IEEE Computer Society Press: Siver Spring, MD, October2005; 111–117.

11 Amar R, Stasko J. Knowledge precepts for design and evaluationof information visualizations. IEEE Transactions on Visualizationand Computer Graphics 2005; 11: 432–442.

12 Yi JS, Kang YA, Stasko J, Jacko J. Toward a deeper understandingof the role of interaction in information visualization.IEEE Transactions on Visualization and Computer Graphics 2007; 13:1224–1231.

13 Stasko J, Görg C, Liu Z, Singhal K. Supporting investigativeanalysis through interactive visualization. Proceedings of IEEESymposium on Visual Analytics, Science, and Technology (VAST’07)(Sacramento, CA), IEEE Computer Society Press: Silver Spring,MD, 2007; 131–138.

14 Jigsaw website. http://www.cc.gatech.edu/gvu/ii/jigsaw (accessed21 March 2008).

15 Chen H, Zeng D, Atabaksh H, Wyzga W, Schroder J. Coplink:managing law enforcement data and knowledge. Communicationsof the ACM 2003; 46: 28–34.

16 IEEE VAST 2007 CONTEST. http://www.cs.umd.edu/hcil/VASTcontest07 (accessed 21 March 2008).

17 Wise JA, Thomas JJ, Pennock K, Lantrip D, Pottier M, Schur A,Crow V. Visualizing the non-visual: spatial analysis andinteraction with information from text documents. Proceedings ofIEEE Symposium on Information Visualization (InfoVis’95) (Atlanta,GA), IEEE Computer Society Press: Silver Spring, MD, October1995; 51–58.

18 Chau M, Xu J, Chen H. Extracting meaningful entities from policenarrative reports. 2nd National Conference on Digital GovernmentResearch (Los Angeles, CA), 2002.

Information Visualization

Page 15: Jigsaw: supporting investigative analysis through ...stasko/papers/iv08-jigsaw.pdf · Jigsaw provides multiple coordinated views of document entities with a special emphasis on visually

Jigsaw John Stasko et al132

19 Jackson P, Moulinier I. Natural Language Processing for OnlineApplications: Text Retrieval, Extraction, and Categorization. JohnBenjamins: Amsterdam, Netherlands, 2002.

20 GATE – General Architecture for Text Engineering. http://gate.ac.uk/ (accessed 21 March 2008).

21 Balie. http://balie.sourceforge.net/ (accessed 21 March 2008).22 FreeLing. http://garraf.epsevg.upc.es/freeling/ (accessed 21 March

2008).23 Connexor. http://www.connexor.eu/ (accessed 21 March 2008).24 Cicero. Language Computer Corp. http://www.language

computer.com/ (accessed 21 March 2008).25 Sanfilippo A, Franklin L, Tratz S, Danielson G, Mileson N,

Riensche R, McGrath L. Automating frame analysis. FirstInternational Workshop on Social Computing, Behavioral Modeling,and Prediction (Phoenix, AZ), Springer Science+Business Media:New York, NY, April 2008; 239–248.

26 Hughes FJ. Discovery, proof, choice: the art and science of theprocess of intelligence analysis, case study 6, ‘All Fall Down’.Unpublished report, 2005.

27 Ahlberg C, Shneiderman B. Visual information seeking: tightcoupling of dynamic query filters with starfield displays.Proceedings of ACM CHI ’94, (Boston, MA), ACM: New York, NY,April 1994; 313–317.

28 Bier EA, Ishak EW, Chi E. Entity workspace: an evidence filethat aids memory, inference, and reading. Proceedings of theIEEE International Conference on Intelligence and Security Analysis(ISI ’06), (San Diego, CA), Springer-Verlag: Berlin, May 2006;466–472.

29 Chappell AR, Cowell AJ, Thurman DA, Thomson JR. Supportingmutual understanding in a visual dialogue between analystand computer. Proceedings of Human Factors and ErgonomicSociety 48th Annual Meeting (New Orleans, LA), Human Factorsand Ergonomics Society: Santa Monica, CA, September 2004;376–380.

30 Sanfilippo A, Baddeley B, Cowell AJ, Gregory ML, Hohimer R,Tratz S. Building a human information discourse interface touncover scenario content. Proceedings of the 2005 InternationalConference on Intelligence Analysis (McLean, VA), https://analysis.mitre.org/proceedings/index.html, May 2005.

31 Wright W, Schroh D, Proulx P, Skaburskis A, Cort B. Thesandbox for analysis: Concepts and methods. Proceedings of ACMCHI ’06 (Montreal, Quebec), ACM: New York, NY, April 2006;801–810.

32 Görg C, Liu Z, Parekh N, Singhal K, Stasko J. Jigsaw meetsBlue Iguanodon – The VAST 2007 Contest. Proceedings of IEEESymposium on Visual Analytics Science and Technology (VAST ’07)(Sacramento, CA), IEEE Computer Press: Silver Spring, MD,October2007; 201–202.

33 Plaisant C, Grinstein G, Scholtz J, Whiting M, O’Connell T,Laskowski S, Chien L, Wright TAW, Görg C, Liu Z, ParekhN, Singhal K, Stasko J. Evaluating visual analytics: the 2007visual analytics science and technology symposium contest. IEEEComputer Graphics and Applications 2008; 28: 12–21.

34 Lee B, Czerwinski M, Robertson G, Bederson BB. Understandingresearch trends in conferences using PaperLens. CHI ’05: CHI’05 Extended Abstracts on Human Factors in Computing Systems,Proceedings of ACM CHI ’05 (Portland, OR), ACM: New York, NY,2005; 1969–1972.

35 Stasko J, Görg C, Liu Z. Sensemaking across text documents:human-centered, visual exploration with Jigsaw. SensemakingWorkshop @ CHI 2008 (Florence, Italy), http://dmrussell.googlepages.com/sensemakingworkshopchi2008, April 2008.

36 i2 – Analyst’s Notebook. http://www.i2inc.com/ (accessed 21March 2008).

37 Kapler T, Wright W. Geotime information visualization.Information Visualization 2005; 4: 136–146.

38 Jonker D, Wright W, Schroh D, Proulx P, Cort B. Informationtriage with TRIST. Proceedings of the 2005 InternationalConference on Intelligence Analysis (McLean, VA) https://analysis.mitre.org/proceedings/index.html, May 2005.

39 Xu J, Chen H. Criminal network analysis and visualization.Communications of the ACM 2005; 48: 101–107.

40 Hetzler E, Turner A. Analysis experiences using information visuali-zation. IEEE Computer Graphics and Applications 2004; 24: 22–26.

41 Card SK. Leverage points and tools for aiding intelligence analysis.Unpublished report presented at the 2007 Human–ComputerInteraction Consortium, 2007.

42 Sanfilippo A, Baddeley B, Posse C, Whitney P. A layered dempster-shafer approach to scenario construction and analysis. Proceedingsof the IEEE International Conference on Intelligence and SecurityAnalysis (ISI ’07) (New Brunswick, NJ), Springer-Verlag: Berlin,May 2007; 95–102.

43 Finger R, Bisantz A. Utilizing graphical formats to conveyuncertainty in a decision-making task. Theoretical Issues inErgonomic Science 2002; 39: 1–25.

44 Isenberg P, Carpendale S. Interactive tree comparison for co-located collaborative information visualization. IEEE Transactionson Visualization and Computer Graphics 2007; 13: 1232–1239.

45 Ball R, North C, Bowman D. Move to improve: promotingphysical navigation to increase user performance with largedisplays. Proceedings of ACM CHI ’07 (San Jose, CA), ACM: NewYork, NY, April 2007; 191–200.

46 Badalamente RV, Greitzer FL. Top ten needs for intelligenceanalysis tool development. Proceedings of the 2005 InternationalConference on Intelligence Analysis (McLean, VA), https://analysis.mitre.org/proceedings/index.html, May 2008.

Information Visualization