Coupling Informatics Algorithm Development and Visual Analysis Danny Dunlavy, Pat Crossno, Tim Shead Sandia National Laboratories SIAM Annual Meeting July 7, 2008 SAND2008-4470P Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Page 1: Coupling Informatics Algorithm Development and Visual Analysis

Danny Dunlavy, Pat Crossno, Tim Shead
Sandia National Laboratories

SIAM Annual Meeting
July 7, 2008

SAND2008-4470P

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Page 2:

<DOC><DOCNO><s docid="APW19990519.0113" num="1" stype="-1">APW19990519.0113</s></DOCNO>
<DATE_TIME><P><s docid="APW19990519.0113" num="2" stype="-1"> 1999-05-19 21:11:17</s> </P></DATE_TIME>
<BODY><CATEGORY><s docid="APW19990519.0113" num="3" stype="-1"> usa</s></CATEGORY>
<HEADLINE><P><s docid="APW19990519.0113" num="4" stype="0"> Pulses May Ease Schizophrenic Voices</s> </P></HEADLINE>
<TEXT><P><s docid="APW19990519.0113" num="5" stype="1"> WASHINGTON (AP) Schizophrenia patients whose medication couldn't stop the imaginary voices in their heads gained some relief after researchers repeatedly sent a magnetic field into a small area of their brains.</s> </P>
<P><s docid="APW19990519.0113" num="6" stype="1"> About half of 12 patients studied said their hallucinations became much less severe after the treatment, which feels like ``having a woodpecker knock on your head'' once a second for up to 16 minutes, said researcher Dr. Ralph Hoffman.</s> <s docid="APW19990519.0113" num="7" stype="1">The voices stopped completely in three of these patients.</s> </P>
<P><s docid="APW19990519.0113" num="8" stype="1"> The effect lasted for up to a few days for most participants, and one man reported that it lasted seven weeks after being treated daily for four days.</s> </P>
<P><s docid="APW19990519.0113" num="9" stype="1"> Hoffman stressed that the study is only preliminary and can't prove that the treatment would be useful.</s> <s docid="APW19990519.0113" num="10" stype="1"> ``We need to do much more research on this,'' he said in an interview.</s> </P>
<P><s docid="APW19990519.0113" num="11" stype="1"> Hoffman, deputy medical director of the Yale Psychiatric Institute, is scheduled to present the work Thursday at the annual meeting of the American Psychiatric Association.</s> </P>
<P><s docid="APW19990519.0113" num="12" stype="1"> Not all people with schizophrenia hear voices, and of those who do, Hoffman estimated that maybe 25 percent can't control them with medications even when other disease symptoms abate.</s> <s docid="APW19990519.0113" num="13" stype="1"> So the work could pay off for ``a small but very ill group of patients,'' he said.</s> </P>
<P><s docid="APW19990519.0113" num="14" stype="1"> The treatment is called transcranial magnetic stimulation, or TMS.</s> <s docid="APW19990519.0113" num="15" stype="1"> While past research indicates it might be helpful in lifting depression, it hasn't been studied much in schizophrenia.</s> </P>
<P><s docid="APW19990519.0113" num="16" stype="1"> In TMS, an electromagnetic coil is placed on the scalp and current is turned on and off to create a pulsing magnetic field that reaches into a small area of the brain.</s> <s docid="APW19990519.0113" num="17" stype="1"> The goal is to make brain cells underneath the coil fire messages to adjoining cells.</s> </P>
<P><s docid="APW19990519.0113" num="18" stype="1"> The procedure is much different from electroconvulsive therapy, called ECT, which applies pulses of electricity rather than a magnetic field to the brain.</s> <s docid="APW19990519.0113" num="19" stype="1"> Unlike TMS, ECT creates a brief seizure and is performed under general anesthesia.</s> <s docid="APW19990519.0113" num="20" stype="1"> ECT is used most often for treating severe depression.</s> </P>
<P><s docid="APW19990519.0113" num="21" stype="1"> In TMS, the magnetic pulses are thought to calm the affected part of the brain if they're given as slowly as once per second, Hoffman said.</s> <s docid="APW19990519.0113" num="22" stype="1"> He and colleagues targeted an area involved in understanding speech, above and behind the left ear, on the theory that hallucinated voices come from overactivity there.</s> </P>
<P><s docid="APW19990519.0113" num="23" stype="1"> The treatment can make scalp muscles muscle contract, leading to the woodpecker feeling, he said, but patients could tolerate it.</s> <s docid="APW19990519.0113" num="24" stype="1">Headache was the most common side effect, and there was no sign that the treatment affected the ability to understand speech, he said.</s> </P>
<P><s docid="APW19990519.0113" num="25" stype="1"> To make sure the study results didn't reflect just the psychological boost of getting a treatment, researchers gave sham and real treatments to each study participant and studied the difference in how patients responded.</s> <s docid="APW19990519.0113" num="26" starting with

Page 3: Latent Semantic Analysis

[Diagram: Text corpus → term-document matrix (terms t1, t2, …, tm by documents d1, d2, d3, d4, …, dn) → Truncated SVD (low rank approximation: terms × concepts, singular values, concepts × documents) → Concept space]

• Information retrieval
• Clustering
• Doc & term relationships
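The truncated SVD step in the diagram can be sketched in a few lines. This is a minimal illustration using NumPy on a made-up term-by-document matrix, not the ParaText™ or LSALIB implementation; the function name `truncated_svd` and the toy counts are assumptions introduced here.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k factors (U_k, s_k, Vt_k) of a term-by-document matrix A."""
    # Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k leading concepts.
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical toy corpus: 4 terms (rows) by 3 documents (columns), raw counts.
A = np.array([
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 1.0, 1.0],
])

U_k, s_k, Vt_k = truncated_svd(A, k=2)   # terms x concepts, singular values, concepts x documents
A_k = U_k @ np.diag(s_k) @ Vt_k          # low rank approximation of A
```

Here `U_k` plays the role of the terms × concepts block, `s_k` the singular values, and `Vt_k` the concepts × documents block. A production system would likely use a sparse, iterative solver rather than a dense SVD.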

Page 4: Concept Space

∆ policy
∆ planning
∆ politics
∆ tomlinson
∆ 1986
o Sport in Society: Policy, Politics and Culture, ed A. Tomlinson (1990)
o Policy and Politics in Sport, PE and Leisure eds S. Fleming, M. Talbot and A. Tomlinson (1995)
o Policy and Planning (II), ed J. Wilkinson (1986)
o Policy and Planning (I), ed J. Wilkinson (1986)
o Leisure: Politics, Planning and People, ed A. Tomlinson (1985)

∆ parker
∆ lifestyles
∆ 1989
∆ part
o Work, Leisure and Lifestyles (Part 2), ed S. R. Parker (1989)
o Work, Leisure and Lifestyles (Part 1), ed S. R. Parker (1989)

[Leisure Studies of America Data]

Page 5: ParaText™ Operations

• Document parsing, matrix creation and weighting
• SVD:
• Truncated SVD:
• Query scores (query as new “doc”):
• LSA Ranking:
• Document similarities:
• Term Similarities:
• Similarity statistics
– Mean, standard deviation

(thresholded → sparse)

[Diagram: truncated SVD factors (terms × concepts, singular values, concepts × documents), with labels T and DT]
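The formulas for the query score and ranking operations are not present in this text, so the following is only a sketch of one common LSA formulation, not necessarily what ParaText™ computes. It treats the query as a new term vector, maps it into concept space, and scores it against each document by cosine similarity. The function name `lsa_query_scores` and the toy matrix are assumptions.

```python
import numpy as np

def lsa_query_scores(U_k, s_k, Vt_k, q):
    """Cosine score of query term vector q against every document in concept space."""
    q_k = U_k.T @ q                    # map the query into concept space
    docs_k = s_k[:, None] * Vt_k       # column j is document j in concept space
    denom = np.linalg.norm(q_k) * np.linalg.norm(docs_k, axis=0)
    denom = np.where(denom == 0.0, 1.0, denom)   # guard against division by zero
    return (q_k @ docs_k) / denom

# Hypothetical toy term-by-document matrix (4 terms x 3 documents).
A = np.array([
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 1.0, 1.0],
])
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

q = np.array([0.0, 0.0, 1.0, 0.0])     # query containing only the third term
scores = lsa_query_scores(U_k, s_k, Vt_k, q)
ranking = np.argsort(-scores)          # document indices, best match first
```

In this toy example the third term occurs almost exclusively in the third document, so that document ranks first.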

Page 6: Document Similarity Graphs

Document similarity matrix

[Diagram: documents × concepts, singular values, concepts × documents → threshold → sparse coordinate format]

Document similarity graph
• Each document (or term, entity, etc.) is a vertex
• Each row defines an edge
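As a rough sketch of the threshold step, a dense similarity matrix can be reduced to coordinate-format triples, with each triple read as one weighted edge. The similarity values, the threshold of 0.3, the name `similarity_edges`, and the choice to keep only the upper triangle (dropping self-similarities and mirrored duplicates) are all assumptions made for illustration.

```python
def similarity_edges(sim, threshold):
    """Return (row, col, value) triples for similarities at or above the threshold.

    Only the upper triangle is scanned, so each undirected edge appears once
    and self-similarities on the diagonal are skipped (an assumption here).
    """
    edges = []
    n = len(sim)
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                edges.append((i, j, sim[i][j]))
    return edges

# Hypothetical symmetric document similarity matrix for 4 documents.
sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.4, 0.2],
    [0.1, 0.4, 1.0, 0.7],
    [0.0, 0.2, 0.7, 1.0],
]

edges = similarity_edges(sim, threshold=0.3)
# Each triple is one row of the coordinate format: (vertex, vertex, edge weight).
```

Raising the threshold produces a sparser graph; lowering it keeps weaker relationships at the cost of more edges to lay out.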

Page 7: Similarity Statistics

Statistics on edges
– One graph: one-sample t statistic
– Two graphs: two-sample t statistic

[Figure panels: Graph 1, Graph 2]
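The two edge statistics can be sketched with the Python standard library. The slide does not say which two-sample variant is used, so Welch's unequal-variance form is an assumption here, as are the function names, the reference mean `mu0`, and the edge weights.

```python
import math
import statistics

def one_sample_t(weights, mu0=0.0):
    """One-sample t statistic of edge weights against a reference mean mu0."""
    n = len(weights)
    standard_error = statistics.stdev(weights) / math.sqrt(n)
    return (statistics.mean(weights) - mu0) / standard_error

def two_sample_t(a, b):
    """Two-sample t statistic (Welch's form, an assumption) for two sets of edge weights."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

# Hypothetical edge weights from two document similarity graphs.
graph1_weights = [0.2, 0.4, 0.6]
graph2_weights = [0.5, 0.7, 0.9]

t_one = one_sample_t(graph1_weights)
t_two = two_sample_t(graph1_weights, graph2_weights)
```

Both functions need at least two weights per graph, because the sample standard deviation is undefined for a single value.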

Page 8: Doc Sim Graph Comparison

[Figure panels: k = 10, k = 40]

[DUC 2003, Task 2 Data: 297 documents, 30 manual clusters]

Page 9: Layout Comparison

[Figure panels: Force directed, Simple 2D]

[DUC 2003, Task 2 Data: 297 documents, 30 manual clusters]

Page 10: Sparse Matrix View

Page 11: Rank Comparison

[Figure panels: k = 40, k = 10]

[DUC 2003, Task 2 Data: 297 documents, 30 manual clusters]

Page 12: Matrix Differences

[Figure panels: k = 40, k = 10]

[DUC 2003, Task 2 Data: 297 documents, 30 manual clusters]

Page 13: Small Multiples

[Figure grid labels: k = 10, k = 20, k = 30, k = 40 and k = 40, k = 20, k = 30, k = 50]

[DUC 2003, Task 2 Data: 297 documents, 30 manual clusters]

Page 14: LSAView

Page 15: LSAView Impact

• Document similarities:
• Inner product view:
• Scaled inner product view:
• What is the best scaling for document similarity graph generation?

[Figure: four plots against k (x-axis ticks 20 to 80, y-axis 0 to 2), one per scaling: original scaling, no scaling, inverse sqrt, inverse]

[Leisure Studies of America Data]
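The formulas behind the four scalings are not present in this text. One plausible reading, offered only as an assumption, is that each document's concept coordinates are weighted by a function of the singular values before inner products are taken: the singular values themselves ("original scaling"), all ones ("no scaling"), their inverse square roots, or their inverses. The names `SCALINGS` and `document_similarities` and the toy matrix are hypothetical.

```python
import numpy as np

# Hypothetical mapping from the four panel labels to weights on the singular values.
SCALINGS = {
    "original": lambda s: s,                     # weight concepts by singular values
    "none": lambda s: np.ones_like(s),           # unweighted concept coordinates
    "inverse_sqrt": lambda s: 1.0 / np.sqrt(s),  # weight by s ** -0.5
    "inverse": lambda s: 1.0 / s,                # weight by s ** -1
}

def document_similarities(s_k, Vt_k, scaling="original"):
    """Inner products between documents after scaling their concept coordinates."""
    weights = SCALINGS[scaling](s_k)
    docs_k = weights[:, None] * Vt_k   # column j is the scaled document j
    return docs_k.T @ docs_k           # documents x documents similarity matrix

# Hypothetical toy term-by-document matrix (4 terms x 3 documents).
A = np.array([
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 1.0, 1.0],
])
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

sims = {name: document_similarities(s_k, Vt_k, name) for name in SCALINGS}
```

Under this reading, the "original" case reproduces the inner products between columns of the rank-k approximation, while the inverse scalings give more weight to concepts with smaller singular values. The inverse variants assume the k retained singular values are nonzero.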

Page 16: Conclusions

• LSAView
– Analysis and exploration of impact of informatics algorithms on end-user visual analysis of data
– Aids in discovery process of optimal algorithm parameters for given data and tasks

• Impact
– Used in developing and understanding ParaText™ and LSALIB algorithms

• Future Work
– Other graph-based metrics
• Diameter, cycles, vertex degree distribution, shortest cycle length, etc.
– Other decompositions and algorithms
• Incremental SVD, SDD, CUR, clustering
– Other statistics/inference tests and visualization
– New problem domains

Page 17: Coupling Informatics Algorithm Development and Visual Analysis

Danny Dunlavy

Email: [email protected]

URL: http://www.cs.sandia.gov/~dmdunla