Christos Katsanos | Tselios | Avouris

AutoCardSorter: Designing the Information Architecture of a Web Site Using Latent Semantic Analysis

ACM SIGCHI | Florence, Italy | 5-10 April, 2008

Purpose & Motivation

Automate Structural Design of Information Spaces

Increase efficiency and flexibility for practitioners

Why is it important?

Structural design greatly affects user experience

Current approaches (e.g. Card Sorting) are often neglected:
- Time constraints
- Cost to recruit users and run the studies
- Increased complexity for data analysis
- Challenging for large sites (>100 pages)

Our tool-based Methodology

Page Text Descriptions

Semantic Similarity Measure (e.g. LSA)

Hierarchical Clustering Algorithms

Interactive Tree Structure

Additional Support:
1. Number of Groups
2. Cross-Hierarchy Links

Semantic Similarity Matrix
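The clustering stage of this pipeline can be illustrated with a minimal sketch. It assumes average-linkage agglomerative clustering and uses hypothetical page names and similarity values; the slides do not state which linkage AutoCardSorter applies, and none of the numbers below come from the study.

```python
from itertools import combinations


def average_linkage(sim, labels):
    """Agglomerative clustering on a symmetric similarity matrix.

    Returns the merge history as (group_a, group_b, similarity) tuples,
    which is the information a dendrogram displays.
    """
    clusters = [(i,) for i in range(len(labels))]
    merges = []

    def linkage(a, b):
        # Average pairwise similarity between the members of two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the two most similar clusters first.
        a, b = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((
            [labels[i] for i in a],
            [labels[i] for i in b],
            round(linkage(a, b), 3),
        ))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical similarity values for four pages (assumption, not study data).
labels = ["Vitamins", "Minerals", "Recipes", "Meal plans"]
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

for left, right, s in average_linkage(sim, labels):
    print(left, "+", right, "at similarity", s)
```

Reading the merge history from first to last corresponds to reading a dendrogram from the leaves towards the root; cutting it after a given merge yields a candidate set of groups.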

The tool Interface: AutoCardSorter

Page Descriptions

Analysis Options:
- Semantic Similarity Measure
- Clustering Algorithm Type

Interactive Dendrogram

Validation Study Design

Card Sorting vs AutoCardSorter

Investigate quality of results & efficiency

Health & Nutrition Site

Same content item descriptions

18 representative users

Measures & Analysis

Validity: Similarity-Matrices Correlation

     P1    P2    P3    P4    P5
P1   -
P2   0.94  -
P3   0.11  0.33  -
P4   0.33  0.28  0.11  -
P5   0.50  0.83  0.06  0.06  -

     P1    P2    P3    P4    P5
P1   -
P2   0.62  -
P3   0.21  0.14  -
P4   0.49  0.51  0.83  -
P5   0.61  0.11  0.21  0.92  -

AutoCardSorter: LSA (P5, P1)

Card Sorting: Frequency with which users placed P1 and P5 in the same pile
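As a rough illustration of this measure, the sketch below correlates the below-diagonal entries of two similarity matrices using the Pearson coefficient. The matrices `tool_sim` and `user_freq` are hypothetical 4-page examples, and the choice of Pearson's r is an assumption; the slides report a correlation without naming the exact procedure.

```python
from math import sqrt


def lower_triangle(matrix):
    """Flatten the entries below the diagonal, row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical tool-side similarities (e.g. LSA cosines), not study data.
tool_sim = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]

# Hypothetical share of participants who put each pair in the same pile.
user_freq = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.3, 0.8, 1.0],
]

r = pearson(lower_triangle(tool_sim), lower_triangle(user_freq))
print(f"similarity-matrices correlation r = {r:.2f}")
```

Only the lower triangle is used because both matrices are symmetric and the diagonal carries no information.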

Validity: % Agreement of Design

1) Hierarchical Cluster Analysis of Card Sorting Data

2) AutoCardSorter vs User-Data Dendrogram

a) Eigenvalue Analysis to ‘cut’ objectively

b) User structure => Ideal

c) In Agreement => Longer sequence of pages grouped together in the same category as Ideal
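One possible reading of step (c) can be sketched as follows: for each group proposed by the tool, the largest subset of its pages that also sit together in a single user-derived ("ideal") group counts as being in agreement. This interpretation, the function name `percent_agreement`, and the example groups are all assumptions made for illustration; the slides do not give a formula.

```python
def percent_agreement(proposed, ideal):
    """Percentage of pages whose proposed grouping matches the ideal one.

    For each proposed group, count the largest overlap it has with any
    single ideal group, then divide the total by the number of pages.
    """
    total = sum(len(group) for group in proposed)
    agreed = sum(
        max(len(set(group) & set(reference)) for reference in ideal)
        for group in proposed
    )
    return 100.0 * agreed / total


# Hypothetical groupings of five pages (not study data).
ideal = [{"P1", "P2", "P5"}, {"P3", "P4"}]
proposed = [{"P1", "P2"}, {"P3", "P4", "P5"}]

print(percent_agreement(proposed, ideal))  # 4 of the 5 pages agree
```

Under this reading a proposed structure that splits an ideal group into two pure subgroups still scores 100%, which fits the slide's description of this measure as less strict than the matrix correlation.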

Efficiency: Total Time Required

[Chart: total time required, AutoCardSorter vs Card Sorting]

Study Results

Study Results - Validity

AutoCardSorter produced results of comparable quality to Card Sorting:

Similarity-Matrices Correlation = 0.80 (P<0.01)

% Agreement of Design = 100%

Study Results - Efficiency

[Chart: AutoCardSorter vs Card Sorting]

Discussion - Advantages

Increased efficiency (x27)

Reduces resources required

Explore alternative solutions early

Simple to learn and apply

Easy to apply for large sites (>100)

Possibility for wider adoption

Discussion – Current Limitations

Lack of qualitative feedback

No insight into category labels

Future Research

More validation studies in different domains

Additional constraints (e.g. group size)

Improvements to algorithm

Dynamic semantic similarity algorithms (e.g. LSA-IR)

Alternatives to Hierarchical Clustering (e.g. Factor Analysis)

A Demo - Sit back and enjoy

Summary & Questions

Proposed an approach that automates structural design of an information space.

The validation study showed a substantial efficiency gain, with results similar to those of a user-based technique

Cheap + Fast + Easy = Possibility for wider adoption

Complementary to user-based methods

Christos Katsanos

Extra Slides

More Validation Studies: Details

                                    Health & Nutrition   Educational Portal   Travel & Tourism Site
# of Participants in Card Sorting   18                   26                   34
# of Items to Sort                  16                   27                   38

More Validation Studies: Summary of Results

                                 Health & Nutrition   Educational Portal   Travel & Tourism Site
Similarity-Matrices r (p<0.01)   0.80                 0.52                 0.59
% Agreement of Design            100%                 93%                  87%
Efficiency (X Times Faster)      27                   11                   14

More Validation Studies: Efficiency

More Validation Studies: Number of Proposed Categories

More Validation Studies: Avg. Items/Proposed Category

More Validation Studies: Correlation against No. of Items

Statistical Semantic Similarity Measures - Overview

LSA: Latent Semantic Analysis (Landauer & Dumais, 1997)
- LSA-IR (Falconer et al., 2006)
- PLSA (Hofmann, 1999)

PMI: Point-wise Mutual Information (Manning & Schütze, 1999)
- PMI-IR (Turney, 2001)
- GLSA (Matveeva et al., 2005)

HAL: Hyperspace Analogue to Language (Lund & Burgess, 1996)
- COALS (Rohde et al., 2004)

Latent Semantic Analysis

Similar documents tend to have common words

1) Parse corpora representing users’ understanding skills

2) Calculate each word’s frequency of occurrence in each document (term-document matrix, TDM)

3) Weight by word’s importance (document, domain)

4) Apply Singular Value Decomposition

5) LSA Index = Cos(Angle of Document Vectors) => [-1,1]
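Steps 2, 3 and 5 above can be sketched with the standard library alone. Step 4 (Singular Value Decomposition) is deliberately left out, so what follows is a plain vector-space similarity rather than full LSA. The weighting scheme (log term frequency times an inverse-document-frequency factor) and the three toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Step 2: raw term counts, one Counter per document."""
    return [Counter(doc.lower().split()) for doc in docs]


def weight(counts_per_doc):
    """Step 3 (assumed scheme): log local weight times an IDF global weight."""
    n_docs = len(counts_per_doc)
    # Number of documents in which each term occurs.
    doc_freq = Counter(term for counts in counts_per_doc for term in counts)
    return [
        {
            term: math.log(1 + tf) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, tf in counts.items()
        }
        for counts in counts_per_doc
    ]


def cosine(u, v):
    """Step 5: cosine of the angle between two sparse document vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


# Toy page descriptions (hypothetical).
docs = [
    "vitamins and minerals in fruit",
    "minerals and vitamins in vegetables",
    "weekly meal plans and recipes",
]
vectors = weight(term_document_matrix(docs))

print(cosine(vectors[0], vectors[1]))  # higher: most words are shared
print(cosine(vectors[0], vectors[2]))  # lower: only "and" is shared
```

Because these weights are non-negative, the cosine here stays within [0, 1]; negative values in the [-1, 1] range can only appear once the vectors are projected into the reduced space produced by SVD.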

Card Sorting: Typical Effort in Person-Days

http://www.intranetleadership.com.au

Complementary to user-based methods => What’s the point?

Far better than doing nothing

User-based applied in a more focused & formative manner

Get insight before running a study

Fast redesigns

Why 2 validation measures?

Similarity-Matrices Correlation
- Strictest approach (compares measurements of semantic similarity)
- More general (does not presuppose cluster analysis)

% Agreement of Design
- Less strict
- How close are the ‘proposed’ designs?