Gary Marchionini, PhD University of North Carolina at Chapel Hill ils.unc/~march

Practice and Theory in Digital Libraries: The Case of Open Video

Libraries in the Digital Age (LIDA05), Dubrovnik, Croatia

Gary Marchionini, PhD, University of North Carolina at Chapel Hill

www.ils.unc.edu/

May 30, 2005

Gary Marchionini, UNC-CH LIDA 2005

Outline

• Digital Libraries as phenomena
• Multimedia and video challenge our text biases
• Open Video concepts and system
  – User studies
• Conclusion


Pragmatics

• Useful theory and practice are a Moebius strip
• DL practice is informed by multiple theories related to:
  – Information structure
  – Human behavior
  – System design
  – Social-political-economic constraints and organizational behavior
  – History and epistemology

• “We want principles, not only developed—the work of the closet—but applied, which is the work of life.” Horace Mann, Thoughts, 1867


Theories of What and Why

• Digital extensions of physical libraries
• Augmentations of intellect
• Collaborative spaces: sharium
• Cultural institutions
• World Brain
• Economic models
• Complex information systems


Theories of How

• Reuse and open source information
• Levels of abstraction
• Information retrieval
• Information interaction
• Iterative design and evaluation
• Resource management


Digital Library Design Space 1999: What Has Changed in 2005?

[Figure: the digital library design space, with four components: Technology, Content, Services, Community. Adapted from Marchionini & Fox, IP&M, 1999]


Provocation: Text no longer rules:

• The Net generation depends much less on reading (they are entering universities as students and, soon, as professors; Oblinger & Oblinger, 2005 Educause book). In the US:
  – Children age 6 or younger: average of 2 hrs/day using screen media, 1.6 hrs/day playing outside, 39 min. reading
  – 13-17 yr olds: average 3.1 hrs/day watching TV and 3.5 hrs/day with digital media. They multitask
  – >2 million US children (ages 6–17) have their own Web site. Girls are more likely to have a Web site than boys (12.2 percent versus 8.6 percent).
  – Ability to use nontext expression (audio, video, graphics) appears stronger in each successive cohort.
• Multimedia and multitasking: the trend of the 21st century
• Information specialists MUST get over our text bias


Open Video DL Case

• Open
  – Public good
  – Reusable
• Files not streams
• Chunking
• Agile views user interface
  – Alternative representations (views)
  – Agile control mechanisms


Open Video Vision/Contributions

• An open repository of video files that can be re-used in a variety of ways by the education and research communities
  – Encourages contributions
  – A testbed for interactive interfaces
• An easy to use DL based upon the agile views interface design framework
  – Multiple, cascading, easy to control views (pre, over, re, shared, peripheral)
  – Views based upon empirically validated surrogates
  – An environment for building theory of human information interaction
• A set of methods and metrics that reveal how people understand digital video through surrogates


Background & Status

• Begun 1995 with colleagues at UMD & BCPS
• Funding: NSF, NASA, NSF/LoC
• Collaborators/Contributors: I2-DSI, ibiblio, CMU, UMD, NIST, Prelinger and Internet Archives, NASA, ACM
• ~2600 video segments
• ~2000 different titles
• ~15000 unique visitors per month
• MPEG-1, MPEG-2, MPEG-4, QT
• OAI provider
• Ongoing user studies
• New Preservation initiative


Agile Views Interface Research

• Provide a variety of access representations (e.g., indexes) and control mechanisms
• Usual search and browse capabilities
• Leverage both visual and linguistic cues
• Create and test surrogates for overview, preview, shared, and history views


User Study Framework

[Figure: user study framework diagram. Its labeled components are listed below.]

• GOALS: learning, work, entertainment
• TASKS: select video for viewing, select scene for viewing, copy and use scenes, copy and use frames, other tasks?
• VIDEO CHARACTERISTICS
  – genre: documentary, narrative
  – topic: literal, figurative
  – style: visual, audio, textual, place
• INDIVIDUAL CHARACTERISTICS: domain experience, video experience, cultural experience, computer experience, info seeking experience, metacognitive abilities, demographics
• SURROGATES, AGILE VIEWS: keywords, storyboard w/ text, audio; slide show w/ text, audio; fast forward w/ audio; poster frames; display controls
• EFFORT
  – TIME: time spent searching and viewing results
  – MENTAL LOAD: perceptual load, cognitive load
  – PHYSICAL LOAD: amount of muscle movement
• OUTCOMES
  – PERFORMANCE: retrieval (precision, recall); recognition (objects, action); gist comprehension (linguistic, visual)
  – SATISFACTION: perceived usefulness, perceived ease of use, flow, user satisfaction
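The retrieval measures named under PERFORMANCE (precision, recall) can be sketched as follows. This is a generic illustration of the standard definitions, not the studies' scoring code; the video identifiers are made up.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: a participant selects four videos, two of which
# are among the three judged relevant to the task.
p, r = precision_recall(["v1", "v2", "v3", "v4"], ["v2", "v4", "v7"])
print(f"precision={p:.2f} recall={r:.2f}")
```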


The Surrogates

• Storyboard with text keywords (20-36 per board @ 500 ms)
• Storyboard with audio keywords
• Slide show with text keywords (250 ms, repeated once)
• Slide show with audio keywords
• Fast forward (~4X)
• Fast forwards 32X, 64X, 128X, 256X
• Poster frames
• Real time clips
• Text titles
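One simple way to picture a fast forward surrogate at a given rate is to keep every Nth frame of the source video. The sketch below assumes that construction (the slides do not say how the fast forwards were produced) and uses a made-up 5-minute video to show how short the surrogate becomes at each of the listed rates.

```python
def fast_forward_indices(total_frames, speedup):
    """Indices of the frames kept when every `speedup`-th frame is shown."""
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    return list(range(0, total_frames, speedup))


def fast_forward_seconds(video_seconds, speedup):
    """Playing time of the surrogate at the original frame rate."""
    return video_seconds / speedup


# Hypothetical 5-minute (300 s) video at the four rates tested in Study 2.
for speedup in (32, 64, 128, 256):
    print(f"{speedup}X: {fast_forward_seconds(300, speedup):.2f} s")
```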


Surrogate Examples

Type of surrogate         Examples
Text surrogate            Title, keyword, description, etc.
Still image surrogate     Poster frame, storyboard/filmstrip, slide show, video stream, key-frame-based table of contents, etc.
Moving image surrogate    Skim, fast forward, etc.
Audio surrogate           Spoken keywords, environmental sounds, music, etc.
Multimodal surrogate      Text surrogate + still image surrogate, still image surrogate + audio surrogate, etc.


Metrics

              Text gist                               Still image                         Action
Recognition   Object recognition (text)               Object recognition (graphical)      Action recognition
Inference     Gist determination (free text);         Visual gist (vist) determination
              gist determination (multiple-choice)


User Studies

• Study 1: Qualitative Comparison of Surrogates (ECDL 02)
• Study 2: Fast Forwards (JCDL 03)
• Study 3: Narrativity (CHI 02; ASIST 03 paper)
• Study 4: Shared Views and History Views (Geisler dissertation)
• Study 4: Poster frames and text (eye tracking, CIVR 03)
• Study 5: TREC evaluations (03 and 04)
• Study 6: Cognitive load and ISEE (Mu diss.)
• Study 7: Relevance judgments for video (Yang diss.)
• Study 8: Surrogate integration study (in analysis)
• Others: several specific master’s papers (Hughes, Gruss)


Study 1: Compare Surrogates

• What are the strengths and weaknesses of different surrogates from the users’ perspective?

• Are any of the surrogates better than the others in supporting user performance?


The Surrogates

• Storyboard with text keywords (20-36 per board @ 500 ms)
• Storyboard with audio keywords
• Slide show with text keywords (250 ms, repeated once)
• Slide show with audio keywords
• Fast forward (~4X)


Method

• 7 video segments (2-10 min), 5 surrogates created for each

• 10 subjects with high video and computer experience

• Three phases (all multi-camera videotaped)
  – View full video then use 3 surrogates, repeat
    • Participant observation and debriefing
  – Do NOT view full video, use 3 surrogates, repeat
    • Participant observation and debriefing
  – Complete 3 assigned tasks with surrogates of choice
    • Think aloud and debriefing
• http://www.open-video.org/experiments/chi-2002/methods/study1.mov


Tasks

• Gist determination: free text
• Gist determination: multiple choice
• Object recognition: textual
• Object recognition: graphical
• Action recognition (2-3 second clips)
• Visual gist (predict which frames belong)
  – http://www.open-video.org/experiments/chi-2002/surrogates/index.html


Preferences

• In debriefing after each phase, subjects asked about preferences.

• Some preferences changed over the phases

• 2 subjects preferred ff
• 4 subjects said ff if audio keywords added
• 1 storyboard with audio keywords
• 2 slide show with audio keywords
• Implication: drop ss with text keywords, develop ff


Performance

• No statistically reliable difference (SRD) on gist (both free text and multiple choice)
• SRD on action recognition favoring ff
• ‘Near’ SRD on text object recognition favoring SB w/ audio keywords
• 8:1 to 29:1 compaction rates suitable for tasks
• Psychometric and face validity support for the tasks (means and variances; relevant to real tasks)
• SRD in gist and visual gist for one video
  – Homogeneity of frames diminishes surrogate value
  – Keywords help when visual variability decreases
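A compaction rate such as 8:1 can be read as the ratio of full-video playing time to surrogate viewing time. The sketch below works one example under that reading; the clip length and frame count are made-up numbers for illustration, not values from the study.

```python
def slide_show_seconds(num_frames, ms_per_frame, passes=1):
    """Viewing time of a slide show that displays each frame for a fixed time."""
    return num_frames * (ms_per_frame / 1000) * passes


def compaction_ratio(video_seconds, surrogate_seconds):
    """Full-video time divided by surrogate time (8.0 means 8:1)."""
    return video_seconds / surrogate_seconds


# Hypothetical: a 2-minute clip summarized by 30 key frames at 250 ms each,
# shown twice ("repeated once").
surrogate = slide_show_seconds(30, 250, passes=2)
print(f"surrogate: {surrogate} s, compaction {compaction_ratio(120, surrogate):.0f}:1")
```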


Qualitative Results

• Subjects suggested different surrogates for different tasks (e.g., ff for judging kid safe, sb for identifying images, ff for video styles)

• Three senses of gist
  – Topic (T)
  – Narrativity (N)
  – T+N+visual style

• Individual preferences and experiences influence surrogate effectiveness


Study 2: Fast Forward

• How fast can we make fast forwards?
  – 4 ff conditions (32X, 64X, 128X, 256X)
  – Four video segments for each condition
  – 45 subjects (1/2 UG, 1/2 grad, 2/3 female)
  – 6 tasks (full text gist, multiple choice gist, word object recognition, graphical object recognition, action recognition, visual gist)
  – Counterbalance speed and videos
  – Web-driven experimental condition, 3-camera video tapes, single subject at a time in usability laboratory
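Counterbalancing speed and videos can be done in several ways; the slides do not describe the exact scheme used. One common option is a cyclic Latin square, sketched below, in which each speed is paired with each video equally often across participants. The video labels and function names are placeholders.

```python
def latin_square(conditions):
    """Cyclic Latin square: each condition appears once per row and column."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def assignment(participant_index, speeds, videos):
    """Pair each video with a speed, rotating the pairing across participants."""
    row = latin_square(speeds)[participant_index % len(speeds)]
    return list(zip(videos, row))


speeds = [32, 64, 128, 256]
videos = ["video A", "video B", "video C", "video D"]  # placeholder labels
for participant in range(4):
    print(participant, assignment(participant, speeds, videos))
```

Presentation order could be rotated or randomized separately; this sketch only balances which speed is paired with which video.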


Example Image Recognition Stimulus


Results

• SRD on 4 of 6 tasks as speed increases; however, reasonable performance at even the highest rate
• Video content/genre interacts with performance
• Preference does not parallel performance (people can perform well under extreme conditions but do not like/enjoy them)
• No user characteristic differences (age, sex)
• Give users control but select appropriate defaults
• Caveat: controlled, independent focus on FF, likely a lower bound on performance


Speed Effects on Performance

[Figure: chart of mean score (y-axis, 0 to 12) by surrogate speed (x-axis: 32, 64, 128, 256) for four measures: Gist comp (ft), Visual gist, Object rec (g), Action rec]

• Visual gist at 32 is better than at other speeds
• Object recognition (g) at 32 and 64 is better than at 256
• Gist comprehension (ft) at 32 and 64 is better than at 128 and 256
• Action recognition at 32 is better than at 128 or 256


Narrativity Study

• CHI walk-up kiosk, used by 20 people
• 20 one-minute clips (half b&w, no audio) selected on 2 criteria: contain characters, have cause/effect relations between scenes (5 in each category)

• SRD on chars, cause, and interaction


Shared Views and History Views Studies

• Evaluate AV Design Framework by instantiating and evaluating a design

• Shared (based on recommendations) and History Views (based on logs)

• Phase 1: compare OV to Views interface (28 participants). OV>accuracy; NSRD on time, but learning effect; AV>navigation/efficiency; AV>satisfaction

• Phase 2: qualitative analysis of shared and history views


Poster Frame Study

• Research Questions:
  – Given both textual and visual metadata, which surrogate will be utilized and which will be preferred?
  – Does the placement of the surrogates affect how they are used?
  – Does the assigned task affect how surrogates are used?
  – Does personal preference play a role in how surrogates are used?


Study Methods / Procedures

• 12 undergraduate students (paid volunteers)
• Pre-study questionnaire
  – Demographics
  – Visual vs. Verbal learning style (VVQ)
• 10 search problems
  – Counter-balanced
• Design 1 and 2
  – 1: text on left / visuals on right
  – 2: visuals on left / text on right
• Eyetracking
• Post-study questionnaire
  – Follow up questions


Results

• All participants over all tasks:

  – Mean time looking at text = 29.7 sec.
  – Mean time looking at pics = 6.8 sec.
  – 75% of fixations over text
  – 18% of fixations over pics
  – First fixations over text = 65
  – First fixations over pics = 54

• Text requires and gets more user attention
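Figures like these are typically derived by labeling each eye-tracker fixation with the screen region (area of interest) it falls in and then aggregating. A minimal sketch of that aggregation, assuming a simple list of (region, duration) pairs; the sample data and the "other" region are invented for illustration and are not the study's data.

```python
from collections import defaultdict


def summarize_fixations(fixations):
    """Summarize (area_of_interest, duration_seconds) pairs per area.

    Returns, for each area, its share of all fixations, total looking
    time, and mean time per fixation.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for area, duration in fixations:
        counts[area] += 1
        totals[area] += duration
    n = sum(counts.values())
    return {
        area: {
            "share": counts[area] / n,
            "total_seconds": totals[area],
            "mean_fixation_seconds": totals[area] / counts[area],
        }
        for area in counts
    }


# Invented sample: five fixations on one results page.
sample = [("text", 0.30), ("text", 0.25), ("pics", 0.20), ("text", 0.35), ("other", 0.10)]
for area, stats in summarize_fixations(sample).items():
    print(area, stats)
```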


Results cont’d

• Design 1 vs. Design 2
  – When text was placed on the left, mean time per fixation was slightly higher
• VVQ
  – Balanced group spent more time looking at text
• Tasks
  – Varied by task:
    • Time spent looking at text
    • Time spent per fixation over text
    • Frequency of fixations over text


Screen Shots




Tasks

• Please find a video that discusses the destruction earthquakes can do to buildings. These search results are from a search on the word “Earthquake”.

• Please find a video that discusses nurses and their contributions to the United States Army. These search results are from a search on the word “Work”.

• Please choose a video from the following list that you think would be entertaining for you and your friends to watch.


Discussion

• In this restricted situation (i.e., pre-formulated results page) participants used text as the main anchor point
  – ? Because text is a better surrogate?
  – ? Because text contains more information?
  – ? Because text is more familiar to people?
  – ? Because tasks directed users to text?


Discussion cont’d

• Layout seemed to have little effect on how surrogates were used.
  – Difference of .03 of a second
  – Participants didn’t report a significant preference for layout
    • Some liked design 1 and some liked design 2
• VVQ
  – Hypothesis that visual learners would use visual surrogates and verbal learners would use verbal surrogates was not supported


Discussion cont’d

• Tasks
  – Some tasks took more time to complete
    • Regardless of:
      – Counterbalancing order
      – Participant
      – Layout design


Text or Pictures?

• Text was reported as:
  + Being the search anchor
  + Containing significant topical information
  – Taking longer to read than pictures
• Visuals were reported as:
  + Being globally liked
  + Being used to quickly narrow down choices
  + Taking less time to decode than text
  + All participants said the results page would be weaker without them
  – Often lacking in reference points


Conclusion

• Visual metadata was used to make (confirm???) relevance judgments

• Combination of visual & verbal stronger than one or the other

• Generalize with caution:
  – Small number of study participants
  – Specific set of search results pages
  – Ten specific search tasks


The Integration Study

• Compare old OV to redesign? Compare to Internet archive?

• How do multiple surrogates and agile control mechanisms affect understanding of video?

• Accuracy? Time? Satisfaction? Cognitive load? Navigational overhead?

• Data analysis underway


Relevance Study (Yang)

• 3 task groups (illustration [10 profs], collection building [8 video librarians], video production [8 producers/editors])

• In-depth interviews
• Text, audiovisual, implicit categories of 39 different criteria
  – Topicality most often mentioned, but far less than in text studies
  – Production groups less varied, more audiovisual criteria


Theory-Practice Lessons from OV

• User-centered design and user testing pay off, i.e., research informs practice
• Production system operation raises new kinds of research questions
  – Sustainability models
  – Curatorial models
  – Preservation challenges
  – Upgrade paths for universal access


DL Research Directions

• Incorporating people into DLs (patrons, librarians)

• Leveraging contributions and implications for curatorship

• Preservation strategies; how much context?

• Hybrid physical-digital library operations


Observations

• A moebius strip is infinite: the interplay between theory and practice goes on

• Need for collaboration between working libraries and researchers

Selected Open Video Readings

• Yang, M. & Marchionini, G. (2005). “Deciphering visual gist and its implications for video retrieval and interface design.” Conference on Human Factors in Computing Systems (CHI). Portland, OR. Apr. 2-7, 2005.
• Yang, M. & Marchionini, G. (2004). “Exploring Users' Video Relevance Criteria -- A Pilot Study.” Proceedings of the Annual Meeting of the American Society of Information Science and Technology, pp. 229-238. Nov. 12-17, 2004. Providence, RI.
• Yang, M., Wildemuth, B., & Marchionini, G. (2004). “The relative effectiveness of concept-based versus content-based video retrieval.” Proceedings of the ACM Multimedia conference, pp. 368-371.
• Mu, X., & Marchionini, G. (2003). “Enriched video semantic metadata: authorization, integration, and presentation.” Proceedings of the Annual Meeting of the American Society for Information Science and Technology, 40, 316-322.
• Wilkens, T., Hughes, A., Wildemuth, B. M., & Marchionini, G. (2003). “The role of narrative in understanding digital video: an exploratory analysis.” Proceedings of the Annual Meeting of the American Society for Information Science, 40, 323-329.
• Hughes, A., Wilkens, T., Wildemuth, B., & Marchionini, G. (2003). “Text or Pictures? An Eyetracking Study of How People View Digital Video Surrogates.” Proceedings of CIVR 2003, pp. 271-280.
• Wildemuth, B. M., Marchionini, G., Yang, M., Geisler, G., Wilkens, T., Hughes, A., & Gruss, R. (2003). “How Fast Is Too Fast? Evaluating Fast Forward Surrogates for Digital Video.” Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2003), pp. 221-230. (Vannevar Bush Award Winner for Best Paper at JCDL 2003)
• Mu, X., Marchionini, G., & Pattee, A. (2003). “The Interactive Shared Educational Environment: User interface, system architecture and field study.” Proceedings of the Annual Meeting of the American Society for Information Science and Technology, 40, 291-300.
• Mu, X., & Marchionini, G. (2003). “Statistical Visual Features Indexes in Video Retrieval.” Proceedings of SIGIR 2003, pp. 395-396.
• Marchionini, G. (2003). “Video and Learning Redux: New Capabilities for Practical Use.” Educational Technology.
• Marchionini, G., & Geisler, G. (2002). “The Open Video Digital Library.” D-Lib Magazine, Vol. 8, Number 12, December.
• Wildemuth, B. M., Marchionini, G., Wilkens, T., Yang, M., Geisler, G., Fowler, B., Hughes, A., & Mu, X. (2002). “Alternative Surrogates for Video Objects in a Digital Library: Users' Perspectives on Their Relative Usability.” Proceedings of the 6th European Conference on Digital Libraries, September 16-18, 2002, Rome, Italy.
• Geisler, G., Marchionini, G., Wildemuth, B. M., Hughes, A., Yang, M., Wilkens, T., & Spinks, R. (2002). “Video Browsing Interfaces for the Open Video Project.” Proceedings of CHI 2002, Extended Abstracts.
• Nelson, M. L., Marchionini, G., Geisler, G., & Yang, M. (2001). “A Bucket Architecture for the Open Video Project [short paper].” JCDL ’01, ACM - IEEE Joint Conference on Digital Libraries (June 24-28, 2001, Roanoke, Virginia).
• Geisler, G., & Marchionini, G. (2000). “The Open Video Project: A Research-Oriented Digital Video Repository [short paper].” In Digital Libraries '00: The Fifth ACM Conference on Digital Libraries (June 2-7, 2000, San Antonio, TX). New York: Association for Computing Machinery, 258-259.
• Slaughter, L., Marchionini, G., & Geisler, G. (2000). “Open Video: A Framework for a Test Collection.” Journal of Network and Computer Applications, Vol. 23(3). San Diego: Academic Press.

http://www.open-video.org/papers/MengYang_050205_CHI.pdf

http://www.open-video.org/papers/MengYang_ASIST040517.pdf

http://www.open-video.org/papers/MengYang_ACMMM_040718.pdf

http://www.ils.unc.edu/%7Emux/publications/asist03.pdf

http://www.open-video.org/papers/Wilkens_Asist_2003.pdf
