DEREK HANSEN, JAKE GEHRING, PATRICK SCHONE, AND MATTHEW REID FAMILY HISTORY TECHNOLOGY WORKSHOP FEBRUARY 3, 2012 IMPROVING INDEXING EFFICIENCY & QUALITY: COMPARING A-B-ARBITRATE AND PEER REVIEW


DEREK HANSEN, JAKE GEHRING, PATRICK SCHONE, AND MATTHEW REID

FAMILY HISTORY TECHNOLOGY WORKSHOP
FEBRUARY 3, 2012

IMPROVING INDEXING EFFICIENCY & QUALITY:
COMPARING A-B-ARBITRATE AND PEER REVIEW


FAMILYSEARCH


FAMILYSEARCH INDEXING


A-B-ARBITRATE PROCESS (A-B-ARB)

A

B

ARB
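The A-B-ARB flow can be sketched in a few lines of Python. This is a minimal illustration, not FamilySearch's implementation: it assumes two indexers (A and B) key the same field independently and that an arbitrator is consulted only when their values differ. The names `normalize`, `a_b_arbitrate`, and the `arbitrate` callback are hypothetical, and the case- and whitespace-insensitive comparison is an assumption.

```python
from typing import Callable, Tuple


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case before comparing (an assumption)."""
    return " ".join(value.split()).casefold()


def a_b_arbitrate(
    a_value: str,
    b_value: str,
    arbitrate: Callable[[str, str], str],
) -> Tuple[str, bool]:
    """Resolve one field; return (final_value, was_arbitrated)."""
    if normalize(a_value) == normalize(b_value):
        # A and B agree, so no arbitration is needed.
        return a_value, False
    # A and B disagree, so the arbitrator decides.
    return arbitrate(a_value, b_value), True


if __name__ == "__main__":
    # Stand-in arbitrator that always sides with B, for illustration only.
    def prefer_b(a: str, b: str) -> str:
        return b

    print(a_b_arbitrate("Smith", " smith", prefer_b))
    print(a_b_arbitrate("Smith", "Smyth", prefer_b))
```

In this sketch every field costs two full keying passes, plus a third look whenever A and B disagree.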


THE PROBLEM


OUR APPROACH
• Historical Data Analysis
• Field Experiment comparing quality control models


HISTORICAL DATA ANALYSIS
• Quality (estimated based on A-B agreement)
    • Measures difficulty more than actual quality
    • Underestimates quality, since an experienced Arbitrator reviews all A-B disagreements
    • Good at capturing differences across people, fields, and projects
• Time (calculated using keystroke-logging data)
    • Idle time is tracked separately, making actual time measurements more accurate
    • Outliers removed
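Both measures can be sketched in Python. This is an illustrative sketch under stated assumptions, not the analysis code used in the study. The record layout `(field_name, a_value, b_value)`, the function names, and the 30-second idle threshold are all hypothetical. The source system tracks idle time directly, whereas this sketch infers it from gaps between keystroke timestamps.

```python
from collections import defaultdict


def agreement_by_field(records):
    """Percent A-B agreement per field.

    records: iterable of (field_name, a_value, b_value) tuples.
    Values match if they are equal after trimming and case-folding.
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for field, a_value, b_value in records:
        totals[field] += 1
        if a_value.strip().casefold() == b_value.strip().casefold():
            matches[field] += 1
    return {field: 100.0 * matches[field] / totals[field] for field in totals}


def active_and_idle_seconds(timestamps, idle_threshold=30.0):
    """Split the time between consecutive keystrokes into (active, idle).

    timestamps: sorted keystroke times in seconds.
    Gaps longer than idle_threshold count as idle time.
    """
    active = 0.0
    idle = 0.0
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap > idle_threshold:
            idle += gap
        else:
            active += gap
    return active, idle
```

Because agreement only records whether A and B keyed the same value, a low score flags a hard field or record, not necessarily a wrong final value.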


A-B AGREEMENT BY FIELD


A-B AGREEMENT BY LANGUAGE

1871 Canadian Census

English Language
• Given Name: 79.8%
• Surname: 66.4%

French Language
• Given Name: 62.7%
• Surname: 48.8%


A-B AGREEMENT BY EXPERIENCE

Birth Place: All U.S. Censuses
[Figure: A-B agreement by indexer experience. Horizontal axis: A (novice ↔ expert). Vertical axis: B (novice ↔ expert).]


A-B AGREEMENT BY EXPERIENCE

Given Name: All U.S. Censuses
[Figure: A-B agreement by indexer experience. Horizontal axis: A (novice ↔ expert). Vertical axis: B (novice ↔ expert).]


A-B AGREEMENT BY EXPERIENCE

Surname: All U.S. Censuses
[Figure: A-B agreement by indexer experience. Horizontal axis: A (novice ↔ expert). Vertical axis: B (novice ↔ expert).]


A-B AGREEMENT BY EXPERIENCE

Gender: All U.S. Censuses
[Figure: A-B agreement by indexer experience. Horizontal axis: A (novice ↔ expert). Vertical axis: B (novice ↔ expert).]


A-B AGREEMENT BY EXPERIENCE
[Figure panels: U.S. - English; Canada - English; Canada - French; Mexico - Spanish]


TIME & KEYSTROKE BY EXPERIENCE


TIME & KEYSTROKE OF ARB


A NEW APPROACH? (A-R-ARB)

• Peer review model
• Efficiency ++
• Quality ?


PEER REVIEW PROCESS (A-R-ARB)

A → R → ARB

[Diagram labels: "Already Filled In"; "Optional?"]
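One possible reading of the A-R-ARB flow, sketched in Python. The slide gives only the diagram labels, so the details here are assumptions: the reviewer (R) sees A's value already filled in and either accepts or changes it, and arbitration is an optional step that runs only when R changes the value. The names `peer_review`, `review`, and `arbitrate` are hypothetical.

```python
from typing import Callable, Optional, Tuple


def peer_review(
    a_value: str,
    review: Callable[[str], str],
    arbitrate: Optional[Callable[[str, str], str]] = None,
) -> Tuple[str, str]:
    """Resolve one field; return (final_value, outcome).

    review(a_value) returns R's value, equal to a_value if R accepts it.
    arbitrate(a_value, r_value), if provided, settles A-R differences.
    """
    r_value = review(a_value)
    if r_value == a_value:
        # R accepted A's entry without retyping it.
        return a_value, "accepted"
    if arbitrate is None:
        # No arbitration step configured, so R's correction stands.
        return r_value, "corrected"
    return arbitrate(a_value, r_value), "arbitrated"
```

Under these assumptions the efficiency gain comes from R confirming most fields rather than keying them again. Whether R overlooks errors that an independent B would have caught is the open quality question.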


FIELD EXPERIMENT

• Develop Truth Set of 2,000 1930 Census images
• Use historical A-B-ARB data
• Create new A-R-ARB dataset by having new indexers review and arbitrate
• Compare quality & efficiency
• Qualitatively identify types of errors
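The quality comparison against the truth set could be scored as below. This is a hedged sketch, not the study's scoring method: it assumes each dataset is a dictionary mapping a field key, such as `(image_id, row, field_name)`, to its final value, and that quality is the percent of truth-set fields matched exactly. The function names and key layout are hypothetical.

```python
def accuracy(final_values, truth):
    """Percent of truth-set fields whose final value equals the truth value.

    Fields missing from final_values count as errors.
    """
    if not truth:
        raise ValueError("truth set is empty")
    correct = sum(
        1
        for key, true_value in truth.items()
        if final_values.get(key) == true_value
    )
    return 100.0 * correct / len(truth)


def compare_models(truth, a_b_arb, a_r_arb):
    """Score both quality control models against the same truth set."""
    return {
        "A-B-ARB": accuracy(a_b_arb, truth),
        "A-R-ARB": accuracy(a_r_arb, truth),
    }
```

Scoring both models on the same images keeps record difficulty constant, so a difference in accuracy can be attributed to the process rather than to the material.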


DISCUSSION

IMPLICATIONS
• Transition users from novice to expert
• Recruit foreign language indexers
• Intelligent matching based on expertise (in A-B-ARB &/or A-R-ARB)

FUTURE POSSIBILITIES
• Peer review by algorithms?
• Initial indexing by algorithms?