

Keystroke Biometric Studies

Keystroke Biometric Identification and Authentication on Long-Text Input

  • Book chapter in Behavioral Biometrics for Human Identification (2009), edited by Liang Wang and Xin Geng
  • Authors: Charles C. Tappert, Mary Villani, and Sung-Hyuk Cha
  • Summarizes keystroke biometric work from 2005 to 2008
      • 3 DPS dissertations: 2 on identification, 1 on missing/incomplete data
      • About 6 master's-level projects
  • New material: authentication, longitudinal study, touch-type model

Major Chapter Sections

  • Introduction
  • Keystroke Biometric System
  • Experimental Design and Data Collection
  • Experimental Results
  • Conclusions and Future Work

Introduction: Build a Case for Usefulness of Study

  • Validate importance of study: applications
  • Define keystroke biometric
  • Appeal of keystroke over other biometrics
  • Previous work on the keystroke biometric
  • No direct study comparisons on same data
  • Feature measurements
  • Make case for using data over the internet, long text input, and free (arbitrary) text input
  • Extends previous work by authors
  • Summary of scope and methodology
  • Summary of paper organization

Introduction: Validate Importance of Study (Applications)

  • Internet authentication application
      • Authenticate (verify) student test-takers
  • Internet identification application
      • Identify perpetrators of inappropriate email
  • Internet security for other applications
      • Important as more businesses move toward e-commerce

Introduction: Define Keystroke Biometric

  • The keystroke biometric is one of the less-studied behavioral biometrics
  • Based on the idea that typing patterns are unique to individuals and difficult to duplicate

Introduction: Appeal of Keystroke Biometric

  • Not intrusive: data captured as users type
      • Users type frequently for business/pleasure
  • Inexpensive: keyboards are common
      • No special equipment necessary
  • Can continue to check ID with keystrokes after initial authentication
      • As users continue to type

Introduction: Previous Work on Keystroke Biometric

  • One early study goes back to typewriter input
  • Identification versus authentication
      • Most studies were on authentication (including two commercial products for hardening passwords)
      • Few on identification (more difficult problem)
  • Short versus long text input
      • Most studies used short input: passwords, names
      • Few used long text input: copy or free text
  • Other keystroke problems studied
      • One study detected fatigue, stress, etc.
      • Another detected ID change via monitoring

Introduction: No Direct Study Comparisons on Same Data

  • No comparisons on a standard data set (desirable, and available for many biometric and pattern recognition problems)
  • Rather, researchers collect their own data
  • Nevertheless, the literature is optimistic about the keystroke biometric's potential for security

Introduction: Feature Measurements

  • Features derived from raw data
      • Key press times and key release times
      • Each keystroke provides a small amount of data
  • Data varies across keyboards, conditions, and entered texts
  • Using long text input allows
      • Use of good (statistical) feature measurements
      • Generalization over keyboards, conditions, etc.

Introduction: Make Case for Using

  • Data over the internet
      • Required by applications
  • Long text input
      • More and better features
      • Higher accuracy
  • Free text input
      • Required by applications
      • Predefined copy texts unacceptable

Introduction: Extends Previous Work by Authors

  • Previous keystroke identification study
      • Ideal conditions: fixed text and same keyboard for enrollment and testing
      • Less ideal conditions: free text input, different keyboards for enrollment and testing

Introduction: Summary of Scope and Methodology

  • Determine distinctiveness of keystroke patterns
  • Two application types
      • Identification (1-of-n problem)
      • Authentication (yes/no problem)
  • Two independent variables (4 data quadrants)
      • Keyboard type: desktop versus laptop
      • Entry mode: copy versus free text

Keystroke Biometric System

  • Raw keystroke data capture
  • Feature extraction
  • Classification for identification
  • Classification for authentication

Keystroke Biometric System: Raw Keystroke Data Capture

Keystroke Biometric System: Feature Extraction

  • Mostly statistical features
      • Averages and standard deviations of key press times and of transition times between keystroke pairs
      • Individual keys and groups of keys: hierarchy
  • Percentage features
      • Percentage use of non-letter keys
      • Percentage use of mouse clicks
  • Input rates: average time/keystroke
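The statistical features listed above can be illustrated with a minimal Python sketch. The event layout `(key, press_ms, release_ms)` and the function names are assumptions made for illustration, and the two transition measures are assumed here to be release-to-press and press-to-press times; the slides do not spell these out.

```python
from statistics import mean, pstdev

# Assumed record layout: (key, press_time_ms, release_time_ms), in typing order.
def durations(events):
    """Map each key to its list of hold times (release minus press)."""
    out = {}
    for key, press, release in events:
        out.setdefault(key, []).append(release - press)
    return out

def transitions(events):
    """Map each two-key sequence to (release-to-press, press-to-press) lists."""
    out = {}
    for (k1, p1, r1), (k2, p2, _r2) in zip(events, events[1:]):
        pair = out.setdefault(k1 + k2, ([], []))
        pair[0].append(p2 - r1)  # can be negative when the two keys overlap
        pair[1].append(p2 - p1)
    return out

def summarize(samples):
    """Average and (population) standard deviation of a list of times."""
    return mean(samples), pstdev(samples)

events = [("t", 0, 90), ("h", 150, 230), ("e", 300, 370),
          ("t", 500, 600), ("h", 640, 700)]
dur = durations(events)
tra = transitions(events)
print(summarize(dur["t"]))      # statistics of the 't' hold times
print(summarize(tra["th"][0]))  # statistics of release-to-press times for 'th'
```

Long text input matters here because each key or key pair needs several occurrences before its mean and standard deviation become stable.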

Keystroke Biometric System: Feature Extraction

Figure: A two-key sequence (th) showing the two transition measures

Keystroke Biometric System: Feature Extraction

Figure: Hierarchy tree for the 39 duration categories

Keystroke Biometric System: Feature Extraction

Figure: Hierarchy tree for the 35 transition categories

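Keystroke Biometric System: Feature Extraction

  • Fallback procedure for few/missing samples
      • When the number of samples is less than a fallback threshold, take the weighted average of the key’s mean and the fallback mean

$$\mu'(i) = \frac{n(i)\,\mu(i) + \text{weight}_{\text{fallback}} \cdot k_{\text{fallback}} \cdot \mu(\text{fallback})}{n(i) + \text{weight}_{\text{fallback}} \cdot k_{\text{fallback}}}$$

where n(i) is the number of samples of key i, μ(i) is their mean, μ(fallback) is the fallback mean, k_fallback is the fallback threshold, and weight_fallback is the fallback weight.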
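A minimal Python sketch of the fallback idea follows. It assumes the fallback mean is counted as `weight * k_threshold` pseudo-samples in the weighted average; the function name, parameter names, and default weight are illustrative assumptions, not values taken from the chapter.

```python
def fallback_mean(key_mean, n, parent_mean, k_threshold, weight=1.0):
    """Blend a key's mean with its fallback mean when the key has few samples.

    key_mean    -- mean of the n samples observed for this key
    parent_mean -- mean of the fallback (parent) category in the hierarchy
    k_threshold -- fallback threshold on the sample count (assumed value)
    weight      -- fallback weight (assumed default)
    """
    if n >= k_threshold:
        return key_mean  # enough samples: use the key's own mean
    w = weight * k_threshold
    return (n * key_mean + w * parent_mean) / (n + w)

print(fallback_mean(120.0, 2, 100.0, 5))  # pulled toward the fallback mean
print(fallback_mean(120.0, 0, 100.0, 5))  # no samples: the fallback mean itself
```

With no samples the result is exactly the fallback mean, and as the sample count approaches the threshold the key's own mean dominates.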

Keystroke Biometric System: Feature Extraction

  • Two preprocessing steps
      • Outlier removal: remove duration and transition times > threshold
      • Feature standardization: convert features into the range 0-1

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
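The two preprocessing steps can be sketched in Python as below. The outlier threshold is expressed as a distance in standard deviations applied over a number of passes, which matches the parameters studied later in the deck; the default values of `sigma` and `passes` are assumptions for illustration only.

```python
from statistics import mean, pstdev

def remove_outliers(times, sigma=2.0, passes=1):
    """Drop values farther than `sigma` standard deviations from the mean."""
    kept = list(times)
    for _ in range(passes):
        if len(kept) < 2:
            break
        m, s = mean(kept), pstdev(kept)
        if s == 0:
            break  # all values identical: nothing to remove
        kept = [t for t in kept if abs(t - m) <= sigma * s]
    return kept

def min_max(values):
    """Scale values into the range 0-1; a constant feature maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(remove_outliers([100, 110, 90, 105, 95, 1000]))  # the 1000 ms pause is dropped
print(min_max([90, 100, 110]))
```

In practice the minimum and maximum would be taken per feature across all samples, so that every feature contributes on the same 0-1 scale to a distance computation.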

Keystroke Biometric System: Classification for Identification

  • Nearest neighbor using Euclidean distance
  • Compare a test sample against the training samples; the author of the nearest training sample is identified as the author of the test sample
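As a minimal sketch of nearest-neighbor identification, the Python below compares one test feature vector against labeled training vectors. The subject labels and the three-feature vectors are made up for illustration.

```python
import math

def identify(test_vec, training):
    """Return the label of the training vector nearest to test_vec (Euclidean)."""
    label, _vec = min(training, key=lambda item: math.dist(test_vec, item[1]))
    return label

# Hypothetical standardized feature vectors (values in the range 0-1).
training = [
    ("alice", [0.20, 0.80, 0.50]),
    ("alice", [0.25, 0.75, 0.55]),
    ("bob",   [0.70, 0.30, 0.10]),
]
print(identify([0.22, 0.79, 0.50], training))
```

Because the decision always returns one of the enrolled authors, this is a 1-of-n identification procedure rather than a yes/no authentication test.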

Keystroke Biometric System: Classification for Authentication

  • Cha’s vector-distance (dichotomy) model
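The slide names the model without detail. As it is generally described, a dichotomy model turns the many-person problem into a two-class one: pairs of feature vectors become difference vectors labeled "same person" or "different people". The Python sketch below follows that description; the use of a nearest-neighbor decision in the difference space, the function names, and the sample data are assumptions, not the authors' implementation.

```python
import math
from itertools import combinations

def difference_vectors(samples):
    """Turn labeled feature vectors into (abs-difference vector, same_person) pairs."""
    out = []
    for (la, va), (lb, vb) in combinations(samples, 2):
        diff = [abs(a - b) for a, b in zip(va, vb)]
        out.append((diff, la == lb))
    return out

def authenticate(enrolled_vec, test_vec, reference_pairs):
    """Accept if the nearest reference difference vector is a same-person one."""
    query = [abs(a - b) for a, b in zip(enrolled_vec, test_vec)]
    _diff, same = min(reference_pairs, key=lambda p: math.dist(query, p[0]))
    return same

# Hypothetical reference subjects used only to train the two-class model.
reference = [("u1", [0.10, 0.20]), ("u1", [0.12, 0.22]),
             ("u2", [0.80, 0.90]), ("u2", [0.82, 0.88])]
pairs = difference_vectors(reference)
print(authenticate([0.50, 0.50], [0.51, 0.52], pairs))  # small difference
print(authenticate([0.50, 0.50], [0.90, 0.10], pairs))  # large difference
```

One consequence of working in the difference space is that the model can be trained on subjects other than the ones later authenticated.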

Experimental Design and Data Collection: Design

  • Two independent variables
      • Keyboard type
          • Desktop: all Dell
          • Laptop: 90% Dell, plus IBM, Compaq, Apple, HP, Toshiba
      • Input mode
          • Copy task: predefined text
          • Free text input: e.g., arbitrary email


Experimental Design and Data Collection: Data Collection

  • Subjects provided samples in at least two quadrants
  • Five samples per quadrant per subject
  • Summary of subject demographics

Age        Female   Male   Total
Under 20       15     19      34
20-29          12     23      35
30-39           5     10      15
40-49           7     11      18
50+            11      5      16
All            50     68     118

Experimental Results

  • Identification experimental results
  • Authentication experimental results
  • Longitudinal study results
  • System hierarchical model and parameters
      • Hierarchical fallback model
      • Outlier parameters
      • Number of enrollment samples
      • Input text length
      • Probability distributions of statistical features

Experimental Results: Identification Experimental Results

Figure: Identification performance under ideal conditions (same keyboard type and input mode, leave-one-out procedure). Percent accuracy (90% to 100%) versus number of subjects (0 to 100), with one curve each for Desk-Copy, Lap-Copy, Desk-Free, and Lap-Free.

Experimental Results: Identification Experimental Results

Figure: Identification performance under non-ideal conditions (train on one file, test on another). Percent accuracy (0% to 100%) versus number of subjects (0 to 100), with one curve each for Group 1 through Group 6.


Experimental Results: Authentication Experimental Results

Figure: Authentication performance under ideal conditions (weak enrollment: train on 18 subjects and test on 18 different subjects). Bar chart of percent accuracy (0% to 100%) for the DeskCopy, LapCopy, DeskFree, and LapFree conditions, with bars for Performance, FRR, and FAR.
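For readers less familiar with the three quantities in the chart, the Python sketch below computes them from a list of authentication trials using the standard definitions: FRR is the fraction of genuine attempts rejected, FAR is the fraction of impostor attempts accepted, and performance is taken here to mean overall accuracy, which is an assumption about the chart. The trial data are made up.

```python
def far_frr(decisions):
    """decisions: (is_genuine, accepted) booleans, one pair per trial."""
    decisions = list(decisions)
    genuine = [acc for gen, acc in decisions if gen]
    impostor = [acc for gen, acc in decisions if not gen]
    frr = genuine.count(False) / len(genuine)    # genuine users rejected
    far = impostor.count(True) / len(impostor)   # impostors accepted
    correct = genuine.count(True) + impostor.count(False)
    return far, frr, correct / len(decisions)

# Hypothetical trials: 4 genuine attempts, 6 impostor attempts.
trials = [(True, True), (True, True), (True, False), (True, True),
          (False, False), (False, False), (False, True),
          (False, False), (False, False), (False, False)]
far, frr, accuracy = far_frr(trials)
print(far, frr, accuracy)
```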

Experimental Results: Longitudinal Study Results

  • Identification: 13 subjects at 2-week intervals
      • Average of 6 arrow groups: 90% -> 85% -> 83%
  • Authentication: 13 subjects at 2-week intervals
      • Average of 6 arrow groups: 90% -> 87% -> 85%
  • Identification: 8 subjects at 2-year interval
      • Average of 6 arrow groups: 84% -> 67%
  • Authentication: 8 subjects at 2-year interval
      • Average of 6 arrow groups: 94% -> 92%
  • All results above were obtained under non-ideal conditions

Experimental Results: System Hierarchical Model and Parameters

Figure: Touch-type hierarchy tree for durations

Experimental Results: System Hierarchical Model and Parameters

Figure: Identification accuracy versus outlier removal passes

Experimental Results: System Hierarchical Model and Parameters

Figure: Identification accuracy versus outlier removal distance (sigma)

Experimental Results: System Hierarchical Model and Parameters

Figure: Identification accuracy versus enrollment samples. Percent accuracy (70 to 100) versus number of enrollment samples (1 to 4).

Experimental Results: System Hierarchical Model and Parameters

Figure: Identification accuracy versus input text length

Experimental Results: System Hierarchical Model and Parameters

Figure: Distributions of “u” duration times for each entry mode

Conclusions

  • Results are important and timely as more people become involved in the applications of interest
      • Authenticating online test-takers
      • Identifying senders of inappropriate email
  • High performance (accuracy) results if
      • 2 or more enrollment samples/user
      • Users use same keyboard type

Future Work

  • Focus on user authentication
      • Focus on Cha’s dichotomy model
  • Develop strong/weak enrollment concepts
      • Strong: system trained on actual users
      • Weak: system trained on other (non-test) users
  • Develop strategies to obtain ROC curves
  • Run actual test-taker experiments
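One common way to obtain an ROC curve, sketched below in Python, is to sweep an acceptance threshold over distance scores and record the false accept rate and true accept rate at each setting. This is a generic illustration, not a strategy proposed in the chapter; the scores and the function name are hypothetical, and it assumes the authentication system can output a distance score rather than only a yes/no decision.

```python
def roc_points(genuine_scores, impostor_scores):
    """Return (FAR, 1 - FRR) pairs, one per candidate distance threshold.

    A trial is accepted when its distance score is <= the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    points = [(0.0, 0.0)]  # threshold below every score: nothing accepted
    for t in thresholds:
        tar = sum(s <= t for s in genuine_scores) / len(genuine_scores)
        far = sum(s <= t for s in impostor_scores) / len(impostor_scores)
        points.append((far, tar))
    return points

# Hypothetical distance scores (smaller means more similar).
genuine = [0.1, 0.2, 0.4]
impostor = [0.3, 0.6, 0.9]
pts = roc_points(genuine, impostor)
for far, tar in pts:
    print(far, tar)
```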