• Security problems of your keyboard
– Authentication based on keystrokes
– Compromising emanations: electrical, mechanical, or acoustic
– Supply-chain attacks (Bluetooth, SD card)
– Power usage?
• Keystroke biometrics with number-pad input (DSN 2010)
– 28 users typed the same 10-digit number
– Use statistical machine learning techniques
– Detection rate: 99.97%
– False alarm rate: 1.51%
– Can be used for real-life two-factor authentication
Keyboard Acoustic Emanations Revisited
Li Zhuang, Feng Zhou and J. D. Tygar
U. C. Berkeley
Motivation
• Emanations of electronic devices leak information
• How much information is leaked by emanations?
• Apply statistical learning methods to security
– What can be learned from recordings of typing on a keyboard?
Keyboard Acoustic Emanations
• Leaking information by acoustic emanations
[Figure: Alice typing her password; the sound of the keystrokes leaks to an eavesdropper.]
Acoustic Information in Typing
• Frequency information in sound of each typed key
• Why do keystrokes make different sounds?
– Different locations on the supporting plate
– Each key is slightly different
• [Asonov and Agrawal 2004]
Timing Information in Typing
• Time between two keystrokes
• Duration of a keystroke
• E.g. [Song, Wagner and Tian, 2001]
Previous Work vs. Our Approach

                          Asonov and Agrawal          Ours
Requirement               Text-labeling               Direct recovery
Analogy in Crypto         Known-plaintext attack      Known-ciphertext attack
Feature Extraction        FFT                         Cepstrum
Initial training          Supervised learning with    Clustering (K-means,
                          neural networks             Gaussian), EM algorithm
Language Model            /                           HMMs at different levels
Feedback-based Training   /                           Self-improving feedback
Key Observation
• Build acoustic model for keyboard & typist
• Non-random typed text (English)
– Limited number of words
– Limited letter sequences (spelling)
– Limited word sequences (grammar)
• Build language model
– Statistical learning theory
– Natural language processing
Overview

[Pipeline diagram. Initial training: wave signal → Feature Extraction → Unsupervised Learning → Language Model Correction → Sample Collector → Classifier Builder → keystroke classifier. Subsequent recognition: wave signal → Feature Extraction → Keystroke Classifier → Language Model Correction (optional) → recovered keystrokes.]
Feature Extraction

[Pipeline diagram repeated, with the Feature Extraction stage highlighted.]
Sound of a Keystroke
• How to represent each keystroke?
– Vector of features: FFT, Cepstrum
– Cepstrum features are used in speech recognition (see the sketch below)
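As a concrete illustration, here is a minimal sketch of real-cepstrum features for one keystroke. The paper's actual features are speech-recognition-style cepstral coefficients; the windowing, segment handling, and coefficient count here are assumptions.

```python
import numpy as np

def cepstrum_features(segment, n_coeffs=32):
    """Real cepstrum of one keystroke: IFFT of the log magnitude spectrum.

    segment: 1-D numpy array of audio samples around a detected key press.
    """
    windowed = segment * np.hanning(len(segment))   # taper the segment edges
    spectrum = np.abs(np.fft.rfft(windowed))
    log_spectrum = np.log(spectrum + 1e-10)         # avoid log(0)
    cepstrum = np.fft.irfft(log_spectrum)
    return cepstrum[:n_coeffs]                      # keep low-quefrency coefficients
```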
Cepstrum vs. FFT
• Repeat experiments from [Asonov and Agrawal 2004]

[Bar chart: classification accuracy (0 to 1) on training, Test 1, and Test 2 data for Linear Classification, Neural Networks, and Gaussian Mixtures, comparing cepstrum against FFT features.]
Unsupervised Learning

[Pipeline diagram repeated, with the Unsupervised Learning stage highlighted.]
Unsupervised Learning
• Group keystrokes into N clusters (see the sketch after this list)
– Assign each keystroke a cluster label, 1, …, N
• Find best mapping from cluster labels to characters
• Some character combinations are more common
– “th” vs. “tj”
– Hidden Markov Models (HMMs)
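A minimal sketch of the clustering step, using scikit-learn's K-means as one of the clustering options the slides name; the library choice and the cluster count are assumptions.

```python
from sklearn.cluster import KMeans

def cluster_keystrokes(features, n_clusters=30):
    """Group keystroke feature vectors; returns one cluster label per keystroke.

    features: (n_keystrokes, n_coeffs) array from the feature-extraction stage.
    n_clusters is set around or above the number of distinct keys (an
    illustrative value here) so distinct strikes of one key can separate.
    """
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return kmeans.fit_predict(features)
```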
Bi-grams of Characters
• Colored circles: cluster labels
• Empty circles: typed characters
• Arrows: dependency

[HMM diagram: hidden characters "t", "h", "e" emitting observed cluster labels 5, 11, 2; parameters estimated with the EM algorithm.]
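Once EM has estimated the HMM's parameters, the most likely character sequence behind a run of cluster labels follows from standard Viterbi decoding. A minimal sketch, assuming smoothed (nonzero) start, character-bigram transition, and emission probability tables, all placeholders here:

```python
import numpy as np

def viterbi(obs, chars, start, trans, emit):
    """Most likely character sequence for a list of observed cluster labels.

    start[c]: P(first char is c); trans[p][c]: bigram P(c | p) from an
    English corpus; emit[c][k]: P(cluster label k | char c), estimated by EM.
    """
    V = [{c: np.log(start[c]) + np.log(emit[c][obs[0]]) for c in chars}]
    back = []
    for k in obs[1:]:
        scores, ptr = {}, {}
        for c in chars:
            prev = max(chars, key=lambda p: V[-1][p] + np.log(trans[p][c]))
            scores[c] = V[-1][prev] + np.log(trans[prev][c]) + np.log(emit[c][k])
            ptr[c] = prev
        V.append(scores)
        back.append(ptr)
    # trace back from the best final character
    best = max(chars, key=lambda c: V[-1][c])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```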
Language Model Correction

[Pipeline diagram repeated, with the Language Model Correction stage highlighted.]
Word Tri-grams
• Spelling correction
• Simple statistical model of English grammar
• Use HMMs again, now over word sequences (see the sketch below)
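To give the flavor of trigram-based correction (a simplification of the full HMM over words), a candidate word can be chosen by its trigram score in context; `trigram_logp` and the candidate-generation step are assumptions.

```python
def correct_word(prev2, prev1, candidates, trigram_logp):
    """Pick the candidate word the trigram model scores highest in context.

    candidates: plausible words for this position, e.g. edit-distance
    neighbors of the raw recovered word; trigram_logp(w1, w2, w3) is an
    assumed log P(w3 | w1, w2) estimated from an English corpus.
    """
    return max(candidates, key=lambda w: trigram_logp(prev2, prev1, w))
```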
Two Copies of Recovered Text

[Figure: the same recovered text shown before and after spelling and grammar correction; one underline style marks errors in recovery, another marks errors corrected by the grammar model.]
Sample Collector

[Pipeline diagram repeated, with the Sample Collector stage highlighted.]
Feedback-based Training

[Pipeline diagram repeated, with the feedback path through Sample Collector and Classifier Builder highlighted.]
Feedback-based Training
• Language-corrected recovered characters serve as labeled training samples
• Fed back for more rounds of training (see the sketch below)
• Output: keystroke classifier
– Language independent
– Can be used to recognize random key sequences, e.g. passwords
– Representation: neural networks, linear classification, or Gaussian mixtures
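A minimal sketch of the feedback loop, with logistic regression standing in for the linear-classification option; `features`, the initial `labels`, and `language_correct` are placeholder inputs, and the paper additionally keeps only samples whose corrected labels are trusted with high confidence.

```python
from sklearn.linear_model import LogisticRegression

def feedback_training(features, labels, language_correct, rounds=3):
    """Retrain the keystroke classifier on language-corrected predictions.

    features: (n_keystrokes, n_features) array; labels: characters
    recovered by the unsupervised pass after language-model correction.
    """
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(features, labels)
        labels = language_correct(clf.predict(features))
    return clf  # language independent: can score random key sequences
```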
Keystroke Classifier

[Pipeline diagram repeated, with the subsequent-recognition Keystroke Classifier stage highlighted.]
Experiment (1)
• Single keyboard
– Logitech Elite Duo wireless keyboard
– 4 data sets recorded in two settings
• Quiet & noisy
• Consecutive keystrokes clearly separated in the signal
– Keystroke positions extracted from the signal automatically (see the sketch below), with some manual error correction
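The slides do not spell out the extraction method; here is a minimal sketch of one simple energy-threshold approach, with the window size and threshold factor as illustrative assumptions.

```python
import numpy as np

def keystroke_starts(signal, rate, win_ms=10, factor=5.0):
    """Start samples of windows whose energy jumps above the baseline.

    signal: 1-D mono audio at `rate` Hz; returns candidate keystroke onsets.
    """
    win = int(rate * win_ms / 1000)
    frames = signal[: len(signal) // win * win].reshape(-1, win)
    energy = (frames.astype(float) ** 2).sum(axis=1)
    loud = energy > factor * np.median(energy)          # above quiet baseline
    onsets = np.flatnonzero(loud & ~np.roll(loud, 1))   # rising edges only
    return onsets * win
```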
– Data sets

       Recording length   Number of words   Number of keys
Set 1  ~12 min            ~400              ~2500
Set 2  ~27 min            ~1000             ~5500
Set 3  ~22 min            ~800              ~4200
Set 4  ~24 min            ~700              ~4300

– Initial & final recognition rate (%)

         Set 1         Set 2         Set 3         Set 4
         Word   Char   Word   Char   Word   Char   Word   Char
Initial  35     76     39     80     32     73     23     68
Final    90     96     89     96     83     95     80     92
Experiment (2)
• Multiple Keyboards
– Keyboard 1: DELL QuietKey PS/2, P/N: 2P121
• In use for about 6 months
– Keyboard 2: DELL QuietKey PS/2, P/N: 035KKW
• In use for more than 5 years
– Keyboard 3: DELL Wireless Keyboard, P/N: W0147
• New
• 12-minute recording with ~2300 characters

Recognition rate (%):
         Keyboard 1    Keyboard 2    Keyboard 3
         Word   Char   Word   Char   Word   Char
Initial  31     72     20     62     23     64
Final    82     93     82     94     75     90
Experiment (3)
• Classification methods in feedback-based training
– Neural Networks (NN)
– Linear Classification (LC)
– Gaussian Mixtures (GM)

[Bar chart: final word and character recognition rates (0-100%) for NN, LC, and GM.]
Limitations of Our Experiments
• Considered letters, period, comma, space, enter
• Did not consider numbers, other punctuation,
backspace, shift, etc.
• Easily separable keystrokes
• Only considered white noise (e.g. fans)
Defenses
• Physical security
• Two-factor authentication
• Masking noise
• Keyboards with uniform sound (?)
Summary
• Recover keystrokes from sound alone
• Train on the typing of English text
• Apply statistical learning theory to security
– Clustering, HMMs, supervised classification, feedback-based incremental learning
• Recover 96% of typed characters