Improving the Privacy of Wireless Protocols Jeffrey Pang Carnegie Mellon University

Embed Size (px)

Citation preview

  • Slide 1
  • Improving the Privacy of Wireless Protocols Jeffrey Pang Carnegie Mellon University
  • Slide 2
  • 2
  • Slide 3
  • 3 Protocol Header Protocol Control Info
  • Slide 4
  • What can Protocol Control Info Reveal? 4 www.bluetoothtracking.org Location traces can be deanonymized [Beresford 03, Hoh 05-07, Krum 07] Kims House
  • Slide 5
  • Who Might be Tracking You? 5
  • Slide 6
  • Talk Overview Motivation Quantifying the tracking threat Building identifier-free protocols Other research 6
  • Slide 7
  • Building identifier-free protocols Quantifying the tracking threat My work: How to build efficient protocols that reveals no transmitted bits to eavesdroppers [MobiSys 08 Best Paper, HotNets 07] Previous work: Temporary device addresses [Gruteser 05, Hu 06, Jiang 07, Stajano 05] My work: Temporary addresses are not enough; Other protocol info can be used to track devices [MobiCom 07, HotOS 07] Talk Overview 7
  • Slide 8
  • Name: Bobs Device Secret: Alice
  • PHY Layer Signatures Charlie -> AP Alice -> AP Charlie -> AP ??? -> AP Physical layer fingerprints [Brik 08, Patwari 07, Hall 04] Require uncommon and/or expensive hardware Not as accurate in all circumstances These may be obscured by adding analog noise in hardware SlyFi raises the bar and is a necessary first step 56
  • Slide 57
  • Linking with Signal Strength Side-channel attack accuracy degrades significantly even if attacker tries to use signal strength to link packets Side-channel attack accuracy degrades significantly even if attacker tries to use signal strength to link packets Attack: website finger-printing [Liberatore 06] Attacker has 5 nodes to record packets RSSIs Attacker uses k-means clustering to determine which packets belong to each client. Set of RSSIs is the feature vector. Varying RSSI by +/- 5db reduces accuracy even further to 30% [Bauer 08] 57
  • Slide 58
  • Why not Time for Data Transport? Data messages: Frequent: sent often to deliver data (1000+ pkts/sec) Wide interface: many applications, many side-channels Linkability at short timescales is NOT usually OK Can NOT use loosely synchronized time to synchronize i TiTi AB TiTi TiTi TiTi = = = 58
  • Slide 59
  • Other Protocol Details Broadcast All broadcast packets routed through the AP Use same shared key for all the clients of the AP Higher-layer binding Clients report pseudonym MAC address-to-IP address bindings to AP AP answers all ARP queries Time synchronization and roaming Use protected broadcast to transmit timestamps, same BSSID info Coexistence with 802.11 Encapsulate SlyFi in anonymous 802.11 frame with unused FC code Clients first search for SlyFi AP, then fall back to non-private AP search Link-layer ACKs If fast enough, just acknowledge last SlyFi token sent Our software implementation uses windowed ACKs 59
  • Slide 60
  • Multi-Party Discovery 60 Search Probe Enc(K payload1, 0, ) MAC(K payload2, )AB T i T i = AES(K AB, i) token K payload, offset TiTiTiTi AB T i T i = AES(K AB, i)AC token K payload, offset TiTiTiTi AC... Enc(K AB,0, ) MAC(K AB, ) encrypt auth Enc(K AB,0, ) MAC(K AB, ) encrypt auth random key On receipt, check each 16-byte block that could be a token 1500 byte packets at most 31 lookups per packet Can be done in parallel in hardware What if there are more than 31 receivers? Option 1: Duplicate packet with different tokens in each Option 2: Approximate token matching; open problem and future work
  • Slide 61
  • Packet Format Tryst: Discovery/Binding Shroud: Data Transport TokenUnencrypted Message 61
  • Slide 62
  • Protocol Timing Diagram Discover Authenticate and Bind Authenticate and Bind Send Data 62
  • Slide 63
  • Comparison protocols wifi-open:802.11 with no security wifi-wpa: 802.11 with WPA PSK/CCMP public-key:straw man symmetric-key:straw man armknecht: previous header encryption proposal Performance Evaluation Similar 63
  • Slide 64
  • Link Setup Failures Encrypt everything fails to setup many links 64
  • Slide 65
  • Token Computation Time Token computation time is negligible (Once every token interval) Using software AES, 256 Mhz Geode processor 65
  • Slide 66
  • Data Throughput vs. Packet Size SlyFi data transport overhead is similar to WPA 66
  • Slide 67
  • Data Throughput SlyFi data filtering is about as efficient as 802.11 With simulated AES hardware Performs like symmetric key Higher = Better 67
  • Slide 68
  • Empirical Stream Interleaving Many streams interleaved even at short timescales 68
  • Slide 69
  • Empirical Background Probe Rate Background probes are frequent in practice 69
  • Slide 70
  • Empirical # Probes per Client Some clients probe for many network names 70
  • Slide 71
  • === MOBICOM === 71
  • Slide 72
  • Tracking 802.11 Users Tracking scenario: Every users changes pseudonyms every hour Adversary monitors some locations One hourly traffic sample from each user in each location ? ? ? tcpdump Build a profile from training samples: First collect some traffic known to be from user X and from random others tcpdump..... ? ? ? Then classify observation samples Traffic at 2-3PM Traffic at 3-4PM Traffic at 4-5PM Traffic at 2-3PM Traffic at 3-4PM Traffic at 4-5PM Traffic at 2-3PM Traffic at 3-4PM Traffic at 4-5PM 72
  • Slide 73
  • Sample Classification Algorithm Core question: Did traffic sample s come from user X? A simple approach: nave Bayes classifier Derive probabilistic model from training samples Given s with features F, answer yes if: Pr[ s from user X | s has features F ] > T for a selected threshold T. F = feature set derived from implicit identifiers 73
  • Slide 74
  • Deriving features F from implicit identifiers Set similarity (Jaccard Index), weighted by frequency: IR_Guest SAMPLE FROM OBSERVATION Sample Classification Algorithm linksys djw SIGCOMM_1 PROFILE FROM TRAINING Rare Common w(e) = low w(e) = high 74
  • Slide 75
  • Evaluating Classification Effectiveness Simulate tracking scenario with wireless traces: Split each trace into training and observation phases Simulate pseudonym changes for each user X DurationProfiled UsersTotal Users SIGCOMM conf. (2004) 4 days377465 UCSD office building (2006) 1 day153615 Apartment building (2006) 14 days39196 75
  • Slide 76
  • Question: Is observation sample s from user X? Evaluation metrics: True positive rate (TPR) Fraction of user Xs samples classified correctly False positive rate (FPR) Fraction of other samples classified incorrectly Evaluating Classification Effectiveness Pr[ s from user X | s has features F ] > T = 0.01 = ??? Fix T for FPR Measure TPR (See paper) 76
  • Slide 77
  • Results: Individual Feature Accuracy TPR 60% TPR 30% Individual implicit identifiers give evidence of identity 1.0 77
  • Slide 78
  • Results: Multiple Feature Accuracy Users with TPR >50%: Public: 63% Home: 31% Enterprise: 27% We can identify many users in all environments PublicHomeEnterprise netdests ssids bcast fields 78
  • Slide 79
  • Results: Multiple Feature Accuracy Some users much more distinguishable than others Public networks: ~20% users identified >90% of the time PublicHomeEnterprise netdests ssids bcast fields 79
  • Slide 80
  • One Application Question: Was user X here today? More difficult to answer: Suppose N users present each hour Over an 8 hour day, 8N opportunities to misclassify Decide user X is here only if multiple samples are classified as his Revised: Was user X here today for a few hours? 80
  • Slide 81
  • Results: Tracking with 90% Accuracy Many users can be identified 81
  • Slide 82
  • === WIFI-REPORTS === 82
  • Slide 83
  • Will Reports Improve Selection? Commercial APs at Seattle hotspot locations red = official AP grey = other visible AP 83
  • Slide 84
  • === OLD SLIDES === 84
  • Slide 85
  • Other Research Wireless Service Discovery [MobiSys 09, HotNets 07] Wi-Fi hotspot reputation system Private device discovery without per-device bootstrapping Peer-to-Peer Games [SIGCOMM 08, NSDI 06, IPTPS 07] Distributed File Systems [ICDCS 07] Internet Measurement [IMC 04 x2, SIGCOMM 04] DNS infrastructure properties DNS cache compliance BGP routing and multi-homing 85
  • Slide 86
  • What can Protocol Control Info Reveal? 86 Wardriving Database Me
  • Slide 87
  • What can Protocol Control Info Reveal? 87 Wardriving Database David Wetherall
  • Slide 88
  • Previous work: Periodically change device addresses [Gruteser 05, Hu 06, Jiang 07, Stajano 05] Central Challenge From: 19:1A:1B:1C:1D:1E Prevent third parties from tracking wireless devices tcpdump 88
  • Slide 89
  • Central Challenge My thesis work: Temporary device addresses are not enough; Other protocol info can be used to track devices [MobiCom 07, HotOS 07] How to build efficient wireless protocols that reveals no transmitted bits to eavesdroppers [MobiSys 08 Best Paper, HotNets 07] Prevent third parties from tracking wireless devices 89
  • Slide 90
  • Implicit Identifiers Example Implicit identifier: other protocol fields Header bits (e.g., power management) Supported transmit rates Supported authentication algorithms tcpdump ? time Protocol Fields: 11,4,2,1Mbps, WEP Protocol Fields: 11,4,2,1Mbps, WEP 90
  • Slide 91
  • Wireless Protocol Primer Same basic protocol for 802.11 ad-hoc 802.11 infrastructure Bluetooth Other wireless protocols 91
  • Slide 92
  • Wireless Protocol Primer Name: Bobs PSP Password: Alice