Security Monitoring for Wireless Network Forensics (SMoWF)


Yongjie Cai and Ping Ji
City University of New York (CUNY)

Abstract

With the broad deployment of WiFi networks, it is easy for malicious network users to camouflage their true identities by randomly hopping onto open wireless networks, conducting an attack and leaving without being caught. Most current wireless network infrastructures do not keep logs of network activities by default, which makes it hard to obtain the network traces that could facilitate a later forensic investigation of a suspicious network event. In this paper, we outline a Security Monitoring System for Wireless Network Forensics (SMoWF), which aims to establish a forensic database based on encrypted (or hashed) wireless trace digests, and to answer the critical investigation question: which wireless device appeared where, and when? We propose to accomplish our goal through three steps:

1. Design a network trace logging method that records the abstract of useful fields of network packets. Only abstracts of packets are kept, due to privacy protection concerns.
2. Design a query/search system that allows users to conduct forensic analysis based on gathered traces.
3. Study and integrate localization algorithms into SMoWF, which can provide the location estimation of a given device when such information is needed.

Authors

Yongjie Cai is a second-year student of the Computer Science PhD Program of the Graduate Center of City University of New York (CUNY). Prior to joining the graduate program of CUNY, Yongjie obtained her B.E. degree in Computer Science from NanKai University, China, in 2010. Working in the Computer Network and Mobile System Security (NeMo) Lab led by Prof. Ping Ji, Yongjie's current research interests include Wireless Network Performance and Security Measurement, Network Traffic Analysis and Wireless Network Applications.

Ping Ji is a Professor of the Mathematics and Computer Science Department of John Jay College of Criminal Justice of the City University of New York, and a faculty member of the Computer Science PhD Program of the CUNY Graduate Center. Prof. Ji holds a PhD degree in Computer Science from the University of Massachusetts at Amherst and a B.S. degree in Computer Science and Technology from Tsinghua University, China. Prof. Ji's research interests cover the broad area of Computer Networks and have recently focused on Wireless and Mobile Network Security and Forensics. Since joining the City University of New York in 2003, Prof. Ji has been active in both research and teaching, and has been awarded a number of CUNY internal and external government grants. These awards and grants include the RF-CUNY research award for five consecutive years, US/UK Army research grants on sensor network information quality study, and NSF grants in the field of Network Forensics.


1. Introduction

Cybercrime is a rapidly growing security challenge in the current digital age, and has been a major public concern over the past several decades. With the escalating deployment of WiFi networks, the accelerated usage of mobile devices, and the dynamic physical and protocol characteristics of wireless communication, wireless links have become an increasingly popular channel for cyber criminals to camouflage their true identities. For example, a hacker may drive down a street, randomly pick an open WiFi network, connect to the Access Point, upload or download malicious files through it, then close the session and drive away. The whole process may take only minutes, and when the victim machine notices the attack, the furthest point it can trace back to is very likely only the benign Access Point through which the true attacker conducted the malicious activity. The hacker will almost certainly escape.

In this research, we propose to design a distributed Security Monitoring system for Wireless network Forensics (SMoWF), which monitors Wireless LAN activities. Abstracts of network traces are captured and selectively recorded at each monitoring point. Distributed monitoring points collaborate to reconstruct the crime scene based on monitored logs, and the SMoWF system should be able to answer the following questions: 1. Was a particular wireless device involved in a given malicious network activity? 2. Can this device be uniquely identified by the logs? 3. Where was a particular device physically located during a given period of time?

We propose to accomplish our goal through three steps: 1. Design a network trace logging method that records the abstracts of useful fields of network packets. Only abstracts of packets are kept, for privacy protection. 2. Design a query/search system that allows users to conduct forensic analysis based on monitored traces. 3. Study and implement localization algorithms that can provide the location information of a given device when necessary.

The rest of this paper is organized as follows: Section 2 explores the related work. Section 3 outlines the architecture of SMoWF. Section 4 illustrates wireless network trace capturing and preprocessing methods. Section 5 discusses the approaches to storing critical logs and conducting post-analysis and investigation. Section 6 shows the prototype of SMoWF. Section 7 concludes the paper.

2. Related Work

There are a number of wireless traffic capturing tools [1], including Wireshark, Tcpdump and Kismet/KisMac, with which we can gather wireless network traces through "off-the-shelf" 802.11 network cards. All traffic on a channel can be passively captured when a network card is set to monitor mode (sometimes loosely called promiscuous mode). When the card is in monitor mode, it transmits no packets, and all the traffic on a specific channel can be preserved on a backend server. More interestingly, Kismet [2] is able to hop channels to cover the entire spectrum, and to record the physical location of a monitoring point when the tool is used with a GPS receiver. Important trace information, such as the SSID, channel number, MAC address and associated clients of wireless networks in range, can be gathered by these traffic capturing tools, and may contain vital clues for future forensic investigations.

[1] Wireless sniffer: https://personaltelco.net/wiki/wirelesssniffer.
[2] Kismet: http://www.kismetwireless.net/.


Figure 1. A typical wireless monitoring system

Researchers have proposed several wireless monitoring infrastructure systems, primarily for improving wireless channel and protocol performance. The framework of a typical wireless monitoring system is shown in Figure 1. It consists of three parts: the monitoring point, the data repository and the central processing engine. Each monitoring point gathers network information from access points or by capturing traffic in the air, and transmits the gathered raw data to the repository. The central processing engine conducts network analysis and reports abnormal events to network operators. VISUM [3] delegates the monitoring task to a set of distributed agents using SNMP. It uses device-specific XML profiles to map retrieved high-level monitoring information to device-specific SNMP Object Identifiers. The central processing engine is responsible for assigning to each agent the subset of network devices that it needs to monitor. This is a complicated task when the number of devices gets very large, and it becomes harder still when the locations of these devices are unknown.

Also along this line of research, DAIR [4] is a framework that manages and troubleshoots enterprise wireless networks using desktop infrastructure. It proposes to attach USB-based wireless adapters to desktop machines, which usually have spare CPU and disk resources and more reliable wired Internet connectivity. These inexpensive adapters then work as monitoring points and can be densely deployed to cover an entire local area. In addition, Jigsaw [5] deploys 192 stand-alone radio sniffers to monitor a wireless network that consists of 40 open APs covering four floors and the basement of a building. The three aforementioned systems are designed for network administrators to better monitor and diagnose the performance of 802.11 networks. They mainly focus on maintaining the stability of clients' connectivity and reducing interference and packet delay. The infrastructures of DAIR and Jigsaw can be adopted in our wireless network security monitoring project for raw data collection in indoor environments.

[3] Camden C. Ho, Krishna N. Ramachandran, Kevin C. Almeroth, and Elizabeth M. Belding-Royer. A scalable framework for wireless network monitoring. In Proceedings of the 2nd ACM international workshop on Wireless mobile applications and services on WLAN hotspots, WMASH '04, pages 93–101, New York, NY, USA, 2004. ACM.
[4] Paramvir Bahl, Jitendra Padhye, Lenin Ravindranath, Manpreet Singh, Alec Wolman, and Brian Zill. DAIR: A framework for managing enterprise wireless networks using desktop infrastructure. In HOTNETS'05, 2005.
[5] Yu-Chung Cheng, John Bellardo, Péter Benkö, Alex C. Snoeren, Geoffrey M. Voelker, and Stefan Savage. Jigsaw: solving the puzzle of enterprise 802.11 analysis. SIGCOMM Comput. Commun. Rev., 36:39–50, August 2006.

[Figure 1 components: monitoring points, repository, centric processing engine, user interaction]


Similar to what we propose, FLUX [6] is a prototype of a forensic monitoring system based on the CoMo platform [7]. It aims to identify suspicious activities and network anomalies and to provide incident playback. Its goal is similar to ours; however, FLUX was at a preliminary stage and appears to have been discontinued.

Another aspect related to our work is device identification. Malicious attackers can easily camouflage their device IDs. MAC spoofing is a typical example of a simple and effective anonymity tactic: attackers can change the MAC address of their devices easily. In recent years, however, researchers have proposed quite a few ways to counter MAC spoofing. First, Pang et al. [8] demonstrate that 64% of users can be identified with 90% accuracy without using the MAC address. Implicit identifiers from users' network activities, such as pairs of IP address and port, SSID probes, broadcast packet sizes and MAC protocol fields, can help identify a unique user (device) quite accurately. Dolatshahi et al. [9, 10] show the effectiveness of using the RF signature as a wireless device identity. They exploit imperfections in commercially used RF transmitters and amplifiers, which are difficult for attackers to modify. Moreover, Polak et al. [11] propose a method that analyzes in-band distortion and spectral regrowth to uncover more sophisticated attackers who distort their signatures. At the current stage of our work, we use the MAC address as the device identifier and do not yet address the MAC spoofing problem. In the future deployment of the SMoWF system, we will consider the above-mentioned device identification methods and implement appropriate ones to counter MAC spoofing.

3. Overview of SMoWF

The rapid growth of WiFi wireless networking technology makes it possible to connect to the Internet from almost anywhere at any time. For example, in a wireless network measurement study [12], we conducted experiments around a three-block metropolitan neighbourhood of the mid-west side of Manhattan for 12 runs, and detected more than 8000 access points deployed in the neighbourhood. Densely deployed WiFi networks undoubtedly make our lives easier and more enjoyable, but they also provide more opportunities for malicious users to conduct criminal activities through mobile devices. We notice that among our detected access points, about 30 per cent provide unencrypted WiFi services. In other words, these open networks can be easily compromised.

[6] Kevin P. Mc Grath and John Nelson. FLUX: A Forensic Time Machine for Wireless Networks. In INFOCOM Poster and Demo Session. IEEE, April 2006.
[7] Gianluca Iannaccone. CoMo: An open infrastructure for network monitoring – research agenda. Intel Research Technical Report, 2005.
[8] Jeffrey Pang, Ben Greenstein, Ramakrishna Gummadi, Srinivasan Seshan, and David Wetherall. 2007. 802.11 user fingerprinting. In Proceedings of the 13th annual ACM international conference on Mobile computing and networking (MobiCom '07). ACM, New York, NY, USA, 99-110.
[9] Dolatshahi, S. and Polak, A. and Goeckel, D.L. 2010. Identification of wireless users via power amplifier imperfections. 2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers (ASILOMAR).
[10] A. Polak, S. Dolatshahi and D. Goeckel, "Identifying Wireless Users via Transmitter Imperfections," IEEE Journal on Selected Areas in Communications, Special Issue on Advances in Digital Forensics for Communications and Networking, August 2011.
[11] Polak, A.C. and Goeckel, D.L. RF Fingerprinting of Users Who Actively Mask Their Identities with Artificial Distortion.
[12] Yongjie Cai and Ping Ji. A measurement study for understanding wireless forensic monitoring. To appear in ICDFI, Sept. 2012.


In this paper, we propose a security monitoring infrastructure for wireless network forensics (SMoWF): an intelligent monitoring system that can uncover malicious devices, track their activities in wireless LANs across a metropolitan area, and preserve digital evidence to facilitate future cybercrime investigation. The SMoWF system should be able to answer the following questions:

• Was a particular device involved in a given malicious network activity?
• Can this device be uniquely identified by the logs?
• Where was a particular device physically located during a given event?

As in Figure 1, the SMoWF system consists of a set of monitors that are responsible for capturing wireless network traffic. These monitoring points are distributed throughout a wireless network and may be moved around to cover as many wireless LANs as possible. After collecting raw traffic data, SMoWF parses it into human-readable text, eliminates irrelevant traffic types and extracts the information that is useful for device identification and localization. It also removes the data part of traffic packets to protect users' privacy. SMoWF uses a central repository to store the processed data as digital evidence. Finally, it includes a post-investigation engine that helps investigators work out what was happening when a criminal activity occurred. The post-investigation engine retrieves relevant data from the evidence repository and is able to answer the aforementioned questions.

4. Traffic Capture and Preprocess

Compared to monitoring systems deployed in buildings or universities [13, 14], wireless traffic monitoring in a metropolitan area faces several challenges: 1) the number of observable access points is large; 2) the AP locations and distributions are unknown; 3) we have no control over these access points. Therefore, the traditional ways of obtaining traffic via access points are not practical. We cannot configure all these access points to log their real-time traffic, nor can we deploy thousands of static stand-alone monitor nodes to cover the whole area.

For SMoWF, we propose to delegate the traffic capturing tasks to wireless monitoring points, such as laptops that are either stationary or mobile. These monitoring points passively capture nearby wireless network traffic, and periodically upload the encrypted or hashed traffic logs to a central repository. In our experiments, we use Kismet installed on a MacBook Pro to gather raw wireless network traffic. Kismet is an 802.11 wireless network sniffer that works with any wireless card supporting monitor mode, and detects networks by passively collecting packets. When integrated with a GPS device, it can provide the GPS coordinates at which packets are detected. Kismet generates several log files, including .pcapdump, .gpsxml, .netxml, .nettxt and .alert. All packet information at the MAC layer and above, together with the Per-Packet Information (PPI) header that includes channel, signal and noise strength, is logged to .pcapdump files. GPS information such as coordinates and speed is recorded in .gpsxml files. Our SMoWF system mainly uses these two types of logs.

[13] Camden C. Ho, Krishna N. Ramachandran, Kevin C. Almeroth, and Elizabeth M. Belding-Royer. A scalable framework for wireless network monitoring. In Proceedings of the 2nd ACM international workshop on Wireless mobile applications and services on WLAN hotspots, WMASH '04, pages 93–101, New York, NY, USA, 2004. ACM.
[14] Yu-Chung Cheng, John Bellardo, Péter Benkö, Alex C. Snoeren, Geoffrey M. Voelker, and Stefan Savage. Jigsaw: solving the puzzle of enterprise 802.11 analysis. SIGCOMM Comput. Commun. Rev., 36:39–50, August 2006.
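The paper does not specify how the uploaded logs are hashed. As one possible illustration, a monitoring point could replace each MAC address with a keyed digest before upload, so that an investigator holding the key can recompute the digest of a suspect address and search for it. The sketch below assumes HMAC-SHA-256 and a shared secret key; the function name `mac_digest` is hypothetical.

```python
import hashlib
import hmac


def mac_digest(mac: str, key: bytes) -> str:
    """Return a keyed digest of a MAC address (assumed scheme: HMAC-SHA-256).

    The address is normalized first so that 'AA-BB-...' and 'aa:bb:...'
    map to the same digest.
    """
    normalized = mac.strip().lower().replace("-", ":")
    return hmac.new(key, normalized.encode("ascii"), hashlib.sha256).hexdigest()
```

A keyed digest is used here rather than a plain hash because the MAC address space is small enough that unkeyed hashes can be reversed by enumeration.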


Kismet logs raw packets in libpcap format to .pcapdump files. We use Tshark [15], the command-line version of Wireshark, to parse them into human-readable text files. We also filter out the data part of packets and preserve only the packet headers.
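The header-only parsing step can be sketched as follows. This is a minimal illustration, not the authors' script: it builds a Tshark command line that prints selected header fields only (so payloads never reach the text output) and parses the resulting comma-separated lines. The field list is an assumption chosen to match the MAC-layer fields discussed in this paper; signal and noise fields are omitted because their names depend on the capture header type.

```python
import csv
import io

# Header fields to extract; names follow Wireshark's display-filter fields.
FIELDS = [
    "frame.time_epoch", "wlan.sa", "wlan.da", "wlan.bssid",
    "wlan.ta", "wlan.ra", "wlan.fc.type", "wlan.fc.subtype", "frame.len",
]


def tshark_command(pcap_path):
    """Build the argv for a header-only, comma-separated Tshark dump."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields",
           "-E", "separator=,", "-E", "quote=d"]
    for field in FIELDS:
        cmd += ["-e", field]
    return cmd


def parse_output(text):
    """Turn Tshark's quoted CSV output into one dict per packet."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(FIELDS, row)) for row in reader if row]

# Usage (requires tshark on PATH; the file name is a placeholder):
#   import subprocess
#   out = subprocess.run(tshark_command("capture.pcapdump"),
#                        capture_output=True, text=True, check=True).stdout
#   packets = parse_output(out)
```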

5. Evidence Preservation and Post-investigation

For evidence preservation, extracting only packet headers does not reduce the logged data size enough. The volume of network traffic can be huge compared to the limited storage available. For instance, in the experiment described in Section 6, we collected 362,305 packets (about 311 MB of trace data) using Kismet by walking around a three-block neighbourhood for 12 trips taking around four hours. These data came from a single monitor. If tens or hundreds of monitors participate, the system could easily gather 10 to 100 GB of traces in one day. For comparison, Jigsaw [16] collected 96 GB of raw 802.11 traces in one day using 192 radio monitors. We performed a statistical analysis of the 802.11 packet types in our data [17]. We observed that half of the packets were beacons sent from access points, which simply announce the access points' existence and are not related to their associated clients. Therefore, we filter out this kind of packet. Secondly, to support efficient queries and post-incident forensic investigation, we store our network traces in a database.
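The beacon filter described above can be sketched in a few lines. In 802.11, beacons are management frames (type 0) with subtype 8. The record layout, a dict with `type` and `subtype` keys, is a hypothetical choice for illustration.

```python
MGMT_TYPE = 0        # 802.11 management frame
BEACON_SUBTYPE = 8   # beacon subtype within management frames


def is_beacon(record):
    """True if the parsed packet record is an 802.11 beacon frame."""
    return (int(record["type"]) == MGMT_TYPE
            and int(record["subtype"]) == BEACON_SUBTYPE)


def drop_beacons(records):
    """Keep every record that is not a beacon."""
    return [r for r in records if not is_beacon(r)]
```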

The critical part of our system is post-investigation, which aims to answer the questions described in Section 3. As preliminary work for our system, we use MAC addresses as the unique identifiers of mobile devices and explore the device localization problem accordingly. We study and evaluate two localization algorithms: the weighted centroid algorithm [18] and the log-distance path loss modeling method [19]. The weighted centroid algorithm, shown in Equation 1, estimates the location of the target device as the weighted sum of all locations where it was observed. Here $\hat{p}$ is the estimated location of the target device, $p_i$ is the $i$th location coordinate at which the device is detected, and the weight $w_i$ is proportional to the signal strength $s_i$ received from the target device at the $i$th location.

$$\hat{p} = \sum_i w_i\, p_i, \qquad w_i \propto s_i, \quad \sum_i w_i = 1 \tag{1}$$
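Equation 1 can be sketched directly in code. The paper does not state the units used for the weights; the sketch below assumes that signal strengths reported in dBm are first converted to linear power (mW) so that all weights are positive, and it normalizes them to sum to one. Both helper names are hypothetical.

```python
def dbm_to_mw(dbm):
    """Convert a signal strength in dBm to linear power in milliwatts."""
    return 10 ** (dbm / 10.0)


def weighted_centroid(observations):
    """Estimate a device location as in Equation 1.

    observations: iterable of (lat, lon, weight) with weight > 0.
    Weights are normalized here, so they only need to be proportional
    to the received signal strength.
    """
    observations = list(observations)
    total = sum(w for _, _, w in observations)
    if total <= 0:
        raise ValueError("weights must be positive")
    lat = sum(la * w for la, _, w in observations) / total
    lon = sum(lo * w for _, lo, w in observations) / total
    return lat, lon
```

Averaging latitude and longitude directly is a reasonable approximation only over a small area such as a few city blocks.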

The log-distance path loss model states that the average received signal strength decreases logarithmically with distance, in both outdoor and indoor radio channels, as shown in Equation 2.

$$s_i = S - 10\,\gamma \log_{10} d_i + X_\sigma \tag{2}$$

[15] Tshark: http://www.wireshark.org/docs/man-pages/tshark.html
[16] Yu-Chung Cheng, Mikhail Afanasyev, Patrick Verkaik, Péter Benkö, Jennifer Chiang, Alex C. Snoeren, Stefan Savage, and Geoffrey M. Voelker. Automating cross-layer diagnosis of enterprise wireless networks. In SIGCOMM '07, pages 25–36, New York, NY, USA, 2007. ACM.
[17] Yongjie Cai and Ping Ji. A measurement study for understanding wireless forensic monitoring. To appear in ICDFI, 2012.
[18] Yu-Chung Cheng, Yatin Chawathe, Anthony LaMarca, and John Krumm. 2005. Accuracy characterization for metropolitan-scale Wi-Fi localization. In Proceedings of the 3rd international conference on Mobile systems, applications, and services (MobiSys '05). ACM, New York, NY, USA, 233-245. DOI=10.1145/1067170.1067195 http://doi.acm.org/10.1145/1067170.1067195
[19] Theodore Rappaport. Wireless Communications: Principles and Practice. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2nd edition, 2001.


Here $s_i$ is the received signal strength from the target device at position $i$, and $d_i$ is the physical distance from the target device with coordinates $\langle x, y \rangle$ to the monitor point with coordinates $\langle x_i, y_i \rangle$. The path loss exponent $\gamma$ indicates the loss rate of the received signal strength. $S$ is the signal strength from the device at a distance of one meter. To account for random shadowing effects in radio propagation, $X_\sigma$ is added as a zero-mean Gaussian random variable with standard deviation $\sigma$. Theoretically, we need four monitor points or traces to determine the four parameters $\langle S, \gamma, x, y \rangle$ of the target device and hence its location coordinates $\langle x, y \rangle$. In practice, however, more than four sets of traces from the target device are collected, so we have to solve an over-determined set of equations. There are several solutions to this problem. For example, Chintalapudi et al. [20] proposed finding the solution that minimizes the mean absolute error of the equations. In our system, to simplify the implementation, we use the trust-region-reflective optimization approach [21] implemented in Matlab to minimize the squared error defined in Equation 3.

$$J = \sum_i \left(s_i - S + 10\,\gamma \log_{10} d_i\right)^2 \tag{3}$$
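To illustrate the objective in Equation 3 without Matlab, the sketch below swaps in a different and simpler technique than the trust-region-reflective solver used in the paper: a grid search over candidate positions. For a fixed candidate position the model is linear in S and the path loss exponent, so both are solved in closed form by ordinary least squares. The sketch assumes planar coordinates in meters, Euclidean distances, and signal strengths in dBm; the function name and parameters are hypothetical.

```python
import math


def fit_log_distance(observations, x_range, y_range, step=1.0, min_dist=1e-3):
    """Grid-search (x, y); solve S and gamma in closed form per candidate.

    observations: list of (x_i, y_i, s_i) for each monitor position.
    Returns (J, x, y, S, gamma) with the smallest squared error J.
    """
    n = len(observations)
    mean_s = sum(s for _, _, s in observations) / n
    nx = int(round((x_range[1] - x_range[0]) / step))
    ny = int(round((y_range[1] - y_range[0]) / step))
    best = None
    for ix in range(nx + 1):
        x = x_range[0] + ix * step
        for iy in range(ny + 1):
            y = y_range[0] + iy * step
            # u_i = -10 log10(d_i), so the model reads s_i = S + gamma * u_i.
            u = [-10.0 * math.log10(max(math.hypot(x - xi, y - yi), min_dist))
                 for xi, yi, _ in observations]
            mean_u = sum(u) / n
            var_u = sum((ui - mean_u) ** 2 for ui in u)
            if var_u < 1e-12:
                continue  # all distances equal: gamma is not identifiable
            cov = sum((ui - mean_u) * (s - mean_s)
                      for ui, (_, _, s) in zip(u, observations))
            gamma = cov / var_u
            S = mean_s - gamma * mean_u
            J = sum((s - S - gamma * ui) ** 2
                    for ui, (_, _, s) in zip(u, observations))
            if best is None or J < best[0]:
                best = (J, x, y, S, gamma)
    return best
```

The grid resolution bounds the location accuracy, and the path loss exponent is left unconstrained; a bounded nonlinear solver like the one cited in the text avoids both limitations.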

6. Experiments and System Prototype

We conducted experiments to explore the feasibility and evaluate the performance of our system in a testbed. Our testbed, shown in Figure 2, is a three-block metropolitan area of the upper-west side in NYC, measuring around 260 m × 260 m. We use a MacBook Pro laptop with an internal AirPort wireless card and a BU353 GPS receiver as a moving monitor point. Kismet, installed on the MacBook Pro, is configured to hop channels to cover the entire spectrum and to log all received wireless packets. We walked around the testbed for 12 runs along the path A-H or H-A during one week of April 2011. We collected 362,305 packets, amounting to around 311 MB of traces.

Figure 2. Testbed and Testing Path

[20] Krishna Chintalapudi, Anand Padmanabha Iyer, and Venkata N. Padmanabhan. Indoor localization without the pain. In Proceedings of the sixteenth annual international conference on Mobile computing and networking, MobiCom '10, pages 173–184, New York, NY, USA, 2010. ACM.
[21] lsqnonlin: http://www.mathworks.com/help/toolbox/optim/ug/lsqnonlin.html.


After parsing .pcapdump files into readable text, we filter out the data payload of packets, extract packet header fields and load them into the PACKET table. The fields include the frame date and time; the MAC source address, destination address, BSSID, transmitter address and receiver address; the data length; the channel frequency; the received signal strength and noise strength; the 802.11 type and subtype; the IP source and destination addresses; and the TCP or UDP source and destination ports. Note that a packet does not include all of these fields. For example, 802.11 Acknowledgement and Clear-To-Send packets contain only the receiver MAC address and no other MAC address. A packet has source and destination ports from either TCP or UDP, not both. We can obtain neither TCP nor UDP information from encrypted packets. Furthermore, we parse the .gpsxml files and load them into the MAC_GPS table in our database. The MAC_GPS table contains date, time, source MAC address, signal, noise, latitude, longitude, altitude, fix, speed and heading. To speed up queries, we create indexes on the date fields in both tables. For device localization, we chose to apply the simple but effective weighted centroid algorithm in our system.
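The two tables and their date indexes can be sketched as follows. The paper does not name the database system or the exact column names; SQLite and every identifier below are assumptions for illustration. Missing header fields (for example, ports of encrypted packets) are simply stored as NULL.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical PACKET and MAC_GPS tables with time indexes."""
    conn.executescript("""
        CREATE TABLE packet (
            frame_time TEXT, src_mac TEXT, dst_mac TEXT, bssid TEXT,
            ta TEXT, ra TEXT, data_len INTEGER, channel_freq INTEGER,
            signal INTEGER, noise INTEGER, type INTEGER, subtype INTEGER,
            src_ip TEXT, dst_ip TEXT, src_port INTEGER, dst_port INTEGER
        );
        CREATE TABLE mac_gps (
            frame_time TEXT, src_mac TEXT, signal INTEGER, noise INTEGER,
            lat REAL, lon REAL, alt REAL, fix INTEGER, speed REAL, heading REAL
        );
        CREATE INDEX idx_packet_time ON packet(frame_time);
        CREATE INDEX idx_gps_time ON mac_gps(frame_time);
    """)


def sightings(conn, mac, start, end):
    """Return (time, lat, lon, signal) rows for a device in a time window.

    Timestamps are assumed to be ISO 8601 strings in one fixed format,
    so that string comparison matches chronological order.
    """
    return conn.execute(
        "SELECT frame_time, lat, lon, signal FROM mac_gps "
        "WHERE src_mac = ? AND frame_time BETWEEN ? AND ? "
        "ORDER BY frame_time",
        (mac, start, end),
    ).fetchall()
```

A query such as `sightings(conn, mac, start, end)` supplies the per-device observations that a localization step would then consume.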

We further developed a simple web user interface to help investigators trace devices of interest. As shown in Figure 3, an investigator can enter the date and time period of an event of interest, as well as the MAC address, IP address or BSSID of a device. SMoWF then retrieves from the database the records and packets related to that device. It estimates the location of the device every five minutes during the time window, as shown in the second panel of the figure, and generates a KML file that tags the geographic locations of the device in Google Earth, as shown in the third panel. In this way, investigators can easily locate the device of interest.
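The five-minute windowing and KML export can be sketched as below. This is an illustrative assumption about the output, not the prototype's actual code: each location estimate becomes one KML Placemark with a timestamp, and KML lists coordinates in longitude, latitude, altitude order. The function names are hypothetical.

```python
from xml.sax.saxutils import escape


def five_minute_bucket(epoch_seconds):
    """Round an epoch timestamp down to the start of its five-minute window."""
    return int(epoch_seconds) // 300 * 300


def to_kml(device, estimates):
    """Render location estimates as a KML document.

    estimates: list of (iso_time, lat, lon), one per five-minute window.
    """
    placemarks = []
    for when, lat, lon in estimates:
        placemarks.append(
            "<Placemark><name>{}</name>"
            "<TimeStamp><when>{}</when></TimeStamp>"
            "<Point><coordinates>{},{},0</coordinates></Point>"
            "</Placemark>".format(escape(device), escape(when), lon, lat)
        )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
            + "".join(placemarks) + "</Document></kml>")
```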


Figure 3. System Prototype

7. Conclusion

In this work, we outlined a wireless forensic monitoring system (SMoWF), which aims to establish a forensic database based on encrypted (or hashed) wireless trace digests, and to answer the following investigation questions: 1. Was a particular device involved in a given malicious network activity? 2. Can this device be uniquely identified by the logs? 3. Where was a particular device physically located during a given event? We conducted research and experiments on the following tasks: 1. Design a network trace logging method that records the abstract of useful fields of network packets. Only abstracts of packets are kept, for privacy protection. 2. Design a query/search system that allows users to conduct forensic analysis based on monitored traces. 3. Study and propose localization algorithms that can provide the location information of a given device.

Acknowledgments

This research is supported by the National Science Foundation under NSF grant CNS-0904901.

