throbber
802.11 User Fingerprinting
`
`Ramakrishna Gummadi‡
`Ben Greenstein†
`Jeffrey Pang∗
`David Wetherall†§
`Srinivasan Seshan∗
`†Intel Research Seattle
`∗Carnegie Mellon University
`§University of Washington
`‡University of Southern California
`jeffpang@cs.cmu.edu
`benjamin.m.greenstein@intel.com gummadi@usc.edu
`srini@cmu.edu
`djw@cs.washington.edu
`
`ABSTRACT
`The ubiquity of 802.11 devices and networks enables anyone to
`track our every move with alarming ease. Each 802.11 device
`transmits a globally unique and persistent MAC address and thus
`is trivially identifiable. In response, recent research has proposed
`replacing such identifiers with pseudonyms (i.e., temporary, un-
`linkable names). In this paper, we demonstrate that pseudonyms
`are insufficient to prevent tracking of 802.11 devices because im-
`plicit identifiers, or identifying characteristics of 802.11 traffic, can
`identify many users with high accuracy. For example, even with-
`out unique names and addresses, we estimate that an adversary can
`identify 64% of users with 90% accuracy when they spend a day
`at a busy hot spot. We present an automated procedure based on
`four previously unrecognized implicit identifiers that can identify
`users in three real 802.11 traces even when pseudonyms and en-
`cryption are employed. We find that the majority of users can be
`identified using our techniques, but our ability to identify users is
`not uniform; some users are not easily identifiable. Nonetheless,
`we show that even a single implicit identifier is sufficient to distin-
`guish many users. Therefore, we argue that design considerations
`beyond eliminating explicit identifiers (i.e., unique names and ad-
`dresses), must be addressed in order to prevent user tracking in
`wireless networks.
`Categories and Subject Descriptors:
`C.2.1 Computer-Communication Networks: Network Architecture
`and Design
`General Terms: Measurement, Security
`Keywords: privacy, anonymity, wireless, 802.11
`
`INTRODUCTION
`1.
`The alarming ease with which third parties can track our ev-
`ery move has drawn the concern of the popular media [1, 2], the
`United States government [22, 40], and technical standards bod-
`ies [17]. The fear is that we are sacrificing our location privacy
`due to the ubiquity of wireless devices that disclose our locations,
`identities, or both. Though this fear has focused on large scale
`wireless systems, such as cellular phone networks, the capability
`
`Permission to make digital or hard copies of all or part of this work for
`personal or classroom use is granted without fee provided that copies are
`not made or distributed for profit or commercial advantage and that copies
`bear this notice and the full citation on the first page. To copy otherwise, to
`republish, to post on servers or to redistribute to lists, requires prior specific
`permission and/or a fee.
`MobiCom’07, September 9–14, 2007, Montréal, Québec, Canada.
`Copyright 2007 ACM 978-1-59593-681-3/07/0009 ...$5.00.
`
`to track user location in such systems has typically been limited
`to service providers that are legally bound to protect our privacy.
`In contrast, the low cost of 802.11 hardware and ease of access to
`network monitoring software—all that is required for someone to
`locate others nearby and eavesdrop on their traffic—enable any-
`one to track users. Furthermore, although the popular press raised
`awareness about tracking threats posed by emerging wireless tech-
`nologies, such as RFID [13], no such campaign has been waged to
`educate users about 802.11 devices and networks, which pose the
`same threats today.
`The best practices for securing 802.11 networks, embodied in
`the 802.11i standard [16], provide user authentication, service au-
`thentication, data confidentiality, and data integrity. However, they
`do not provide anonymity, a property essential to prevent location
`tracking. For example, it is trivial to track an 802.11 device today
`since each device advertises a globally unique and persistent MAC
`address with every frame that it transmits. To mask this identifier,
`researchers have proposed applying pseudonyms [14, 18] (i.e., tem-
`porary, unlinkable names) by having users periodically change the
`MAC addresses of their 802.11 devices.
`In this paper, we demonstrate that pseudonyms are insufficient
`to provide anonymity in 802.11. Even without a unique address,
`characteristics of users’ 802.11 traffic can identify them implicitly
`and track them with high accuracy. An example of such an im-
`plicit identifier is the IP address of a service that a user frequently
`accesses, such as his or her email server. In a population of sev-
`eral hundred users, this address might be unique to one individual;
`thus, the mere observation of this IP address would indicate the
`presence of that user. Of course, in a wireless network that em-
`ploys link-layer encryption, IP addresses would not be visible to
`an eavesdropper. However, other implicit identifiers would remain
`and these identifiers can be used in combination to identify users
`accurately.
`This paper quantifies how well a passive adversary can track
`users with four implicit identifiers visible to commodity hardware.
`We thereby place a lower bound on how accurately users can be
`identified implicitly, as more implicit identifiers and more capable
`adversaries exist in practice. We make the following contributions:
`
`• We identify four previously unrecognized implicit identifiers:
`network destinations, network names advertised in 802.11
`probes, differing configurations of 802.11 options, and sizes
`of broadcast packets that hint at their contents.
`
`• We develop an automated procedure to identify users. This
`procedure allows us to quantify how much information im-
`plicit identifiers, both alone and in combination, reveal about
`several hundred users in three empirical 802.11 traces.
`
`IA1018
`
`Page 1 of 12
`
`

`

`• Our evaluation shows that users emit highly discriminating
`implicit identifiers, and, thus, even a small sample of network
`traffic can identify them more than half (56%) of the time in
`public networks, on average. Moreover, we will almost never
`mistake them as the source of other network traffic (1% of the
`time). Since adversaries will obtain multiple traffic samples
`from a user over time, this high accuracy in traffic classi-
`fication enables them to track many users with even higher
`accuracy in common wireless networks. For example, an ad-
`versary can identify 64% of users with 90% accuracy when
`they spend a day at a busy hot spot that serves 25 concurrent
`users each hour.
`
`• To our knowledge, we are the first to show with empirical
`evidence that design considerations beyond eliminating ex-
`plicit identifiers, such as unique names and addresses, must
`be addressed to protect anonymity in wireless networks.
`
`In Section 2 we illustrate the power of implicit identifiers with
`several real examples. Section 3 covers related work. Section 4 ex-
`plains our experimental methodology. Section 5 describes our em-
`pirical 802.11 traces. Section 6 analyzes how well 802.11 users can
`be identified using each implicit identifier individually. Section 7
`examines how accurately an adversary can track people using these
`implicit identifiers in public, home, and enterprise networks. We
`conclude in Section 8.
`
`2. THE IMPLICIT IDENTIFIER PROBLEM
`How significantly do implicit identifiers erode location privacy?
`Consider the seemingly innocuous trace of 802.11 traffic collected
`at the 2004 SIGCOMM conference, now anonymized and archived
`for public use [31]. Interestingly, hashing real MAC addresses to
`pseudonyms is also the best practice for anonymizing traces such
`as this. Unfortunately, implicit identifiers remain and they are suf-
`ficient to identify many SIGCOMM attendees. For example:
`Implicit identifiers can identify us uniquely. One particular at-
`tendee’s laptop transmitted requests for the network names “MIT,”
`“StataCenter,” and “roofnet,” identifying him or her as someone
`probably from Cambridge, MA. This occurred because the default
`behavior of a Windows laptop is to actively search for the user’s
`preferred networks by name, or Service Set Identifier (SSID). The
`SSID “therobertmorris” perhaps identifies this person uniquely [26].
`A second attendee requested “University of Washington” and “djw.”
`The last SSID is unique in the SIGCOMM trace and suggests that
`this person may be University of Washington Professor David J.
`Wetherall, one of our coauthors. More distressingly, Wigle [39],
`an online database of 802.11 networks observed around the world,
`shows that there is only one “djw” network in the entire Seattle
`area. Wigle happens to locate this network within 192 feet of David
`Wetherall’s home.
`Implicit identifiers remain even when counter measures are em-
`ployed. Another SIGCOMM attendee transferred 512MB of data
`via BitTorrent (this user contacted hosts on the typical BitTorrent
`port, 6881). A request for the SSID “roofnet” [32] from the same
`MAC address suggests that this user is from Cambridge, MA. Sup-
`pose that this user had been more stealthy and changed his or her
`MAC address periodically. In this particular case, since the user
`had not requested the SSID during the time he or she had been
`downloading, the MAC address used in the SSID request would
`have been different from the one used in BitTorrent packets. There-
`fore, we would not be able to use the MAC address to explicitly link
`“roofnet” to this poor network etiquette. However, the user does ac-
`cess the same SSH and IMAP server nearly every hour and was the
`
`only user at SIGCOMM to do so. Thus, this server’s address is an
`implicit identifier, and knowledge of it enables us to link the user’s
`sessions together.
`Now suppose that the network employed link-layer encryption
`scheme, such as WPA, that obscures network addresses. Even then,
`we could link this user’s sessions together by employing the fact
`that, of the 341 users that sent 802.11 broadcast packets, this was
`the only one that sent broadcast packets of sizes 239, 245, and 257
`bytes and did so repeatedly throughout the entire conference. Fur-
`thermore, the identical 802.11 capabilities advertised in each ses-
`sion’s management frames improves our confidence of this link-
`age because these capabilities differentiate different 802.11 cards
`and drivers. Prior research has shown that peer-to-peer file shar-
`ing traffic can be detected through encryption [42]. Thus, even if
`pseudonyms and link-layer encryption were employed, we could
`still implicate someone in Cambridge.
`Implicit identifiers are exposed by design flaws. These exam-
`ples illustrate three shortcomings of the 802.11 protocol beyond
`exposing explicit identifiers, none of which is trivially fixed. These
`shortcomings afflict not only 802.11 but many wireless protocols,
`including Bluetooth and ZigBee.
`Identifying information exposed at higher layers of the network
`stack is not adequately masked. For example, even with encryption,
`packet sizes can be identifying. Padding, decoy transmissions, and
`delays may hide information exposed by size and timing channels,
`but increase overhead. For example, Sun et al. [34] found that 8 to
`16 KB of padding is required to hide the identity of web objects.
`The performance penalty due to this overhead would be especially
`acute in wireless networks due to shared nature of the medium.
`Identifying information during service discovery is not masked.
`802.11 service discovery can not be encrypted since no shared keys
`exist prior to association. This raises the more general problem
`of how two devices can discover each other in a private manner,
`which is expensive to solve [4]. This problem arises not only when
`searching for access points, but also when clients want to locate
`devices in ad hoc mode, such as when using a Microsoft Zune to
`share music or a Nintendo DS to play games with friends.
`Identifying information exposed by variations in implementation
`and configuration is not masked. Each 802.11 implementation typ-
`ically supports different 802.11 features (e.g., supported rates) and
`has different timing characteristics. This problem is difficult to
`solve due to the inherent ambiguity of human specifications and
`manufacturers’ and network implementers’ desire for flexibility to
`meet differing constraints.
`Balancing the costs involved in rectifying these shortcomings
`with the incentives necessary for deployment is itself a challenge.
`Nonetheless, rectifying these flaws at the protocol level is impor-
`tant so that users need not limit their activities in order to protect
`their location privacy. By measuring the magnitude with which
`each flaw contributes to the implicit identifier problem, our study
`provides insight into the proper trade-offs to make when correcting
`these design flaws in future wireless protocols. In the short term,
`our study may give guidance to individuals that are willing to pro-
`actively hide their identity in existing wireless networks.
`In the remainder of this paper, we examine how these shortcom-
`ings impact the location privacy of a large number of users in differ-
`ent 802.11 networks and demonstrate that the examples described
`in this section are not isolated anomalies.
`
`3. RELATED WORK
`The challenge of hiding a user’s identity has been examined in
`three different contexts: location privacy, identity hiding designs,
`
`IA1018
`
`Page 2 of 12
`
`

`

`and the study of other implicit identifiers. In this section, we de-
`scribe the previous work in each of these areas.
`Location Privacy. Location privacy has recently received signifi-
`cant attention, most notably in the RFID [13] and pervasive com-
`puting [7] fields. The concern is that location-aware applications,
`which use GPS and other positioning technologies, might reveal
`this information in undesirable ways. However, location privacy
`is threatened even by devices that do not explicitly track location.
`Since 802.11 users usually associate with access points that are less
`than tens of meters away, knowing the access point that a user is as-
`sociated with gives away a coarse estimate of his location, such as
`his home or workplace. Moreover, systems that can employ multi-
`ple monitoring locations can use wireless signal strength to obtain
`an even more accurate estimate of a user’s location [6, 35]. An
`added complication is that wireless devices are rapidly becoming
`integral parts of our daily lives. A resulting trend, which is evident
`from examining databases of access point locations [39], is the in-
`creasing availability of service, which is increasing the number of
`location tracking opportunities. Unfortunately, identifying individ-
`ual users is often trivial since the 802.11 devices that they use are
`uniquely named by their MAC addresses.
`Identity Hiding. Pseudonyms are widely used in systems, such
`as the GSM cellular phone network [15] to hide user identities.
`Gruteser et al. [14] and Jiang et al. [18] proposed using pseudonyms
`within 802.11 networks, and Stajano et al. [41] proposed a similar
`mechanism for Bluetooth. Using pseudonyms is a necessary first
`step to make tracking in these networks more difficult. However,
`we show that it is insufficient to protect location privacy because
`implicit identifiers can be sufficient to track users in many real sce-
`narios.
`Implicit Identifiers. Fingerprinting devices using implicit identi-
`fiers is not a new concept. For example, Franklin et al. [11] showed
`that it is possible to fingerprint device drivers using the timing of
`802.11 probes. In contrast, our work attempts to pin down actual
`user identities rather than selecting among a few dozen drivers.
`Kohno et al. [21] showed that devices could be fingerprinted us-
`ing the clock skew exposed by TCP timestamps. We introduce new
`implicit identifiers that are useful in identifying users and, in con-
`trast to TCP timestamps, three of our identifiers are still visible in
`wireless networks using link-layer encryption. Moreover, Kohno et
`al. note that one limitation of their work is that an adversary can not
`passively obtain timestamps from devices running the most preva-
`lent operating system, Windows XP. For example, in two of our
`empirical traces, only 32% and 15% of the users sent TCP times-
`tamps. All our identifiers have much at least 55% coverage.
`Padmanabhan and Yang [29] explored fingerprinting users with
`“clickprints,” or the paths that users take through a website. Their
`techniques rely on data from many user sessions collected at ac-
`tual web servers. Our techniques can be employed passively by
`anyone with a wireless card without even associating to a network.
`These three research efforts compliment ours, since the procedure
`we develop for identifying users enables an adversary to use these
`implicit identifiers in combination with ours, yielding even more
`accurate user fingerprints. None of these previous efforts offer a
`formal method to combine multiple pieces of evidence. Moreover,
`to our knowledge, we are the first to evaluate the how well users
`are identified by implicit identifiers observed in empirical wireless
`data.
`Implicit identifiers also reveal identity in other contexts. Security
`tools like nmap [12] and p0f [28] leverage differences in network
`stack behaviors to determine a device’s operating system. Key-
`stroke dynamics have been shown to accurately identify users [24,
`
`33]. The timing and sizes of Web transfers often uniquely identify
`websites, even when transmitted over encrypted channels [8, 34].
`Finally, there has been a large body of research in identifying appli-
`cations from implicit identifiers in encrypted traffic [19, 20, 25, 42,
`43]. Like many of these techniques which succeed in classifying
`applications accurately, we use a Bayesian approach.
`
`4. EXPERIMENTAL SETUP
`This section describes the evaluation criteria we use to determine
`how well several implicit identifiers can be used to track users.
`The Adversary. Strong adversaries, such as service providers and
`large monitoring networks, obviously pose a large threat to our lo-
`cation privacy. However, the significance of the threat posed by
`802.11 is that anyone that wishes to track users can do so.
`Therefore, we consider an adversary that runs readily available
`monitoring software, such as tcpdump [37], on one or more lap-
`tops or on less conspicuous commodity 802.11 devices [3]. We
`further restrict adversaries by assuming that their devices listen
`passively. That is, they never transmits 802.11 frames, not even
`to associate with a network. This means that the adversary can not
`be detected by other radios. The adversary deploys monitoring de-
`vices in one or more locations in order to observe 802.11 traffic
`from nearby users. By considering a weak adversary, we place a
`lower bound on the accuracy with which users can be tracked, as
`stronger adversaries would be strictly more successful.
`The Environments. An adversary’s tracking accuracy will depend
`on the 802.11 networks he or she is monitoring. Since implicit
`identifiers are not perfectly identifying, it will be more difficult to
`distinguish users in more populous networks. In addition, different
`networks employ different levels of security, making some implicit
`identifiers invisible to an adversary. We consider the three domi-
`nant forms of wireless deployments today: public networks, home
`networks, and enterprise networks.
`Public networks, such as hot spots or metro-area networks [27],
`are typically unencrypted at the link-layer. Although many public
`networks employ access control—for example, to allow access to
`only a provider’s customers—most do so via authentication above
`the link-layer (e.g., through a web page) and by using MAC address
`filtering thereafter. Very few use 802.11i-compliant protocols that
`also enable encryption. Hence, identifying features at the network,
`link, and physical layers would be visible to an eavesdropper in
`such an environment. Unfortunately, this is the most common type
`of network today due to the challenge of secure key distribution.
`Home and small business networks are small, but detecting when
`specific users are present is increasingly challenging due to the
`high density of access points in urban areas [5]. In addition, these
`networks are more likely to employ link-layer encryption, such
`as WEP or WPA, because the set of authorized users is typically
`known and is small. In cases where link-layer encryption is em-
`ployed, an eavesdropper will not be able to view the payloads of
`data packets. However, features that are derived from frame sizes
`or timing, which are not masked by encryption, or from 802.11
`management frames, which are always sent in the clear, remain
`visible.
`Finally, security conscious enterprise networks are likely to em-
`ploy link-layer encryption. Moreover, if the only authorized de-
`vices on the network are provided by the company, there will be
`less diversity in the behavior of wireless cards. For example, Intel
`corporation issues similar corporate laptops to its employees. We
`consider a enterprise network where only one type of wireless card
`and configuration is in use, so users can not be identified by differ-
`ences in device implementation. However, features derived from
`
`IA1018
`
`Page 3 of 12
`
`

`

`the networks that users visit or the applications and services they
`run remain visible.
`The Monitoring Scenario. We assume that users use different
`pseudonyms during each wireless session in each of these environ-
`ments, as Gruteser et al. [14] propose. As a result, explicit iden-
`tifiers can not link their sessions together. Sessions can vary in
`length, so we assume that every hour, each user will have a differ-
`ent pseudonym. We define a traffic sample to be one user’s network
`traffic observed during one hour.
`Although it is possible for users to change their MAC addresses
`more frequently, this is unlikely to be very useful in practice be-
`cause other features, such as received signal strength, can link
`pseudonyms together at these timescales [6, 35]. Moreover, chang-
`ing a device’s MAC address forces a device to re-associate with
`the access point and, thus, disrupts active connections.
`In addi-
`tion, it may require users to revisit a web page to re-authenticate
`themselves, since MAC addresses are tied to user accounts in many
`public networks. Users are unlikely to tolerate these annoyances
`multiple times per session.
`Of course, the ability to link traffic samples together does not
`help an adversary detect a user’s presence unless the adversary is
`also able to link at least one sample to that user’s identity. In Sec-
`tion 2, we showed that identity can sometimes be revealed by cor-
`relating implicit identifiers with out-of-band information, such as
`that provided by the Wigle [39] location database. However, if the
`adversary knows the user he wishes to track, he can likely obtain a
`few traffic samples known to come from that user’s device. For ex-
`ample, an adversary could obtain such samples by physically track-
`ing a person for a short time. We assume the adversary is able to
`obtain this set of training samples either before, during, or after
`the monitoring period. Our results show that on average, only 1 to
`3 training samples are sufficient to track users with each implicit
`identifier (see Section 6.2.3). The monitor itself collects samples
`that the adversary wants to test, which we call validation samples.
`Evaluation Criteria. There are a number of questions an adver-
`sary may wish to answer with these validation samples. Who was
`present? When was user U present? Which samples came from
`user U? Essential to answering all these questions is the ability to
`classify samples by the user who generated them. In other words,
`given a validation sample, the adversary needs to answer the fol-
`lowing question for one or more users U:
`
`Question 1 Did this traffic sample come from user U?
`
`Section 6 evaluates how well an adversary can answer this question
`with each of our implicit identifiers.
`To demonstrate how well implicit identifiers can be used for
`tracking, we also evaluate the accuracy in answering the following:
`
`Question 2 Was user U here today?
`
`This question is distinct from Question 1 because an adversary can
`observe many traffic samples at any given time, any one of which
`may be from the target user U. In addition, a single affirmative
`answer to Question 1 does not necessitate a affirmative answer to
`Question 2 because an adversary may want to be more certain by
`obtaining multiple positive samples. Section 7 details the interac-
`tion between these questions and evaluates how many users can
`be tracked with high accuracy in each of the 802.11 networks de-
`scribed above.
`
`5. WIRELESS TRACES
`We evaluate the implicit identifiers of users in three 802.11 traces.
`We consider sigcomm, a 4 day trace taken from one monitoring
`point at the 2004 SIGCOMM conference [31], ucsd, a trace of all
`802.11 traffic in U.C. San Diego’s computer science building on
`November 17, 2006 [10], and apt, a 19 day trace monitoring all
`networks in an apartment building, which we collected. All traces
`were collected with tcpdump-like tools and only contain informa-
`tion that can be collected using standard wireless cards in monitor
`mode. The ucsd trace is the union of observations from multiple
`monitoring points. IP and MAC addresses are anonymized but are
`consistent throughout each trace (i.e., there is a unique one-to-one
`mapping between addresses and anonymized labels). Link-layer
`encryption (i.e., WEP or WPA) was not employed in either the
`sigcomm or ucsd network and neither trace recorded application
`packet payloads. In our analysis, we show that implicit identifiers
`remain even when we emulate link layer encryption and that we
`do not need packet payloads to identify users accurately. The apt
`trace only recorded broadcast management packets due to privacy
`concerns; hence, we only use it to study the one implicit identifier
`that is extracted from these packets.
`We distinguish unique users by their MAC address since it is not
`currently common practice to change it. To simulate the effect of
`using pseudonyms, we assume that every user has a different MAC
`address each hour. Hence, we have one sample per user for each
`hour that they are active. To simulate the training samples collected
`by an adversary, we split each trace into two temporally contiguous
`parts. Samples from the first part are used as training samples and
`the remainder are validation samples. We choose a training period
`in each trace long enough to profile a large number of users. For
`the sigcomm trace, the training period covers the time until the
`end of the first full day of the conference. For the ucsd trace, the
`training period covers the time until just before noon. We skip one
`hour between the training and validation periods so user activities
`at the end of the training period are less likely to carry over to the
`validation period. For the apt trace, the training period covers the
`first 5 days. We consider a user to be present during an hour if and
`only if she sends at least one data or 802.11 probe packets during
`that time; i.e., if the user is actively using or searching for a wireless
`network.1
`Table 1 shows the relevant statistics about each trace. Note that
`since can we only compute accuracy for users that were present in
`both the training and validation data, those are the only users that
`we profile. Therefore, results in this paper refer to ‘Profiled Users’
`as the total user count and not ‘Total Users.’
`
`6.
`
`IMPLICIT IDENTIFIERS
`In this section, we describe four novel implicit identifiers and
`evaluate how much information each one reveals. Our results show
`that (1) many implicit identifiers are effective at distinguishing in-
`dividual users and others are effective at distinguishing groups of
`users; (2) a non-trivial fraction of users are trackable using any one
`highly discriminating identifier; (3) on average, only 1 to 3 train-
`ing samples are required to leverage each implicit identifier to its
`full effect; and (4) at least one implicit identifier that we examine
`accurately identifies users over multiple weeks.
`1We ignore samples that only contain other 802.11 management
`frames, such as power management polls. Including samples with
`these frames would not appreciably change the characteristics of
`the sigcomm workload, but would double the number of total
`“users” in the ucsd workload. This is because many devices ob-
`served in the ucsd trace were never actively using the network; we
`ignore these idle devices.
`
`IA1018
`
`Page 4 of 12
`
`

`

`apt
`ucsd
`sigcomm
`validation
`training
`training
`validation
`training
`validation
`345
`119
`10
`11
`37
`54
`Duration (hours)
`Total Samples
`1473
`638
`587
`1240
`1974
`3391
`Frames Per Sample (median)
`92
`57
`1227
`1128
`289
`284
`196
`97
`225
`371
`377
`412
`Total Users
`Profiled Users
`39
`39
`153
`153
`337
`337
`Samples Per Profiled User (mean)
`32.2
`14.7
`3.1
`4.7
`5.5
`9.1
`Users Per Hour (mean)
`4
`5
`59
`113
`53
`64
`Table 1—Summary of relevant workload statistics and parameters. The duration reports only hours with at least one active user.
`
`6.1 Identifying Traffic Characteristics
`Network Destinations. We first consider netdests, the set of IP
`<address, port> pairs in a traffic sample, excluding pairs that are
`known to be common to all users, such as the address of the local
`network’s DHCP server. There are several reasons to believe that
`this set is relatively unique to each user. It is well known that the
`popularity of web sites has a Zipf distribution [9], so many sites are
`visited by a small number of users. In fact, in the sigcomm and
`ucsd training data, each <address, port> pair is visited by 1.15
`and 1.20 users on average, respectively. The set of sites that a user
`visits is even more likely to be unique. In addition, users are likely
`to visit some of the same sites repeatedly over time. For example,
`a user generally has only one email server and a set of bookmarked
`sites they check often [36].
`An adversary could obtain network addresses in any wireless
`network that does not enable link layer encryption. Even if users
`sent all their traffic through VPNs, the case for several users in
`the sigcomm trace, the IP addresses of the VPN servers would be
`revealing. No application or network level confidentiality mecha-
`nisms, such as SSL or IPSec, would mask this identifier either.
`SSID Probes. Next we consider ssids, the set of SSIDs in 802.11
`probes observed in a traffic sample. Windows XP and OS X add
`the SSID of a network to a preferred networks list when the client
`first associates with the network. To simplify future associations,
`subsequent attempts to discover any network will try to locate this
`network by transmitting the SSID in a probe request. As we ob-
`served in Section 2, SSID names can be distinguishing.2 In addi-
`tion, probes are never encrypted because active probing must be
`able to occur before association and key agreement.
`There are two practical issues that limit the use of ssids as an
`implicit identifier. First, the preferred networks list changes each
`time a user adds a network, and thus a profile may degrade over
`time. Second, clients transmit the SSIDs on their preferred net-
`works lists only when attempting to discover service. Therefore,
`clients may not probe for distinguishing SSIDs very often. While
`this is true, our results show that when distinguishing SSIDs are
`probed for, they can often uniquely identify a user. Since all users
`in the monitoring area are likely to use the SSIDs of the networks
`being monitored, these SSIDs are not distinguishing and we do not
`include them in the ssids set.
`Broadcast Packet Sizes. We now consider bcast, the set of 802.11
`broadcast packet sizes in each traffic sample. Many applications
`broadcast packets to advertise their existence to other machines on
`the local network. Due to the nature of this function, these packets
`
`2A recent patch [23] to Windows XP allows a user to disable ac-
`tive probing, but it remains enabled by default because disabling it
`would break association in networks where the access point does
`not announce itself. In addition, revealing probes or beacons are
`still required for devices to discover each other in ad hoc mode.
`
`Application
`Port Number of Sizes
`NA
`14
`wireless driver or OS
`DHCP
`67
`14
`sunrpc
`111
`1
`NetBIOS
`138
`7
`groove-dpp
`1211
`1
`Microsoft Office v.X 2222
`1
`FileMaker Pro
`5003
`7
`X Windows
`6000
`1
`Table 2—A list of the most unique broadcast packets observed in
`the sigcomm trace. The third column shows the number of packet
`sizes that were emitted by at most 2 users.
`
`often contain naming information. For example, in our traces, we
`observed many Windows machines broadcasting NetBIOS naming
`advertisements and applications such as FileMaker and Microsoft
`Office advertising themselves.
`Since these packets vary in length, their sizes can reveal infor-
`mation about their content even if the content itself is encrypted.
`Packet sizes alone appear to distinguish users almost as well as
`<application, size> tuples. For

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket