**ABSTRACT**

The popularity of mobile devices and location-based services (LBS) has created great concern regarding the location privacy of the users of such devices and services. Anonymization is a common technique used to protect the location privacy of LBS users.

This technique assigns a random pseudonym to each user, and these pseudonyms can change over time. Here, we provide a general information-theoretic definition of perfect location privacy and prove that perfect location privacy is achievable for mobile devices when the anonymization technique is applied appropriately.

First, we assume that the user's current location is independent of her past locations. Using this i.i.d. model, we show that if the pseudonym of the user is changed before O(n^(2/(r−1))) anonymized observations are made by the adversary for that user, then she has perfect location privacy, where n is the number of users in the network and r is the number of all possible locations that the user might occupy.

Then, we model each user's movement by a Markov chain, so that a user's current location depends on her previous locations, which is a more realistic model when approximating real-world data. We show that perfect location privacy is achievable in this model if the pseudonym of the user is changed before O(n^(2/(|E|−r))) anonymized observations are collected by the adversary for that user, where |E| is the number of edges in the user's Markov model.

**PRELIMINARIES**

**Defining Perfect Location Privacy:**

Let us consider a network with a large number of users. In the proposed framework, an identity-perturbation LPPM known as the anonymization technique, which assigns random pseudonyms to users over time, is used to protect the privacy of the users. An adversary observes this network over time, and her intention is to identify anonymized users by tracking their traces.

In this framework, the strongest adversary is assumed to be observing the network. This adversary has complete statistical knowledge of the users' movements, obtained from her past observations or other sources, and she can describe the users' movements as a random process on the corresponding geographic area.

**Defining the Anonymization Technique:**

In this framework, to achieve location privacy, the LPPM performs an anonymization method and replaces the identifier of each user with a random pseudonym. That is, it applies a random permutation Π^(n) to the set of n users and then assigns the pseudonym Π^(n)(u) to user u:

Π^(n) : {1, 2, · · · , n} → {1, 2, · · · , n}
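The permutation-based anonymization above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the names `anonymize` and `traces` are ours:

```python
import random

def anonymize(traces, rng=None):
    """Anonymization LPPM sketch: replace each user identifier u with
    the pseudonym Pi(u), where Pi is a uniformly random permutation
    of the user set {1, ..., n}.

    traces: dict mapping user id -> location trace (list of states).
    Returns a new dict keyed by pseudonym instead of user id."""
    rng = rng or random.Random()
    users = list(traces)
    pseudonyms = users[:]
    rng.shuffle(pseudonyms)              # uniformly random permutation Pi
    pi = dict(zip(users, pseudonyms))    # Pi: u -> Pi(u)
    return {pi[u]: trace for u, trace in traces.items()}

traces = {1: [0, 1, 1], 2: [1, 0, 0], 3: [0, 0, 1]}
anon = anonymize(traces, rng=random.Random(7))
```

The adversary sees the same set of traces, but the mapping from pseudonyms back to user identities is uniformly random and unknown to her.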

**ACHIEVING PERFECT LOCATION PRIVACY**

**Perfect Location Privacy for the Two-State i.i.d. Model:**

To get better insight into the location privacy problem, here we consider a simple scenario where there are two locations, location 0 and location 1. At any time k ∈ {0, 1, 2, · · · }, user u has probability p_u ∈ (0, 1) of being at location 1, independent of previous locations and other users' locations. Therefore, X_u(k) ∼ Bernoulli(p_u).
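A user's trace in this two-state model is just a sequence of independent Bernoulli draws, which can be sketched as follows (function name and parameter values are illustrative):

```python
import random

def sample_iid_trace(p_u, m, seed=0):
    """Two-state i.i.d. model sketch: m draws of X_u(k) ~ Bernoulli(p_u),
    i.e., user u is at location 1 with probability p_u at every time k,
    independently of her past locations and of other users."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_u else 0 for _ in range(m)]

trace = sample_iid_trace(p_u=0.7, m=10_000)
freq = sum(trace) / len(trace)  # empirical frequency concentrates around p_u
```

The empirical frequency of location 1 converges to p_u as m grows, which is exactly the statistic an adversary can exploit when she observes a user's trace for too long under a single pseudonym.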

**Proof of Theorem 1 (Perfect Location Privacy for Two-State Model):**

Here, we provide a formal proof of Theorem 1. In the proposed setting, we assume we have an infinite number of potential users indexed by the integers, and at any step we consider a network consisting of n users, i.e., users 1, 2, · · · , n. We would like to show perfect location privacy as n goes to infinity. Recall that X_u(t) denotes the location of user u at time t.

**Perfect Location Privacy for the r-State i.i.d. Model:**

Here we extend the results to a scenario in which we have r ≥ 2 locations or regions, locations 0, 1, · · · , r − 1. At any time k ∈ {0, 1, 2, · · · }, user u has probability p_u(i) ∈ (0, 1) of being at location i, independent of previous locations and other users' locations.

**SIMULATION**

**I.I.D. Model:**

In the two-state model with n users, let p_u be the probability of user u being at state 1, and let the observation vector Y^(m) consist of m observations for each user.

If the following holds,

- m = cn^(2−α), where c, α > 0 are constants
- p_1 ∈ (0, 1)
- (p_2, p_3, · · · , p_n) ∼ f_P, where 0 < δ_1 < f_P < δ_2
- P = (p_1, p_2, · · · , p_n) is known to the adversary

then we have perfect location privacy for all users; e.g., for the first user we have

**∀k ∈ ℕ, lim_{n→∞} I(X_1(k); Y^(m)) = 0**
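The role of the condition on m can be probed numerically: when m is large relative to the theorem's bound, even a naive frequency-matching adversary de-anonymizes users, while with few observations she mostly fails. The sketch below is our own Monte Carlo illustration; all names and parameter choices are assumptions, not the paper's experiment:

```python
import random

def matching_adversary_accuracy(n, m, seed=0):
    """Sketch of a frequency-matching adversary in the two-state
    i.i.d. model. Each p_u ~ Uniform(0.05, 0.95), a density f_P
    bounded away from 0 and infinity as the theorem requires. The
    adversary knows P = (p_1, ..., p_n), sees m observations per
    anonymized user, and matches each trace to the user whose p_u is
    closest to the trace's empirical frequency of location 1.
    Returns the fraction of users correctly identified.  (Pseudonyms
    only relabel traces, so accuracy can be computed by matching each
    trace's frequency directly to the candidate p_u values.)"""
    rng = random.Random(seed)
    p = [rng.uniform(0.05, 0.95) for _ in range(n)]
    freq = [sum(rng.random() < p[u] for _ in range(m)) / m
            for u in range(n)]
    correct = sum(
        min(range(n), key=lambda v: abs(p[v] - freq[u])) == u
        for u in range(n))
    return correct / n

few = matching_adversary_accuracy(n=6, m=5)        # very few observations
many = matching_adversary_accuracy(n=6, m=50_000)  # many observations
```

With m large the empirical frequencies concentrate tightly around the distinct p_u values and the matching succeeds, which is the intuition behind requiring pseudonyms to change before m grows past the theorem's threshold.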

**Markov Chain Model:**

With n users, the observation vector Y^(m) consists of m observations for each user. For an irreducible, aperiodic Markov chain with r states and |E| edges, if m = cn^(2/(|E|−r) − β), where c > 0 and β > 0 are constants, then

lim_{n→∞} I(X_1(k); Y^(m)) = 0, ∀k ∈ ℕ
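A user's movement under the Markov model can be sketched as follows. The transition matrix here is an arbitrary illustration of ours; |E| counts the edges with nonzero transition probability, self-loops included, and the example satisfies |E| > r as the exponent 2/(|E| − r) requires:

```python
import random

def sample_markov_trace(P, m, start=0, seed=0):
    """Sample m steps of a user's movement from an r-state Markov
    chain with row-stochastic transition matrix P. The edge set E
    consists of all pairs (i, j) with P[i][j] > 0, so the next
    location depends only on the current one."""
    rng = random.Random(seed)
    r = len(P)
    state, trace = start, []
    for _ in range(m):
        state = rng.choices(range(r), weights=P[state])[0]
        trace.append(state)
    return trace

# An example 3-state chain with |E| = 7 edges (7 nonzero entries).
P = [[0.5, 0.5, 0.0],
     [0.2, 0.5, 0.3],
     [0.4, 0.0, 0.6]]
trace = sample_markov_trace(P, m=5_000)
```

Transitions along zero-probability edges (here 0→2 and 2→1) never occur in the sampled trace, which is why the structure of E, and not just the number of states r, shapes how distinguishable a user's traces are.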

**CONCLUSION AND FUTURE WORK**

We provided an information-theoretic definition of perfect location privacy using the mutual information between a user's actual location and the anonymized observations that the adversary collects. Here, we assumed the strongest adversary, who has complete statistical knowledge of the users' movements. The anonymization technique creates a pseudonym for each user at any time using a random permutation on the set of all users, {1, 2, · · · , n}.

First, we modeled users' movements as independent of their previous locations. In this model, we have n users and r locations. We proved that if the number of observations that the adversary collects, m, is less than O(n^(2/(r−1))), then users will have perfect location privacy. So, if the anonymization method changes the pseudonyms of the users before m observations are made by the adversary for each user, then the adversary cannot distinguish between users.

We assumed that the location of a user is independent of her previous locations and also independent of other users' locations. This assumption rarely holds for real-world data. Markov chain models, in which a user's next location depends on her current location, are known to model users' movements more realistically than independent patterns. We therefore extended our framework with a Markov chain model of users' movements, and using the same notion of perfect location privacy, we showed that perfect location privacy remains achievable.

Using Markov chains, we proved that perfect location privacy is achievable if the pseudonym of the user is changed before O(n^(2/(|E|−r))) observations are made by the adversary. If the anonymization method changes the pseudonyms of the users before m such observations are made for each user, then all users have perfect location privacy.

Several issues may arise in such a framework. Achieving perfect location privacy depends on how unique the Markov chain of a user is. In the best case, all users have the same Markov chain model of movements with similar transition probabilities; in this case, the adversary cannot distinguish between them even after observing for a long time. On the other hand, some users may have a very unique Markov model of their movements, in which case the adversary is able to identify the user with a very limited number of observations.

Users can be classified into two groups: (1) users who have perfect location privacy if the number of observations collected by the adversary, m, is below some threshold, and (2) users who will never achieve perfect location privacy when only anonymization is used; that is, a finite number of observations is enough to give the adversary a strictly positive probability of identifying a user correctly. The key to the above analysis seems to be defining a uniqueness measure for the user's Markov chain: users whose transition graphs are too unique are insecure in terms of location privacy, and other privacy-protecting mechanisms have to be involved.

Extending this work with other location privacy-protecting mechanisms can help users protect their location information. Other LPPMs, such as location-obfuscation LPPMs, allow users to report their location less precisely; users can add noise or hide their location for certain amounts of time. By adding such a method to our framework, users can both change their pseudonyms over time and slightly perturb their locations before reporting them. This may make users' Markov models more similar to one another, and since this decreases the uniqueness of each user's Markov chain, users become more likely to achieve perfect location privacy.

Source: University of Massachusetts

Author: Zarrin Montazeri