TY - GEN
T1 - Your trajectory privacy can be breached even if you walk in groups
AU - Sui, Kaixin
AU - Zhao, Youjian
AU - Liu, Dapeng
AU - Ma, Minghua
AU - Xu, Lei
AU - Li, Zimu
AU - Pei, Dan
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/10/13
Y1 - 2016/10/13
N2 - Enterprise Wi-Fi networks enable the large-scale collection of users' mobility information at the indoor level. The collected trajectory data is valuable for both research and commercial purposes, but its use also raises serious privacy concerns. A large body of work tries to achieve k-anonymity (hiding each user in an anonymity set of size at least k) as the first step toward solving the privacy problem. Yet it has been qualitatively recognized that k-anonymity is still risky when the diversity of the sensitive information in the k-anonymity set is low. However, a study that provides a quantitative understanding of that risk in trajectory datasets is still lacking. In this work, we present a large-scale measurement-based analysis of the low-diversity risk over four weeks of trajectory data collected from Tsinghua, a campus that covers an area of 4 km², on which 2,670 access points are deployed in 111 buildings. Using this dataset, we highlight the high risk of low diversity. For example, we find that even when 5-anonymity is satisfied, the sensitive attributes of 25% of individuals can be easily guessed. We also find that although a larger k increases the size of anonymity sets, the corresponding improvement in the diversity of anonymity sets is very limited (it decays exponentially). These results suggest that diversity-oriented solutions are necessary.
AB - Enterprise Wi-Fi networks enable the large-scale collection of users' mobility information at the indoor level. The collected trajectory data is valuable for both research and commercial purposes, but its use also raises serious privacy concerns. A large body of work tries to achieve k-anonymity (hiding each user in an anonymity set of size at least k) as the first step toward solving the privacy problem. Yet it has been qualitatively recognized that k-anonymity is still risky when the diversity of the sensitive information in the k-anonymity set is low. However, a study that provides a quantitative understanding of that risk in trajectory datasets is still lacking. In this work, we present a large-scale measurement-based analysis of the low-diversity risk over four weeks of trajectory data collected from Tsinghua, a campus that covers an area of 4 km², on which 2,670 access points are deployed in 111 buildings. Using this dataset, we highlight the high risk of low diversity. For example, we find that even when 5-anonymity is satisfied, the sensitive attributes of 25% of individuals can be easily guessed. We also find that although a larger k increases the size of anonymity sets, the corresponding improvement in the diversity of anonymity sets is very limited (it decays exponentially). These results suggest that diversity-oriented solutions are necessary.
UR - http://www.scopus.com/inward/record.url?scp=85009787176&partnerID=8YFLogxK
U2 - 10.1109/IWQoS.2016.7590444
DO - 10.1109/IWQoS.2016.7590444
M3 - Conference contribution
AN - SCOPUS:85009787176
T3 - 2016 IEEE/ACM 24th International Symposium on Quality of Service, IWQoS 2016
BT - 2016 IEEE/ACM 24th International Symposium on Quality of Service, IWQoS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th IEEE/ACM International Symposium on Quality of Service, IWQoS 2016
Y2 - 20 June 2016 through 21 June 2016
ER -