TY - JOUR
T1 - Dynamic privacy pricing
T2 - A multi-armed bandit approach with time-variant rewards
AU - Xu, Lei
AU - Jiang, Chunxiao
AU - Qian, Yi
AU - Zhao, Youjian
AU - Li, Jianhua
AU - Ren, Yong
N1 - Publisher Copyright:
© 2005-2012 IEEE.
PY - 2017/2
Y1 - 2017/2
N2 - Recently, the conflict between exploiting the value of personal data and protecting individuals' privacy has attracted much attention. Personal data market provides a promising solution to this conflict, while determining the price of privacy is a tough issue. In this paper, we study the pricing problem in a setting where a data collector sequentially buys data from multiple data owners whose valuations of privacy are randomly drawn from an unknown distribution. To maximize the total payoff, the collector needs to dynamically adjust the prices offered to owners. We model the sequential decision-making problem of the collector as a multi-armed bandit problem with each arm representing a candidate price. Specifically, the privacy protection technique adopted by the collector is taken into account. Protecting privacy generally causes a negative effect on the value of data, and this effect is embodied by the time-variant distributions of the rewards associated with arms. Based on the classic upper confidence bound policy, we propose two learning policies for the bandit problem. The first policy estimates the expected reward of a price by counting how many times the price has been accepted by data owners. The second policy treats the time-variant data value as a context and uses ridge regression to estimate the rewards in different contexts. Simulation results on real-world data demonstrate that by applying the proposed policies, the collector can get a payoff which is close to that he can get by setting a fixed price, which is the best in hindsight, for all data owners.
AB - Recently, the conflict between exploiting the value of personal data and protecting individuals' privacy has attracted much attention. Personal data market provides a promising solution to this conflict, while determining the price of privacy is a tough issue. In this paper, we study the pricing problem in a setting where a data collector sequentially buys data from multiple data owners whose valuations of privacy are randomly drawn from an unknown distribution. To maximize the total payoff, the collector needs to dynamically adjust the prices offered to owners. We model the sequential decision-making problem of the collector as a multi-armed bandit problem with each arm representing a candidate price. Specifically, the privacy protection technique adopted by the collector is taken into account. Protecting privacy generally causes a negative effect on the value of data, and this effect is embodied by the time-variant distributions of the rewards associated with arms. Based on the classic upper confidence bound policy, we propose two learning policies for the bandit problem. The first policy estimates the expected reward of a price by counting how many times the price has been accepted by data owners. The second policy treats the time-variant data value as a context and uses ridge regression to estimate the rewards in different contexts. Simulation results on real-world data demonstrate that by applying the proposed policies, the collector can get a payoff which is close to that he can get by setting a fixed price, which is the best in hindsight, for all data owners.
KW - Bandit problems
KW - Data anonymization
KW - Dynamic pricing
KW - Learning policy
KW - Private data collecting
UR - http://www.scopus.com/inward/record.url?scp=85013484247&partnerID=8YFLogxK
U2 - 10.1109/TIFS.2016.2611487
DO - 10.1109/TIFS.2016.2611487
M3 - Article
AN - SCOPUS:85013484247
SN - 1556-6013
VL - 12
SP - 271
EP - 285
JO - IEEE Transactions on Information Forensics and Security
JF - IEEE Transactions on Information Forensics and Security
IS - 2
M1 - 7572170
ER -