Dynamic privacy pricing: A multi-armed bandit approach with time-variant rewards

Lei Xu; Chunxiao Jiang; Yi Qian; Youjian Zhao; Jianhua Li; Yong Ren

doi:10.1109/TIFS.2016.2611487

Research output: Contribution to journal › Article › peer-review

41 Citations (Scopus)

Abstract

Recently, the conflict between exploiting the value of personal data and protecting individuals' privacy has attracted much attention. A personal data market offers a promising solution to this conflict, but determining the price of privacy is difficult. In this paper, we study the pricing problem in a setting where a data collector sequentially buys data from multiple data owners whose valuations of privacy are randomly drawn from an unknown distribution. To maximize the total payoff, the collector needs to dynamically adjust the prices offered to owners. We model the collector's sequential decision-making problem as a multi-armed bandit problem in which each arm represents a candidate price. In particular, the privacy protection technique adopted by the collector is taken into account: protecting privacy generally reduces the value of data, and this effect is captured by the time-variant distributions of the rewards associated with the arms. Based on the classic upper confidence bound policy, we propose two learning policies for the bandit problem. The first policy estimates the expected reward of a price by counting how many times the price has been accepted by data owners. The second policy treats the time-variant data value as a context and uses ridge regression to estimate the rewards in different contexts. Simulation results on real-world data demonstrate that, by applying the proposed policies, the collector can obtain a payoff close to the one achievable by offering all data owners a single fixed price that is the best in hindsight.
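The counting-based idea in the abstract can be illustrated with a small sketch. This is not the authors' policy as published; it is a generic UCB-style rule written under stated assumptions: an owner accepts an offer when the price is at least their private valuation, the collector's reward on acceptance is the current data value minus the price, and every candidate price lies below the data value. The function name `ucb_price_selection` and the `explore` parameter are hypothetical.

```python
import math
import random


def ucb_price_selection(prices, data_values, valuations, explore=2.0):
    """Offer one price per data owner, chosen by a UCB index on acceptance rates.

    prices      -- candidate prices, one bandit arm per price
    data_values -- value of the data to the collector in each round (may change over time)
    valuations  -- each owner's private valuation (assumed: accept if price >= valuation)
    Returns (total_payoff, offers, accepts).
    """
    k = len(prices)
    offers = [0] * k   # how many times each price was offered
    accepts = [0] * k  # how many times each price was accepted
    total = 0.0
    for t, (value, valuation) in enumerate(zip(data_values, valuations), start=1):
        if t <= k:
            arm = t - 1  # offer every price once before using the index
        else:
            scores = []
            for i in range(k):
                p_hat = accepts[i] / offers[i]  # empirical acceptance rate
                bonus = math.sqrt(explore * math.log(t) / offers[i])
                # optimistic acceptance probability (capped at 1) times the margin
                scores.append(min(1.0, p_hat + bonus) * (value - prices[i]))
            arm = max(range(k), key=lambda i: scores[i])
        offers[arm] += 1
        if prices[arm] >= valuation:  # assumed acceptance rule
            accepts[arm] += 1
            total += value - prices[arm]
    return total, offers, accepts


if __name__ == "__main__":
    rng = random.Random(0)
    n = 5000
    prices = [0.1, 0.3, 0.5, 0.7]
    valuations = [rng.random() for _ in range(n)]        # uniform valuations, an assumption
    data_values = [1.0 - 0.2 * t / n for t in range(n)]  # slowly decaying value, an assumption
    total, offers, accepts = ucb_price_selection(prices, data_values, valuations)
    print("total payoff:", round(total, 2))
    print("offers per price:", offers)
```

The index multiplies an optimistic acceptance probability by the current margin, so a change in data value over time shifts which price looks best without resetting the acceptance counts. The ridge-regression policy mentioned in the abstract is not covered by this sketch.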

Original language	English
Article number	7572170
Pages (from-to)	271-285
Number of pages	15
Journal	IEEE Transactions on Information Forensics and Security
Volume	12
Issue number	2
DOIs	https://doi.org/10.1109/TIFS.2016.2611487
Publication status	Published - Feb 2017
Externally published	Yes

Keywords

Bandit problems
Data anonymization
Dynamic pricing
Learning policy
Private data collecting

Cite this

Xu, L., Jiang, C., Qian, Y., Zhao, Y., Li, J., & Ren, Y. (2017). Dynamic privacy pricing: A multi-armed bandit approach with time-variant rewards. IEEE Transactions on Information Forensics and Security, 12(2), 271-285. Article 7572170. https://doi.org/10.1109/TIFS.2016.2611487

@article{87164e0737774ca8a96885d641b0331a,

title = "Dynamic privacy pricing: A multi-armed bandit approach with time-variant rewards",

abstract = "Recently, the conflict between exploiting the value of personal data and protecting individuals' privacy has attracted much attention. Personal data market provides a promising solution to this conflict, while determining the price of privacy is a tough issue. In this paper, we study the pricing problem in a setting where a data collector sequentially buys data from multiple data owners whose valuations of privacy are randomly drawn from an unknown distribution. To maximize the total payoff, the collector needs to dynamically adjust the prices offered to owners. We model the sequential decision-making problem of the collector as a multi-armed bandit problem with each arm representing a candidate price. Specifically, the privacy protection technique adopted by the collector is taken into account. Protecting privacy generally causes a negative effect on the value of data, and this effect is embodied by the time-variant distributions of the rewards associated with arms. Based on the classic upper confidence bound policy, we propose two learning policies for the bandit problem. The first policy estimates the expected reward of a price by counting how many times the price has been accepted by data owners. The second policy treats the time-variant data value as a context and uses ridge regression to estimate the rewards in different contexts. Simulation results on real-world data demonstrate that by applying the proposed policies, the collector can get a payoff which is close to that he can get by setting a fixed price, which is the best in hindsight, for all data owners.",

keywords = "Bandit problems, Data anonymization, Dynamic pricing, Learning policy, Private data collecting",

author = "Lei Xu and Chunxiao Jiang and Yi Qian and Youjian Zhao and Jianhua Li and Yong Ren",

note = "Publisher Copyright: {\textcopyright} 2005-2012 IEEE.",

year = "2017",

month = feb,

doi = "10.1109/TIFS.2016.2611487",

language = "English",

volume = "12",

pages = "271--285",

journal = "IEEE Transactions on Information Forensics and Security",

issn = "1556-6013",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "2",

}

TY - JOUR

T1 - Dynamic privacy pricing: A multi-armed bandit approach with time-variant rewards

AU - Xu, Lei

AU - Jiang, Chunxiao

AU - Qian, Yi

AU - Zhao, Youjian

AU - Li, Jianhua

AU - Ren, Yong

PY - 2017/2

Y1 - 2017/2

AB - Recently, the conflict between exploiting the value of personal data and protecting individuals' privacy has attracted much attention. Personal data market provides a promising solution to this conflict, while determining the price of privacy is a tough issue. In this paper, we study the pricing problem in a setting where a data collector sequentially buys data from multiple data owners whose valuations of privacy are randomly drawn from an unknown distribution. To maximize the total payoff, the collector needs to dynamically adjust the prices offered to owners. We model the sequential decision-making problem of the collector as a multi-armed bandit problem with each arm representing a candidate price. Specifically, the privacy protection technique adopted by the collector is taken into account. Protecting privacy generally causes a negative effect on the value of data, and this effect is embodied by the time-variant distributions of the rewards associated with arms. Based on the classic upper confidence bound policy, we propose two learning policies for the bandit problem. The first policy estimates the expected reward of a price by counting how many times the price has been accepted by data owners. The second policy treats the time-variant data value as a context and uses ridge regression to estimate the rewards in different contexts. Simulation results on real-world data demonstrate that by applying the proposed policies, the collector can get a payoff which is close to that he can get by setting a fixed price, which is the best in hindsight, for all data owners.

KW - Bandit problems

KW - Data anonymization

KW - Dynamic pricing

KW - Learning policy

KW - Private data collecting

UR - http://www.scopus.com/inward/record.url?scp=85013484247&partnerID=8YFLogxK

U2 - 10.1109/TIFS.2016.2611487

DO - 10.1109/TIFS.2016.2611487

M3 - Article

AN - SCOPUS:85013484247

SN - 1556-6013

VL - 12

SP - 271

EP - 285

JO - IEEE Transactions on Information Forensics and Security

JF - IEEE Transactions on Information Forensics and Security

IS - 2

M1 - 7572170

ER -
