TPII: tracking personally identifiable information via user behaviors in HTTP traffic

Yi Liu; Tian Song; Lejian Liao

doi:10.1007/s11704-018-7451-z

Yi Liu, Tian Song^*, Lejian Liao

^*Corresponding author for this work

School of Cyberspace Security

Research output: Contribution to journal › Article › peer-review

6 citations (Scopus)

Abstract

It is common for application service providers (ASPs) to actively collect non-critical personally identifiable information (PII) from users’ devices to the cloud through mobile applications in order to provide precise recommendation services. Meanwhile, Internet service providers (ISPs) and local network providers also have strong requirements to collect PII for finer-grained traffic control and security services. However, locating PII accurately in massive volumes of network traffic is a challenge, much like looking for a needle in a haystack. In this paper, we address this challenge by presenting an efficient and lightweight approach, TPII, which locates and tracks PII in the HTTP layer rebuilt from raw network traffic. The approach collects only three features from HTTP fields as users’ behaviors and then builds a tree-based decision model to mine PII efficiently and accurately. Without any a priori knowledge, TPII can identify any type of PII from any mobile application, which gives it broad applicability. We evaluate the proposed approach on a real dataset collected from a campus network with more than 13k users. The experimental results show that the precision and recall of TPII are 91.72% and 94.51%, respectively, and that a parallel implementation of TPII can mine and label 213 million records within one hour, coming close to supporting 1 Gbps wire-speed inspection in practice. Our approach gives network service providers a practical way to collect PII for better services.

Original language	English
Article number	143801
Journal	Frontiers of Computer Science
Volume	14
Issue number	3
DOI	https://doi.org/10.1007/s11704-018-7451-z
Publication status	Published - 1 Jun 2020

Cite this

@article{268d29d2429c46b482e2ed41df7d6e58,

title = "{TPII}: tracking personally identifiable information via user behaviors in {HTTP} traffic",

abstract = "It is widely common that mobile applications collect non-critical personally identifiable information (PII) from users{\textquoteright} devices to the cloud by application service providers (ASPs) in a positive manner to provide precise and recommending services. Meanwhile, Internet service providers (ISPs) or local network providers also have strong requirements to collect PIIs for finer-grained traffic control and security services. However, it is a challenge to locate PIIs accurately in the massive data of network traffic just like looking a needle in a haystack. In this paper, we address this challenge by presenting an efficient and light-weight approach, namely TPII, which can locate and track PIIs from the HTTP layer rebuilt from raw network traffics. This approach only collects three features from HTTP fields as users{\textquoteright} behaviors and then establishes a tree-based decision model to dig PIIs efficiently and accurately. Without any priori knowledge, TPII can identify any types of PIIs from any mobile applications, which has a broad vision of applications. We evaluate the proposed approach on a real dataset collected from a campus network with more than 13k users. The experimental results show that the precision and recall of TPII are 91.72\% and 94.51\% respectively and a parallel implementation of TPII can achieve 213 million records digging and labelling within one hour, reaching near to support 1Gbps wire-speed inspection in practice. Our approach provides network service providers a practical way to collect PIIs for better services.",

keywords = "HTTP, mobile applications, network traffic analysis, personally identifiable information, privacy leakage",

author = "Yi Liu and Tian Song and Lejian Liao",

note = "Publisher Copyright: {\textcopyright} 2019, Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature.",

year = "2020",

month = jun,

day = "1",

doi = "10.1007/s11704-018-7451-z",

language = "English",

volume = "14",

journal = "Frontiers of Computer Science",

issn = "2095-2228",

publisher = "Higher Education Press Limited Company",

number = "3",

}

TY - JOUR

T1 - TPII

T2 - tracking personally identifiable information via user behaviors in HTTP traffic

AU - Liu, Yi

AU - Song, Tian

AU - Liao, Lejian

PY - 2020/6/1

Y1 - 2020/6/1

N2 - It is widely common that mobile applications collect non-critical personally identifiable information (PII) from users’ devices to the cloud by application service providers (ASPs) in a positive manner to provide precise and recommending services. Meanwhile, Internet service providers (ISPs) or local network providers also have strong requirements to collect PIIs for finer-grained traffic control and security services. However, it is a challenge to locate PIIs accurately in the massive data of network traffic just like looking a needle in a haystack. In this paper, we address this challenge by presenting an efficient and light-weight approach, namely TPII, which can locate and track PIIs from the HTTP layer rebuilt from raw network traffics. This approach only collects three features from HTTP fields as users’ behaviors and then establishes a tree-based decision model to dig PIIs efficiently and accurately. Without any priori knowledge, TPII can identify any types of PIIs from any mobile applications, which has a broad vision of applications. We evaluate the proposed approach on a real dataset collected from a campus network with more than 13k users. The experimental results show that the precision and recall of TPII are 91.72% and 94.51% respectively and a parallel implementation of TPII can achieve 213 million records digging and labelling within one hour, reaching near to support 1Gbps wire-speed inspection in practice. Our approach provides network service providers a practical way to collect PIIs for better services.

AB - It is widely common that mobile applications collect non-critical personally identifiable information (PII) from users’ devices to the cloud by application service providers (ASPs) in a positive manner to provide precise and recommending services. Meanwhile, Internet service providers (ISPs) or local network providers also have strong requirements to collect PIIs for finer-grained traffic control and security services. However, it is a challenge to locate PIIs accurately in the massive data of network traffic just like looking a needle in a haystack. In this paper, we address this challenge by presenting an efficient and light-weight approach, namely TPII, which can locate and track PIIs from the HTTP layer rebuilt from raw network traffics. This approach only collects three features from HTTP fields as users’ behaviors and then establishes a tree-based decision model to dig PIIs efficiently and accurately. Without any priori knowledge, TPII can identify any types of PIIs from any mobile applications, which has a broad vision of applications. We evaluate the proposed approach on a real dataset collected from a campus network with more than 13k users. The experimental results show that the precision and recall of TPII are 91.72% and 94.51% respectively and a parallel implementation of TPII can achieve 213 million records digging and labelling within one hour, reaching near to support 1Gbps wire-speed inspection in practice. Our approach provides network service providers a practical way to collect PIIs for better services.

KW - HTTP

KW - mobile applications

KW - network traffic analysis

KW - personally identifiable information

KW - privacy leakage

UR - http://www.scopus.com/inward/record.url?scp=85076894252&partnerID=8YFLogxK

U2 - 10.1007/s11704-018-7451-z

DO - 10.1007/s11704-018-7451-z

M3 - Article

AN - SCOPUS:85076894252

SN - 2095-2228

VL - 14

JO - Frontiers of Computer Science

JF - Frontiers of Computer Science

IS - 3

M1 - 143801

ER -
