TY - JOUR
T1 - Static tainting extraction approach based on information flow graph for personally identifiable information
AU - Liu, Yi
AU - Liao, Lejian
AU - Song, Tian
N1 - Publisher Copyright: © 2020, Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2020/3/1
Y1 - 2020/3/1
AB - Personally identifiable information (PII) is widely used in applications such as network privacy leak detection, network forensics, and user portraits. Internet service providers (ISPs) and administrators are usually concerned with whether PII has been extracted during network transmission. However, most studies have focused on extraction on the client side and the server side. This study proposes a static tainting extraction approach that automatically extracts PII from large-scale network traffic without requiring any manual work or feedback on the ISP-level network traffic. The proposed approach does not deploy any additional applications on the client side. The information flow graph is drawn via a tainting process that involves two steps: inter-domain routing and intra-domain infection, the latter of which includes a constraint function (CF) to limit “over-tainting”. Compared with the existing semantic-based approach that uses network traffic from the ISP, the proposed approach performs better, with 92.37% precision and 94.04% recall. Furthermore, three methods that reduce the computing time and the memory overhead are presented. The number of rounds is reduced to 0.0883% and the execution time overhead to 0.0153% of the corresponding values for the original approach.
KW - constraint function
KW - information flow graph
KW - inter-domain routing
KW - intra-domain infection
KW - network privacy leak detection
KW - network traffic analysis
KW - personally identifiable information
KW - static tainting
UR - http://www.scopus.com/inward/record.url?scp=85079571513&partnerID=8YFLogxK
U2 - 10.1007/s11432-018-9839-6
DO - 10.1007/s11432-018-9839-6
M3 - Article
AN - SCOPUS:85079571513
SN - 1674-733X
VL - 63
JO - Science China Information Sciences
JF - Science China Information Sciences
IS - 3
M1 - 132104
ER -