The Same Name Is Not Always the Same: Correlating and Tracing Forgery Methods across Various Deepfake Datasets

Yi Sun; Jun Zheng; Lingjuan Lyn; Hanyu Zhao; Jiaxing Li; Yunteng Tan; Xinyu Liu; Yuanzhang Li

doi:10.3390/electronics12112353

Corresponding author for this work: Yuanzhang Li

Research output: Contribution to journal › Article › Peer-review

Abstract

Deepfakes are becoming increasingly ubiquitous, particularly in facial manipulation. Numerous researchers and companies have released datasets of face deepfakes whose labels indicate the forgery method used. However, these labels are often named arbitrarily and inconsistently, so most researchers currently restrict their work to a single dataset. Practical applications and traceability research, by contrast, require these datasets to be used together. In this study, we use several models to extract forgery features from various deepfake datasets and apply K-means clustering to identify datasets with similar feature values. We analyze the feature values with the Calinski-Harabasz (CH) index. Our findings reveal that forgery classes with the same or similar labels in different deepfake datasets exhibit different forgery features. To address this problem, we propose the KCE system, which combines multiple deepfake datasets according to feature similarity. Across four groups of test datasets, when faced with unknown data types, the model trained on KCE-combined data achieved a CH score 42.3% higher than a model trained on data combined by forgery name. It also scored 2.5% higher than a model trained on all the data, even though the latter had more training data. This indicates that the method improves the generalization ability of the model. This paper introduces a fresh perspective for effectively evaluating and utilizing diverse deepfake datasets and for conducting deepfake traceability research.
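The abstract does not describe how KCE is implemented. The following is a minimal, hypothetical Python sketch of the two ingredients it names: K-means clustering of forgery-feature vectors, and the Calinski-Harabasz index CH = [B / (k - 1)] / [W / (n - k)], where B and W are the between-cluster and within-cluster sums of squared distances, n is the number of samples and k the number of clusters. It assumes each dataset/label pair has already been reduced to one mean feature vector by some extractor; the dataset names, labels and 2-D features are invented toy data, and `group_labels` is a hypothetical helper, not the authors' code. Plain Python is used to keep it self-contained; scikit-learn's `KMeans` and `calinski_harabasz_score` are the usual production implementations.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def _mean(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point initialization
    (a simplification chosen for reproducibility, not k-means++)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = _mean(members)
    return labels


def calinski_harabasz(points: List[Vector], labels: Sequence[object]) -> float:
    """CH = (B / (k - 1)) / (W / (n - k)); higher means better-separated clusters."""
    clusters: Dict[object, List[Vector]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    if not 1 < k < n:
        raise ValueError("need 2 <= number of clusters < number of points")
    overall = _mean(points)
    between = within = 0.0
    for members in clusters.values():
        centre = _mean(members)
        between += len(members) * _sq_dist(centre, overall)
        within += sum(_sq_dist(p, centre) for p in members)
    if within == 0.0:
        return float("inf")  # every cluster collapses onto a single point
    return (between / (k - 1)) / (within / (n - k))


def group_labels(label_features: Dict[str, Vector], k: int) -> List[List[str]]:
    """Hypothetical helper: cluster per-label mean features and return the
    groups of dataset/label pairs that would be merged for training."""
    names = sorted(label_features)
    assign = kmeans([label_features[name] for name in names], k)
    groups: Dict[int, List[str]] = {}
    for name, cluster in zip(names, assign):
        groups.setdefault(cluster, []).append(name)
    return sorted(groups.values())


# Toy 2-D "mean forgery features" for hypothetical dataset/label pairs.
features = {
    "dsA/swap": [0.0, 0.1], "dsA/reenact": [0.2, 0.0],
    "dsB/swap": [5.0, 5.1], "dsB/reenact": [5.2, 4.9],
}
print(group_labels(features, k=2))
# [['dsA/reenact', 'dsA/swap'], ['dsB/reenact', 'dsB/swap']]

points = [features[name] for name in sorted(features)]
by_name = [name.split("/")[1] for name in sorted(features)]  # merge by method name
by_feature = kmeans(points, 2)                                # merge by feature similarity
print(calinski_harabasz(points, by_feature) > calinski_harabasz(points, by_name))  # True
```

In this toy data the two `swap` labels end up in different groups because their features differ, which mirrors the observation that the same name need not mean the same forgery features; grouping by feature similarity also gives a far higher CH score than grouping by name.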

Original language	English
Article number	2353
Journal	Electronics (Switzerland)
Volume	12
Issue	11
DOI	https://doi.org/10.3390/electronics12112353
Publication status	Published - Jun 2023

Cite this

@article{5b799a7ea9eb45458038a5e958216a59,

title = "The Same Name Is Not Always the Same: Correlating and Tracing Forgery Methods across Various Deepfake Datasets",

abstract = "Deepfakes are becoming increasingly ubiquitous, particularly in facial manipulation. Numerous researchers and companies have released datasets of face deepfakes whose labels indicate the forgery method used. However, these labels are often named arbitrarily and inconsistently, so most researchers currently restrict their work to a single dataset. Practical applications and traceability research, by contrast, require these datasets to be used together. In this study, we use several models to extract forgery features from various deepfake datasets and apply K-means clustering to identify datasets with similar feature values. We analyze the feature values with the Calinski-Harabasz (CH) index. Our findings reveal that forgery classes with the same or similar labels in different deepfake datasets exhibit different forgery features. To address this problem, we propose the KCE system, which combines multiple deepfake datasets according to feature similarity. Across four groups of test datasets, when faced with unknown data types, the model trained on KCE-combined data achieved a CH score 42.3\% higher than a model trained on data combined by forgery name. It also scored 2.5\% higher than a model trained on all the data, even though the latter had more training data. This indicates that the method improves the generalization ability of the model. This paper introduces a fresh perspective for effectively evaluating and utilizing diverse deepfake datasets and for conducting deepfake traceability research.",

keywords = "Calinski Harabasz, clustering, correlation, datasets, deepfake, traceability",

author = "Yi Sun and Jun Zheng and Lingjuan Lyn and Hanyu Zhao and Jiaxing Li and Yunteng Tan and Xinyu Liu and Yuanzhang Li",

note = "Publisher Copyright: {\textcopyright} 2023 by the authors.",

year = "2023",

month = jun,

doi = "10.3390/electronics12112353",

language = "English",

volume = "12",

journal = "Electronics (Switzerland)",

issn = "2079-9292",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "11",

}

TY - JOUR

T1 - The Same Name Is Not Always the Same

T2 - Correlating and Tracing Forgery Methods across Various Deepfake Datasets

AU - Sun, Yi

AU - Zheng, Jun

AU - Lyn, Lingjuan

AU - Zhao, Hanyu

AU - Li, Jiaxing

AU - Tan, Yunteng

AU - Liu, Xinyu

AU - Li, Yuanzhang

PY - 2023/6

Y1 - 2023/6

AB - Deepfakes are becoming increasingly ubiquitous, particularly in facial manipulation. Numerous researchers and companies have released datasets of face deepfakes whose labels indicate the forgery method used. However, these labels are often named arbitrarily and inconsistently, so most researchers currently restrict their work to a single dataset. Practical applications and traceability research, by contrast, require these datasets to be used together. In this study, we use several models to extract forgery features from various deepfake datasets and apply K-means clustering to identify datasets with similar feature values. We analyze the feature values with the Calinski-Harabasz (CH) index. Our findings reveal that forgery classes with the same or similar labels in different deepfake datasets exhibit different forgery features. To address this problem, we propose the KCE system, which combines multiple deepfake datasets according to feature similarity. Across four groups of test datasets, when faced with unknown data types, the model trained on KCE-combined data achieved a CH score 42.3% higher than a model trained on data combined by forgery name. It also scored 2.5% higher than a model trained on all the data, even though the latter had more training data. This indicates that the method improves the generalization ability of the model. This paper introduces a fresh perspective for effectively evaluating and utilizing diverse deepfake datasets and for conducting deepfake traceability research.

KW - Calinski Harabasz

KW - clustering

KW - correlation

KW - datasets

KW - deepfake

KW - traceability

UR - http://www.scopus.com/inward/record.url?scp=85161543016&partnerID=8YFLogxK

U2 - 10.3390/electronics12112353

DO - 10.3390/electronics12112353

M3 - Article

AN - SCOPUS:85161543016

SN - 2079-9292

VL - 12

JO - Electronics (Switzerland)

JF - Electronics (Switzerland)

IS - 11

M1 - 2353

ER -
