
IDE: A System for Iterative Mislabel Detection

  • Beijing Institute of Technology
  • University of Arizona
  • The Hong Kong University of Science and Technology (Guangzhou)
  • Renmin University of China
  • Tsinghua University

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

While machine learning techniques, especially deep neural networks, have shown remarkable success in various applications, their performance is adversely affected by label errors in training data. Acquiring high-quality annotated data is both costly and time-consuming in real-world scenarios, requiring extensive human annotation and verification. Consequently, many industry-applied models are trained over data containing substantial noise, significantly degrading the performance of these models. To address this critical issue, we demonstrate IDE, a novel system that iteratively detects mislabeled instances and repairs the wrong labels. Specifically, IDE leverages the early loss observation and influence-based verification to iteratively identify mislabeled instances. When the mislabeled instances are obtained in each iteration, IDE will repair their labels to enhance detection accuracy for subsequent iterations. The framework automatically determines the termination point when the early loss is no longer effective. For uncertain instances, it generates pseudo labels to train a binary classification model, leveraging the model's generalization ability to make the final decision. With a real-life scenario, we demonstrate that IDE produces high-quality training data by effective mislabel detection and repair.
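The detect-and-repair loop described in the abstract can be illustrated with a minimal sketch. This is not the IDE implementation. It assumes a plain softmax classifier trained for a few epochs as the "early" model, a z-score cutoff on per-instance loss as the detection rule, and "no instance stands out any more" as the termination test. All function names and parameters (`train_early`, `iterative_repair`, `z_thresh`, and so on) are hypothetical, and the influence-based verification and the binary classifier for uncertain instances are omitted.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_early(X, y, n_classes, epochs=5, lr=0.5):
    """Train a softmax classifier for only a few epochs (assumed 'early' phase)."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]  # one-hot labels
    for _ in range(epochs):
        P = softmax(X @ W)
        W -= lr * (X.T @ (P - Y)) / len(X)  # full-batch gradient step
    return W

def early_loss(X, y, W):
    """Per-instance cross-entropy under the early model, plus its predictions."""
    P = softmax(X @ W)
    loss = -np.log(P[np.arange(len(y)), y] + 1e-12)
    return loss, P.argmax(axis=1)

def iterative_repair(X, y, n_classes, max_iters=10, z_thresh=2.0):
    """Repeatedly flag high-loss instances and relabel them (assumed rule)."""
    y = np.array(y, copy=True)  # never mutate the caller's labels
    for _ in range(max_iters):
        W = train_early(X, y, n_classes)
        loss, pred = early_loss(X, y, W)
        cutoff = loss.mean() + z_thresh * loss.std()
        # Suspect: unusually high early loss AND the model disagrees with the label.
        suspects = np.flatnonzero((loss > cutoff) & (pred != y))
        if suspects.size == 0:
            break  # early loss no longer separates anything: stop
        y[suspects] = pred[suspects]  # repair before the next round
    return y

def make_demo(n_per_class=100, n_flipped=20, seed=0):
    """Two well-separated 2-D blobs with a fixed number of flipped labels."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(-2.0, 0.5, (n_per_class, 2)),
                   rng.normal(2.0, 0.5, (n_per_class, 2))])
    y_true = np.repeat([0, 1], n_per_class)
    y_noisy = y_true.copy()
    flip = rng.choice(len(y_true), size=n_flipped, replace=False)
    y_noisy[flip] = 1 - y_noisy[flip]
    return X, y_true, y_noisy
```

Calling `iterative_repair(X, y_noisy, n_classes=2)` on the output of `make_demo()` returns a repaired copy of the labels. Repairing inside the loop means each later round trains on cleaner labels, which is the reason the abstract gives for iterating.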

Original language: English
Title of host publication: SIGMOD-Companion 2024 - Companion of the 2024 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 500-503
Number of pages: 4
ISBN (electronic): 9798400704222
Publication status: Published - 9 Jun 2024
Event: 2024 International Conference on Management of Data, SIGMOD 2024 - Santiago, Chile
Duration: 9 Jun 2024 to 15 Jun 2024

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (print): 0730-8078

Conference

Conference: 2024 International Conference on Management of Data, SIGMOD 2024
Country/Territory: Chile
City: Santiago
Period: 9/06/24 to 15/06/24
