
DALL: Data Labeling via Data Programming and Active Learning Enhanced by Large Language Models

  • Guozheng Li*
  • Ao Wang
  • Shaoxiang Wang
  • Yu Zhang
  • Pengcheng Cao
  • Yang Bai
  • Chi Harold Liu
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • University of Oxford
  • People's Daily

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Deep learning models for natural language processing rely heavily on high-quality labeled datasets. However, existing labeling approaches often struggle to balance label quality with labeling cost. To address this challenge, we propose DALL, a text labeling framework that integrates data programming, active learning, and large language models. DALL introduces a structured specification that allows users and large language models to define labeling functions via configuration, rather than code. Active learning identifies informative instances for review, and the large language model analyzes these instances to help users correct labels and to refine or suggest labeling functions. We implement DALL as an interactive labeling system for text labeling tasks. Comparative, ablation, and usability studies demonstrate DALL's efficiency, the effectiveness of its modules, and its usability.
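
The abstract names two mechanisms: labeling functions defined by configuration rather than code, and active learning that picks informative instances for review. The Python sketch below illustrates both under stated assumptions and is not DALL's actual specification or implementation. The configuration keys (`type`, `keywords`, `pattern`, `label`), the example labels, and every function name are hypothetical. Instance selection here ranks texts by disagreement among labeling functions, a simple stand-in for the model-based uncertainty an active learner would normally use.

```python
import re
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical configuration format (not DALL's real schema): each entry
# describes one labeling function declaratively instead of as code.
LF_CONFIGS = [
    {"name": "has_refund", "type": "keyword",
     "keywords": ["refund", "money back"], "label": "complaint"},
    {"name": "has_thanks", "type": "keyword",
     "keywords": ["thank", "great"], "label": "praise"},
    {"name": "shouting", "type": "regex",
     "pattern": r"\b[A-Z]{4,}\b", "label": "complaint"},
]


def build_lf(config):
    """Turn one configuration entry into a callable text -> label or ABSTAIN."""
    label = config["label"]
    if config["type"] == "keyword":
        keywords = [k.lower() for k in config["keywords"]]
        return lambda text: label if any(k in text.lower() for k in keywords) else ABSTAIN
    if config["type"] == "regex":
        pattern = re.compile(config["pattern"])
        return lambda text: label if pattern.search(text) else ABSTAIN
    raise ValueError(f"unknown labeling function type: {config['type']}")


def apply_lfs(lfs, text):
    """Collect one vote per labeling function for a single text."""
    return [lf(text) for lf in lfs]


def majority_vote(votes):
    """Most common non-abstaining label, or ABSTAIN if every function abstained."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    return counts.most_common(1)[0][0] if counts else ABSTAIN


def disagreement(votes):
    """Share of non-abstaining votes that differ from the majority label.

    Returns 1.0 when every function abstains, so uncovered texts rank first.
    """
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return 1.0
    return 1.0 - counts.most_common(1)[0][1] / sum(counts.values())


def select_for_review(texts, lfs, k=1):
    """Indices of the k texts with the highest disagreement score."""
    scored = [(disagreement(apply_lfs(lfs, t)), i) for i, t in enumerate(texts)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


LFS = [build_lf(c) for c in LF_CONFIGS]
TEXTS = [
    "I want a refund, this is broken",   # one function votes "complaint"
    "Thank you, great service",          # one function votes "praise"
    "Thanks for the refund",             # two functions conflict
    "The package arrived on Tuesday",    # no function fires
]
REVIEW_QUEUE = select_for_review(TEXTS, LFS, k=2)
```

In this sketch the review queue holds the uncovered text and the conflicting text, the two cases where a human or a large language model would most plausibly need to correct a label or propose a new labeling function.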

Original language: English
Title of host publication: CHI 2026 - Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems
Editors: Nuria Oliver, David A. Shamma, Heloisa Candello, Pablo Cesar, Pedro Lopes, Alessandro Bozzon, Thomas Kosch, Vera Liao, Xiaojuan Ma, Valentino Artizzu, Fiona Draxler, Gustavo Lopez, Anke V. Reinschluessel, Xin Tong, Phoebe O. Toups Dugas
Publisher: Association for Computing Machinery
ISBN (electronic): 9798400722783
DOI
Publication status: Published - 13 Apr 2026
Event: 2026 CHI Conference on Human Factors in Computing Systems, CHI 2026 - Barcelona, Spain
Duration: 13 Apr 2026 to 17 Apr 2026

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 2026 CHI Conference on Human Factors in Computing Systems, CHI 2026
Country/Territory: Spain
City: Barcelona
Period: 13/04/26 to 17/04/26
