TY - GEN
T1 - DALL
T2 - 2026 CHI Conference on Human Factors in Computing Systems, CHI 2026
AU - Li, Guozheng
AU - Wang, Ao
AU - Wang, Shaoxiang
AU - Zhang, Yu
AU - Cao, Pengcheng
AU - Bai, Yang
AU - Liu, Chi Harold
N1 - Publisher Copyright: © 2026 Copyright held by the owner/author(s).
PY - 2026/4/13
Y1 - 2026/4/13
N2 - Deep learning models for natural language processing rely heavily on high-quality labeled datasets. However, existing labeling approaches often struggle to balance label quality with labeling cost. To address this challenge, we propose DALL, a text labeling framework that integrates data programming, active learning, and large language models. DALL introduces a structured specification that allows users and large language models to define labeling functions via configuration, rather than code. Active learning identifies informative instances for review, and the large language model analyzes these instances to help users correct labels and to refine or suggest labeling functions. We implement DALL as an interactive labeling system for text labeling tasks. Comparative, ablation, and usability studies demonstrate DALL's efficiency, the effectiveness of its modules, and its usability.
KW - Data labeling
KW - active learning
KW - data programming
KW - interactive machine learning
KW - large language model
UR - https://www.scopus.com/pages/publications/105038711801
U2 - 10.1145/3772318.3791356
DO - 10.1145/3772318.3791356
M3 - Conference contribution
AN - SCOPUS:105038711801
T3 - Conference on Human Factors in Computing Systems - Proceedings
BT - CHI 2026 - Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems
A2 - Oliver, Nuria
A2 - Shamma, David A.
A2 - Candello, Heloisa
A2 - Cesar, Pablo
A2 - Lopes, Pedro
A2 - Bozzon, Alessandro
A2 - Kosch, Thomas
A2 - Liao, Vera
A2 - Ma, Xiaojuan
A2 - Artizzu, Valentino
A2 - Draxler, Fiona
A2 - Lopez, Gustavo
A2 - Reinschluessel, Anke V.
A2 - Tong, Xin
A2 - Toups Dugas, Phoebe O.
PB - Association for Computing Machinery
Y2 - 13 April 2026 through 17 April 2026
ER -