DALL: Data Labeling via Data Programming and Active Learning Enhanced by Large Language Models

  • Guozheng Li*
  • Ao Wang
  • Shaoxiang Wang
  • Yu Zhang
  • Pengcheng Cao
  • Yang Bai
  • Chi Harold Liu
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • University of Oxford
  • People's Daily

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep learning models for natural language processing rely heavily on high-quality labeled datasets. However, existing labeling approaches often struggle to balance label quality with labeling cost. To address this challenge, we propose DALL, a text labeling framework that integrates data programming, active learning, and large language models. DALL introduces a structured specification that allows users and large language models to define labeling functions via configuration, rather than code. Active learning identifies informative instances for review, and the large language model analyzes these instances to help users correct labels and to refine or suggest labeling functions. We implement DALL as an interactive labeling system for text labeling tasks. Comparative, ablation, and usability studies demonstrate DALL's efficiency, the effectiveness of its modules, and its usability.
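The interplay between configuration-defined labeling functions and active learning can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not DALL's implementation: the `LF_SPECS` format, the labels, and all function names are hypothetical, a plain majority vote stands in for a data-programming label model, and vote entropy stands in for whatever informativeness measure the system actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical configuration format, invented for illustration only;
# the abstract does not describe DALL's actual specification.
LF_SPECS = [
    {"type": "keyword", "keywords": ["refund", "money back"], "label": "complaint"},
    {"type": "keyword", "keywords": ["thanks", "thank you"], "label": "praise"},
    {"type": "regex", "pattern": r"!{2,}", "label": "complaint"},
]


def compile_lf(spec):
    """Turn one spec into a function text -> label, or None to abstain."""
    label = spec["label"]
    if spec["type"] == "keyword":
        keywords = [k.lower() for k in spec["keywords"]]
        return lambda text: label if any(k in text.lower() for k in keywords) else None
    if spec["type"] == "regex":
        pattern = re.compile(spec["pattern"])
        return lambda text: label if pattern.search(text) else None
    raise ValueError(f"unknown labeling function type: {spec['type']}")


def votes(text, lfs):
    """Count the non-abstaining labeling function outputs for one text."""
    return Counter(out for out in (lf(text) for lf in lfs) if out is not None)


def majority_label(counts):
    """Majority vote (a stand-in for a label model); None if all abstained."""
    return counts.most_common(1)[0][0] if counts else None


def vote_entropy(counts):
    """Entropy of the vote distribution; infinite when no function fired."""
    total = sum(counts.values())
    if total == 0:
        return math.inf
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def rank_for_review(texts, lfs):
    """Order texts so the most uncertain ones are reviewed first."""
    return sorted(texts, key=lambda t: vote_entropy(votes(t, lfs)), reverse=True)


lfs = [compile_lf(spec) for spec in LF_SPECS]
texts = [
    "Thanks for the quick help",
    "I want a refund, thanks!!",
    "Where is my order?",
]
for text in rank_for_review(texts, lfs):
    print(majority_label(votes(text, lfs)), "|", text)
```

In this sketch, texts that no labeling function covers, or on which the functions disagree, surface first. Those are the instances a reviewer (or, in DALL's setting, the large language model assisting the reviewer) would inspect to correct labels or to propose a new specification entry.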

Original language: English
Title of host publication: CHI 2026 - Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems
Editors: Nuria Oliver, David A. Shamma, Heloisa Candello, Pablo Cesar, Pedro Lopes, Alessandro Bozzon, Thomas Kosch, Vera Liao, Xiaojuan Ma, Valentino Artizzu, Fiona Draxler, Gustavo Lopez, Anke V. Reinschluessel, Xin Tong, Phoebe O. Toups Dugas
Publisher: Association for Computing Machinery
ISBN (Electronic): 9798400722783
DOIs
Publication status: Published - 13 Apr 2026
Event: 2026 CHI Conference on Human Factors in Computing Systems, CHI 2026 - Barcelona, Spain
Duration: 13 Apr 2026 to 17 Apr 2026

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 2026 CHI Conference on Human Factors in Computing Systems, CHI 2026
Country/Territory: Spain
City: Barcelona
Period: 13/04/26 to 17/04/26

Keywords

  • Data labeling
  • active learning
  • data programming
  • interactive machine learning
  • large language model
