跳到主要导航 跳到搜索 跳到主要内容

HomeBench: Evaluating LLMs in Smart Homes with Valid and Invalid Instructions Across Single and Multiple Devices

  • Silin Li
  • , Yuhang Guo
  • , Jiashu Yao
  • , Zeming Liu*
  • , Haifeng Wang
  • *此作品的通讯作者
  • Beijing Institute of Technology
  • Beihang University
  • Baidu Inc

科研成果: 书/报告/会议事项章节会议稿件同行评审

摘要

Large language models (LLMs) have the potential to revolutionize smart home assistants by enhancing their ability to accurately understand user needs and respond appropriately, which is extremely beneficial for building a smarter home environment. While recent studies have explored integrating LLMs into smart home systems, they primarily focus on handling straightforward, valid single-device operation instructions. However, real-world scenarios are far more complex and often involve users issuing invalid instructions or controlling multiple devices simultaneously. These have two main challenges: LLMs must accurately identify and rectify errors in user instructions and execute multiple user instructions perfectly. To address these challenges and advance the development of LLM-based smart home assistants, we introduce HomeBench, the first smart home dataset with valid and invalid instructions across single and multiple devices in this paper. We have experimental results on 13 distinct LLMs; e.g., GPT-4o achieves only a 0.0% success rate in the scenario of invalid multi-device instructions, revealing that the existing state-of-the-art LLMs still cannot perform well in this situation even with the help of in-context learning, retrieval-augmented generation, and fine-tuning. Our code and dataset are publicly available at the link.

源语言英语
主期刊名Long Papers
编辑Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
出版商Association for Computational Linguistics (ACL)
12230-12250
页数21
ISBN(电子版)9798891762510
DOI
出版状态已出版 - 2025
已对外发布
活动63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, 奥地利
期限: 27 7月 20251 8月 2025

出版系列

姓名Proceedings of the Annual Meeting of the Association for Computational Linguistics
1
ISSN(印刷版)0736-587X

会议

会议63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
国家/地区奥地利
Vienna
时期27/07/251/08/25

指纹

探究 'HomeBench: Evaluating LLMs in Smart Homes with Valid and Invalid Instructions Across Single and Multiple Devices' 的科研主题。它们共同构成独一无二的指纹。

引用此