Evaluating AI-Generated Questionnaires Using LDA Topic Modeling and KMeans Clustering: A Comparative Study with Human-Designed Instruments

Menghan Cheng, Zhaolin Lu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The rapid development of large language models (LLMs) has opened up new opportunities for automated questionnaire design in academic research. However, questions remain about the validity and quality of AI-generated instruments, especially when applied to theory-driven frameworks. This study compares the performance of AI-generated and human-designed questionnaires under single and integrated theoretical models. To this end, we employ a hybrid approach that combines expert evaluation with unsupervised machine learning techniques. Specifically, we use Latent Dirichlet Allocation (LDA) topic modeling to assess semantic coverage, and KMeans clustering to detect redundancy and assess semantic consistency. We created four questionnaires, two based on validated human-designed instruments and two generated by GPT-4, covering the Unified Theory of Acceptance and Use of Technology (UTAUT) and its extended models. 310 The questionnaires assessed program quality on multiple dimensions. These judgments were then objectively validated using the machine learning outputs. Results show that AI-generated questionnaires are fluent and objective, but perform poorly in terms of accuracy, clarity, and comprehensiveness, especially under complex modeling conditions. Redundancy and semantic drift increased with theory integration. However, the AI performed well in areas requiring standardization and neutrality.
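To make the redundancy check concrete, the following is a minimal sketch of how KMeans clustering can flag questionnaire items that land in the same cluster. It is not the authors' pipeline: the abstract does not say how items were vectorized or which KMeans implementation was used. The bag-of-words vectors, the deterministic farthest-point initialization, the choice of k = 3, and the four sample items are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical questionnaire items, written for this illustration only.
ITEMS = [
    "using the system improves my job performance",
    "using the system improves my work performance",
    "people important to me think i should use the system",
    "i have the resources necessary to use the system",
]


def vectorize(items):
    """Build unit-length bag-of-words vectors over a shared vocabulary."""
    tokenized = [item.lower().split() for item in items]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [float(counts[word]) for word in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding."""
    # Assumption: seed with the first vector, then repeatedly add the
    # vector that is farthest from every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def redundant_groups(labels):
    """Return index groups of items that share a cluster with another item."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    return sorted(g for g in groups.values() if len(g) > 1)


vectors = vectorize(ITEMS)
labels = kmeans(vectors, k=3)
for group in redundant_groups(labels):
    print("possible redundancy:", [ITEMS[i] for i in group])
```

Items sharing a cluster are only candidates for redundancy; in a study like this one they would still need expert review. A library implementation with TF-IDF or embedding features would be the more typical choice in practice.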

Original language: English
Title of host publication: Proceedings of 2025 2nd International Conference on Digital Systems and Design Innovation, ICDSDI 2025
Publisher: Association for Computing Machinery, Inc
Pages: 150-155
Number of pages: 6
ISBN (Electronic): 9798400719554
Publication status: Published - 10 Sept 2025
Externally published: Yes
Event: 2nd International Conference on Digital Systems and Design Innovation, ICDSDI 2025 - Zhengzhou, China
Duration: 13 Jun 2025 - 15 Jun 2025

Publication series

Name: Proceedings of 2025 2nd International Conference on Digital Systems and Design Innovation, ICDSDI 2025

Conference

Conference: 2nd International Conference on Digital Systems and Design Innovation, ICDSDI 2025
Country/Territory: China
City: Zhengzhou
Period: 13/06/25 - 15/06/25

Keywords

  • KMeans Clustering
  • Large Language Models
  • Latent Dirichlet Allocation
  • Questionnaire Design
