An efficient big data framework for validating the random walk hypothesis in high-frequency markets via neural networks and large language models

  • Yueyue Sun
  • Chi Chiu So
  • Su Tan
  • Siu Pang Yung
  • Junmin Wang*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Financial market efficiency, commonly formalized through the Random Walk Hypothesis, remains a central issue in quantitative finance. Conventional statistical tests, while rigorous, often provide limited insight into the practical predictability of market prices. To complement these tests, we propose Machine Learning Market Randomness Testing (MART), an efficient prediction-based framework that evaluates market efficiency through the directional forecasting performance of machine learning models. Within this framework, simple neural networks (NNs) and large language models (LLMs) serve as predictive agents for validating the effectiveness of the proposed approach. The LLM module further employs compact batching and iterative summarization to efficiently process large-scale high-frequency datasets while reducing computational cost and preventing information leakage. Empirical results from the MART framework, applied to high-frequency data at tick, 1-min, 5-min, and 15-min intervals across ten major global stock indices, reveal frequency-dependent deviations from market efficiency. At finer temporal resolutions—particularly at tick, 1-min, and 5-min levels—MART identifies statistically significant predictability consistent with classical statistical tests and translates it into economically meaningful cumulative returns through NN-based predictions, whereas LLM-based implementations fail to demonstrate comparable forecasting performance under few-shot conditions. Overall, MART establishes a generalizable and statistically grounded approach for testing market efficiency, bridging predictive modeling with formal inference, and providing new empirical evidence on frequency-dependent deviations from the Random Walk Hypothesis.
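The idea of a prediction-based randomness test can be illustrated with a minimal sketch. This is not the authors' MART implementation: the function names, the momentum rule standing in for the NN or LLM predictor, and the choice of an exact one-sided binomial test are all assumptions made for illustration. Under a random walk, out-of-sample directional accuracy should not exceed 50% by more than chance allows, so a small p-value signals predictability.

```python
import math
import random


def binomial_tail_pvalue(hits: int, n: int, p0: float = 0.5) -> float:
    """One-sided exact p-value P(X >= hits) for X ~ Binomial(n, p0), 0 < p0 < 1.

    Terms are computed in log space so that large n does not overflow.
    """
    total = 0.0
    for k in range(hits, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p0) + (n - k) * math.log(1 - p0)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def directional_test(returns, predict):
    """Score a direction predictor against a coin flip (hypothetical helper).

    `predict` maps the list of past returns to +1 (up) or -1 (down).
    Zero-return steps are skipped because they carry no direction.
    """
    hits = trials = 0
    for t in range(1, len(returns)):
        if returns[t] == 0:
            continue
        guess = predict(returns[:t])  # only past data, so no look-ahead
        trials += 1
        hits += int((guess > 0) == (returns[t] > 0))
    return hits, trials, binomial_tail_pvalue(hits, trials)


def momentum(history):
    """Stand-in predictor (assumption): repeat the sign of the last return."""
    return 1 if history[-1] > 0 else -1


def simulate_ar1(n: int, phi: float, seed: int = 0):
    """Synthetic returns r_t = phi * r_{t-1} + noise; phi = 0 gives i.i.d. noise."""
    rng = random.Random(seed)
    series, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        series.append(prev)
    return series


if __name__ == "__main__":
    for phi in (0.0, 0.5):
        hits, trials, p = directional_test(simulate_ar1(5000, phi), momentum)
        print(f"phi={phi}: accuracy={hits / trials:.3f}, p-value={p:.3g}")
```

On synthetic data, the autocorrelated series (`phi=0.5`) should yield accuracy well above one half and a very small p-value, while the i.i.d. series should stay close to 50%. A real application would replace `momentum` with a trained model and use proper train/test splits on market data.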

Original language: English
Article number: 131358
Journal: Expert Systems with Applications
Volume: 311
Publication status: Published - 15 May 2026
Externally published: Yes

Keywords

  • Efficient markets
  • Large language models
  • Neural networks
  • Random walk hypothesis
