
QUEST: Query Optimization in Unstructured Document Analysis

  • Beijing Institute of Technology
  • University of Arizona

Research output: Contribution to journal, Conference article, Peer-reviewed

Abstract

Researchers have recently started building data systems powered by large language models (LLMs) that let users analyze unstructured text documents as if working with a database, because LLMs are very effective at extracting attributes from documents. In such systems, LLM-based extraction operations constitute the performance bottleneck of query execution due to the high monetary cost and slow speed of LLM inference. Existing systems typically borrow the query optimization principles popular in relational databases to produce query execution plans, which unfortunately are ineffective in minimizing LLM cost. To fill this gap, we propose QUEST, which features a set of novel optimization strategies for unstructured document analysis. First, we introduce an index-based strategy to minimize the cost of each extraction operation. With this index, QUEST quickly retrieves the text segments relevant to the target attributes and feeds only those segments to LLMs. Furthermore, we design an evidence-augmented retrieval strategy to reduce the possibility of missing relevant segments. Moreover, we develop an instance-optimized query execution strategy: because the attribute extraction cost can vary significantly from document to document, QUEST produces different plans for different documents. For each document, QUEST produces a plan that minimizes the frequency of attribute extraction. The innovations include LLM cost-aware operator ordering strategies and an optimized join execution approach that transforms joins into filters. Extensive experiments on 3 real-world datasets demonstrate the superiority of QUEST, achieving 30%-6× cost savings while improving the F1 score by 10%-27% compared with state-of-the-art baselines.
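
To illustrate what cost-aware operator ordering can look like, here is a minimal sketch. It is not QUEST's algorithm, which the abstract does not specify. It uses the classic rank heuristic from relational predicate ordering: evaluate filters in ascending order of cost / (1 - selectivity), assuming the filters are independent. The `Filter` class, its fields, and the numbers in the usage note are hypothetical. Because extraction cost varies by document, the estimates would be supplied per document, so different documents can end up with different orders.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    """Hypothetical filter on one extracted attribute (not QUEST's API)."""
    name: str
    cost: float         # assumed estimate of LLM cost to extract the attribute for this document
    selectivity: float  # assumed probability that the document passes the filter


def order_filters(filters):
    """Sort filters by cost / (1 - selectivity), ascending.

    Cheap filters that reject many documents run first, so expensive
    extractions are skipped whenever an earlier filter already fails.
    """
    def rank(f):
        drop = 1.0 - f.selectivity
        # A filter that never rejects anything gives no reason to run it early.
        return float("inf") if drop <= 0 else f.cost / drop
    return sorted(filters, key=rank)


def expected_cost(ordered):
    """Expected extraction cost when evaluation stops at the first failing filter."""
    total, p_reach = 0.0, 1.0
    for f in ordered:
        total += p_reach * f.cost   # pay for f only if all earlier filters passed
        p_reach *= f.selectivity
    return total
```

For example, with hypothetical per-document estimates `Filter("a", 100, 0.9)`, `Filter("b", 50, 0.2)`, and `Filter("c", 10, 0.5)`, `order_filters` returns the order c, b, a, and `expected_cost` is lower for that order than for the order a, b, c.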

Original language: English
Pages (from-to): 4560-4573
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 18
Issue number: 11
DOI
Publication status: Published - 2025
Event: 51st International Conference on Very Large Data Bases, VLDB 2025 - London, United Kingdom
Duration: 1 Sept 2025 to 5 Sept 2025
