ESTJ: Enhancing Structured Tendency Judgment in Hybrid-Modal Table Understanding

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Hybrid-modal table understanding (HMTU), which aims to leverage multi-modal table evidence for multi-hop reasoning, has attracted widespread attention. Existing models focus primarily on integrating multi-modal table evidence to improve the table understanding capabilities of multi-modal large language models (MLLMs). However, these models overlook the fact that different types of table understanding questions lean toward different modalities of table evidence. Consequently, they suffer from low utilization efficiency and poor interpretability. To address these issues, we propose a modality preference alignment model called ESTJ, which Enhances Structured Tendency Judgment in HMTU. Specifically, ESTJ first samples modality preference data from the responses generated by MLLMs. It then alleviates modality preference imbalance by adhering to the principle of least modality priority. Finally, ESTJ performs direct preference optimization (DPO) training based on structured tendency judgment to align modality preference effectively. Experimental results on TableQA and TableFV tasks show that the proposed model outperforms state-of-the-art baselines. The results also reveal notable patterns in modality preference for table understanding.
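
The abstract names DPO as the final training step but does not spell out the objective. As a point of reference only, the generic DPO loss for a single preference pair can be sketched as below. This is the standard DPO formulation, not ESTJ's structured tendency judgment variant. The function names, the default `beta=0.1`, and the reading of "chosen" and "rejected" as responses relying on the preferred and dispreferred table modality are assumptions made for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Generic DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the trainable policy and
    a frozen reference model. `beta` (hypothetical default) scales how
    strongly the policy is pushed away from the reference.
    """
    # Implicit rewards: scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy favors the chosen response more than the reference does.
    print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

The loss falls below log 2 once the policy prefers the chosen response more strongly than the reference model does, and rises above it in the opposite case.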

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 2399-2408
Number of pages: 10
ISBN (Electronic): 9798400720352
Publication status: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 to 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 to 31/10/25

Keywords

  • direct preference optimization
  • hybrid-modal table understanding
  • least modality priority
  • modality preference alignment
  • multi-modal large language models
  • structured tendency judgment
