Lava: Language Driven Scalable and Versatile Traffic Video Analytics

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

In modern urban environments, camera networks generate massive amounts of operational footage (reaching petabytes each day), making scalable video analytics essential for efficient processing. Many existing approaches adopt an SQL-based paradigm for querying such large-scale video databases; however, this constrains queries to rigid patterns with predefined semantic categories, significantly limiting analytical flexibility. In this work, we explore a language-driven video analytics paradigm aimed at enabling flexible and efficient natural language querying of high-volume video data. Specifically, we build Lava, a system that accepts natural language queries and retrieves traffic targets across multiple levels of granularity and arbitrary categories. Lava comprises three main components: 1) a multi-armed bandit-based efficient sampling method for video segment-level localization; 2) a video-specific open-world detection module for object-level retrieval; and 3) a long-term object trajectory extraction scheme for temporal object association, yielding complete trajectories for objects of interest. To support comprehensive evaluation, we further develop a novel benchmark by providing diverse, semantically rich natural language predicates and fine-grained annotations for multiple videos. Experiments on this benchmark demonstrate that Lava improves F1-scores for selection queries by 14%, reduces MPAE for aggregation queries by 0.39, and achieves top-k precision of 86% while processing videos 9.6x faster than the most accurate baseline. Our code and dataset are available at https://github.com/yuyanrui/LAVA.
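As an illustration of the first component only, the idea of bandit-based segment sampling can be sketched as follows. This is a minimal sketch, not Lava's implementation: the abstract does not say which bandit algorithm is used, so the standard UCB1 rule is substituted here, and the names `ucb_sample_segments` and `frame_matches` are hypothetical. Each video segment is treated as an arm, and the reward is 1 when a sampled frame matches the query.

```python
import math
import random
from typing import Callable, List


def ucb_sample_segments(
    num_segments: int,
    frames_per_segment: int,
    frame_matches: Callable[[int, int], bool],  # hypothetical query oracle
    budget: int,
    seed: int = 0,
) -> List[int]:
    """Return how many frames UCB1 sampled from each segment."""
    rng = random.Random(seed)
    pulls = [0] * num_segments    # frames sampled per segment
    hits = [0.0] * num_segments   # matching frames found per segment

    for t in range(1, budget + 1):
        if t <= num_segments:
            # Sample every segment once before using the UCB score.
            arm = t - 1
        else:
            # UCB1: empirical match rate plus an exploration bonus.
            arm = max(
                range(num_segments),
                key=lambda i: hits[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        frame = rng.randrange(frames_per_segment)
        hits[arm] += 1.0 if frame_matches(arm, frame) else 0.0
        pulls[arm] += 1
    return pulls


# Toy oracle (an assumption for the demo): only segment 2 contains matches.
pulls = ucb_sample_segments(
    num_segments=4,
    frames_per_segment=100,
    frame_matches=lambda seg, frame: seg == 2,
    budget=200,
)
```

With this toy oracle, most of the sampling budget goes to segment 2, so the expensive per-frame check is spent where matches are likely rather than uniformly across the video.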

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 7558-7567
Number of pages: 10
ISBN (Electronic): 9798400720352
DOIs
Publication status: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 to 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 to 31/10/25

Keywords

  • high-volume video data
  • language
  • scalable video analytics
