TY - GEN
T1 - Lava
T2 - 33rd ACM International Conference on Multimedia, MM 2025
AU - Yu, Yanrui
AU - Zhou, Tianfei
AU - Sun, Jiaxin
AU - Qiao, Lianpeng
AU - Ding, Lizhong
AU - Yuan, Ye
AU - Wang, Guoren
N1 - Publisher Copyright:
© 2025 ACM.
PY - 2025/10/27
Y1 - 2025/10/27
N2 - In modern urban environments, camera networks generate massive amounts of operational footage - reaching petabytes each day - making scalable video analytics essential for efficient processing. Many existing approaches adopt an SQL-based paradigm for querying such large-scale video databases; however, this constrains queries to rigid patterns with predefined semantic categories, significantly limiting analytical flexibility. In this work, we explore a language-driven video analytics paradigm aimed at enabling flexible and efficient natural-language querying of high-volume video data. In particular, we build Lava, a system that accepts natural language queries and retrieves traffic targets across multiple levels of granularity and arbitrary categories. Lava comprises three main components: 1) a multi-armed bandit-based efficient sampling method for video segment-level localization; 2) a video-specific open-world detection module for object-level retrieval; and 3) a long-term object trajectory extraction scheme for temporal object association, yielding complete trajectories for objects of interest. To support comprehensive evaluation, we further develop a novel benchmark by providing diverse, semantically rich natural language predicates and fine-grained annotations for multiple videos. Experiments on this benchmark demonstrate that Lava improves F1-scores for selection queries by 14%, reduces MPAE for aggregation queries by 0.39, and achieves top-k precision of 86%, while processing videos 9.6x faster than the most accurate baseline. Our code and dataset are available at https://github.com/yuyanrui/LAVA.
AB - In modern urban environments, camera networks generate massive amounts of operational footage - reaching petabytes each day - making scalable video analytics essential for efficient processing. Many existing approaches adopt an SQL-based paradigm for querying such large-scale video databases; however, this constrains queries to rigid patterns with predefined semantic categories, significantly limiting analytical flexibility. In this work, we explore a language-driven video analytics paradigm aimed at enabling flexible and efficient natural-language querying of high-volume video data. In particular, we build Lava, a system that accepts natural language queries and retrieves traffic targets across multiple levels of granularity and arbitrary categories. Lava comprises three main components: 1) a multi-armed bandit-based efficient sampling method for video segment-level localization; 2) a video-specific open-world detection module for object-level retrieval; and 3) a long-term object trajectory extraction scheme for temporal object association, yielding complete trajectories for objects of interest. To support comprehensive evaluation, we further develop a novel benchmark by providing diverse, semantically rich natural language predicates and fine-grained annotations for multiple videos. Experiments on this benchmark demonstrate that Lava improves F1-scores for selection queries by 14%, reduces MPAE for aggregation queries by 0.39, and achieves top-k precision of 86%, while processing videos 9.6x faster than the most accurate baseline. Our code and dataset are available at https://github.com/yuyanrui/LAVA.
KW - high-volume video data
KW - language
KW - scalable video analytics
UR - https://www.scopus.com/pages/publications/105024066203
U2 - 10.1145/3746027.3754955
DO - 10.1145/3746027.3754955
M3 - Conference contribution
AN - SCOPUS:105024066203
T3 - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
SP - 7558
EP - 7567
BT - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
PB - Association for Computing Machinery, Inc
Y2 - 27 October 2025 through 31 October 2025
ER -