Multimodal Knowledge Graph Vision-Language Models for Precision Retrieval and Inference in Embedded Systems

Mingyi Li, Jiayin Li, Hui Liu*, Lijin Han, Dengting Liao, Zekun Zhang, Zong Gao, Baoshuai Liu, Zhongfeng Jin

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

In light of the challenges associated with enhancing multimodal knowledge graph retrieval and improving embodied intelligence in dynamic environments, we propose the Multimodal Knowledge Graph Vision-Language System (MKGVL). This system is designed to enhance the reasoning and feedback capabilities of embodied intelligent systems by continuously updating the knowledge graph through real-time feedback from the Vision-Language Model (VLM). By integrating visual encoders, language models, and knowledge graph networks, MKGVL constructs a unified multimodal representation that enables adaptive decision-making and responsiveness to environmental changes. Experimental results show that MKGVL outperforms existing models in fine-grained retrieval tasks, achieving an 11.5% improvement in Rank1 accuracy and a mean Average Precision (mAP) of 97.49%. Further evaluations conducted on datasets such as ARKitScenes, MultiScan, and 3RScan highlight the model's robustness and adaptability. Additionally, MKGVL's deployment on embedded platforms like Jetson Orin demonstrates its efficiency in real-time multimodal tasks, particularly in resource-constrained environments. These findings underscore MKGVL's ability to deliver accurate and efficient multimodal processing, making it a strong solution for adaptive knowledge-based systems in complex settings.

Original language: English
Title of host publication: 2024 IEEE International Conference on Robotics and Biomimetics, ROBIO 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1271-1275
Number of pages: 5
Edition: 2024
ISBN (Electronic): 9781665481090
Publication status: Published - 2024
Event: 2024 IEEE International Conference on Robotics and Biomimetics, ROBIO 2024 - Bangkok, Thailand
Duration: 10 Dec 2024 to 14 Dec 2024

Conference

Conference: 2024 IEEE International Conference on Robotics and Biomimetics, ROBIO 2024
Country/Territory: Thailand
City: Bangkok
Period: 10/12/24 to 14/12/24
