Towards Versatile Embodied Navigation

Hanqing Wang, Wei Liang^*, Luc Van Gool, Wenguan Wang^*

^*Corresponding author for this work

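School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

17 Citations (Scopus)

Abstract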

With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents capable of handling individual navigation tasks well. Given plenty of embodied navigation tasks and task-specific solutions, we address a more fundamental question: can we learn a single powerful agent that masters not one but multiple navigation tasks concurrently? First, we propose VXN, a large-scale 3D dataset that instantiates four classic navigation tasks in standardized, continuous, and audiovisual-rich environments. Second, we propose VIENNA, a versatile embodied navigation agent that simultaneously learns to perform the four navigation tasks with one model. Building upon a full-attentive architecture, VIENNA formulates various navigation tasks as a unified, parse-and-query procedure: the target description, augmented with four task embeddings, is comprehensively interpreted into a set of diversified goal vectors, which are refined as the navigation progresses, and used as queries to retrieve supportive context from episodic history for decision making. This enables the reuse of knowledge across navigation tasks with varying input domains/modalities. We empirically demonstrate that, compared with learning each visual navigation task individually, our multitask agent achieves comparable or even better performance with reduced complexity.
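The parse-and-query idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 2-dimensional vectors, and the `TASK_EMBEDDINGS` table are hypothetical, and plain scaled dot-product attention is assumed as the retrieval step. The refinement of goal vectors during navigation is omitted.

```python
import math

# Hypothetical task embeddings, one per navigation task named in the abstract.
TASK_EMBEDDINGS = {
    "image_goal": [0.5, -0.5],
    "object_goal": [0.1, 0.2],
    "audio_goal": [-0.3, 0.4],
    "vision_language": [0.0, 0.7],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add_task_embedding(target_tokens, task_embedding):
    """Augment each target-description vector with a task embedding (element-wise sum)."""
    return [[t + e for t, e in zip(tok, task_embedding)] for tok in target_tokens]


def query_history(goal_vectors, history):
    """Use each goal vector as an attention query over the episodic history.

    Returns one context vector per goal vector: a softmax-weighted
    average of the history entries (assumed non-empty).
    """
    scale = math.sqrt(len(goal_vectors[0]))
    contexts = []
    for q in goal_vectors:
        weights = softmax([dot(q, h) / scale for h in history])
        ctx = [sum(w * h[i] for w, h in zip(weights, history)) for i in range(len(q))]
        contexts.append(ctx)
    return contexts


if __name__ == "__main__":
    target = [[1.0, 2.0]]                      # toy encoding of a target description
    goals = add_task_embedding(target, TASK_EMBEDDINGS["image_goal"])
    history = [[1.0, 0.0], [0.0, 1.0]]         # toy episodic memory of two past steps
    print(query_history(goals, history))
```

In this sketch the same `query_history` routine serves all four tasks; only the task embedding added to the target encoding changes, which is one simple way to picture the knowledge reuse the abstract describes.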

Original language	English
Title of host publication	Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Editors	S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Publisher	Neural information processing systems foundation
ISBN (Electronic)	9781713871088
Publication status	Published - 2022
Event	36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States. Duration: 28 Nov 2022 → 9 Dec 2022

Publication series

Name	Advances in Neural Information Processing Systems
Volume	35
ISSN (Print)	1049-5258

Conference

Conference	36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/Territory	United States
City	New Orleans
Period	28/11/22 → 9/12/22

Cite this

Wang, H., Liang, W., Van Gool, L., & Wang, W. (2022). Towards Versatile Embodied Navigation. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022 (Advances in Neural Information Processing Systems; Vol. 35). Neural information processing systems foundation.

Wang, Hanqing ; Liang, Wei ; Van Gool, Luc et al. / Towards Versatile Embodied Navigation. Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022. editor / S. Koyejo ; S. Mohamed ; A. Agarwal ; D. Belgrave ; K. Cho ; A. Oh. Neural information processing systems foundation, 2022. (Advances in Neural Information Processing Systems).

@inproceedings{d18966eb54014ab3b05c9ef85f18447f,
title = "Towards Versatile Embodied Navigation",

abstract = "With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents capable of handling individual navigation tasks well. Given plenty of embodied navigation tasks and task-specific solutions, we address a more fundamental question: can we learn a single powerful agent that masters not one but multiple navigation tasks concurrently? First, we propose VXN, a large-scale 3D dataset that instantiates four classic navigation tasks in standardized, continuous, and audiovisual-rich environments. Second, we propose VIENNA, a versatile embodied navigation agent that simultaneously learns to perform the four navigation tasks with one model. Building upon a full-attentive architecture, VIENNA formulates various navigation tasks as a unified, parse-and-query procedure: the target description, augmented with four task embeddings, is comprehensively interpreted into a set of diversified goal vectors, which are refined as the navigation progresses, and used as queries to retrieve supportive context from episodic history for decision making. This enables the reuse of knowledge across navigation tasks with varying input domains/modalities. We empirically demonstrate that, compared with learning each visual navigation task individually, our multitask agent achieves comparable or even better performance with reduced complexity.",

author = "Hanqing Wang and Wei Liang and {Van Gool}, Luc and Wenguan Wang",
note = "Publisher Copyright: {\textcopyright} 2022 Neural information processing systems foundation. All rights reserved.; 36th Conference on Neural Information Processing Systems, NeurIPS 2022 ; Conference date: 28-11-2022 Through 09-12-2022",
year = "2022",
language = "English",
series = "Advances in Neural Information Processing Systems",
publisher = "Neural information processing systems foundation",
editor = "S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh",
booktitle = "Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022",
}

Wang, H, Liang, W, Van Gool, L & Wang, W 2022, Towards Versatile Embodied Navigation. in S Koyejo, S Mohamed, A Agarwal, D Belgrave, K Cho & A Oh (eds), Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022. Advances in Neural Information Processing Systems, vol. 35, Neural information processing systems foundation, 36th Conference on Neural Information Processing Systems, NeurIPS 2022, New Orleans, United States, 28/11/22.

Towards Versatile Embodied Navigation. / Wang, Hanqing; Liang, Wei; Van Gool, Luc et al.
Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022. ed. / S. Koyejo; S. Mohamed; A. Agarwal; D. Belgrave; K. Cho; A. Oh. Neural information processing systems foundation, 2022. (Advances in Neural Information Processing Systems; Vol. 35).

TY - GEN
T1 - Towards Versatile Embodied Navigation
AU - Wang, Hanqing
AU - Liang, Wei
AU - Van Gool, Luc
AU - Wang, Wenguan
PY - 2022
Y1 - 2022

N2 - With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents capable of handling individual navigation tasks well. Given plenty of embodied navigation tasks and task-specific solutions, we address a more fundamental question: can we learn a single powerful agent that masters not one but multiple navigation tasks concurrently? First, we propose VXN, a large-scale 3D dataset that instantiates four classic navigation tasks in standardized, continuous, and audiovisual-rich environments. Second, we propose VIENNA, a versatile embodied navigation agent that simultaneously learns to perform the four navigation tasks with one model. Building upon a full-attentive architecture, VIENNA formulates various navigation tasks as a unified, parse-and-query procedure: the target description, augmented with four task embeddings, is comprehensively interpreted into a set of diversified goal vectors, which are refined as the navigation progresses, and used as queries to retrieve supportive context from episodic history for decision making. This enables the reuse of knowledge across navigation tasks with varying input domains/modalities. We empirically demonstrate that, compared with learning each visual navigation task individually, our multitask agent achieves comparable or even better performance with reduced complexity.

UR - http://www.scopus.com/inward/record.url?scp=85150745935&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85150745935
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
A2 - Koyejo, S.
A2 - Mohamed, S.
A2 - Agarwal, A.
A2 - Belgrave, D.
A2 - Cho, K.
A2 - Oh, A.
PB - Neural information processing systems foundation
T2 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Y2 - 28 November 2022 through 9 December 2022
ER -

Wang H, Liang W, Van Gool L, Wang W. Towards Versatile Embodied Navigation. In Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, editors, Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022. Neural information processing systems foundation. 2022. (Advances in Neural Information Processing Systems).
