Dreamwalker: Mental Planning for Continuous Vision-Language Navigation

Hanqing Wang; Wei Liang; Luc Van Gool; Wenguan Wang

doi:10.1109/ICCV51070.2023.00998


Hanqing Wang^*, Wei Liang^*, Luc Van Gool, Wenguan Wang^*

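^*Corresponding author for this work

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

5 citations (Scopus)

Abstract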

VLN-CE is a recently released embodied task, where AI agents need to navigate a freely traversable environment to reach a distant target location, given language instructions. It poses great challenges due to the huge space of possible strategies. Driven by the belief that the ability to anticipate the consequences of future actions is crucial for the emergence of intelligent and interpretable planning behavior, we propose Dreamwalker - a world model based VLN-CE agent. The world model is built to summarize the visual, topological, and dynamic properties of the complicated continuous environment into a discrete, structured, and compact representation. Dreamwalker can simulate and evaluate possible plans entirely in such internal abstract world, before executing costly actions. As opposed to existing model-free VLN-CE agents simply making greedy decisions in the real world, which easily results in shortsighted behaviors, Dreamwalker is able to make strategic planning through large amounts of "mental experiments." Moreover, the imagined future scenarios reflect our agent's intention, making its decision-making process more transparent. Extensive experiments and ablation studies on VLN-CE dataset confirm the effectiveness of the proposed approach and outline fruitful directions for future work.

Original language	English
Title of host publication	Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	10839-10849
Number of pages	11
ISBN (Electronic)	9798350307184
DOI	https://doi.org/10.1109/ICCV51070.2023.00998
Publication status	Published - 2023
Event	2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 - Paris, France. Duration: 2 Oct 2023 → 6 Oct 2023

Publication series

Name	Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print)	1550-5499

Conference

Conference	2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Country/Territory	France
City	Paris
Period	2/10/23 → 6/10/23

Cite this

Wang, H., Liang, W., Van Gool, L., & Wang, W. (2023). Dreamwalker: Mental Planning for Continuous Vision-Language Navigation. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 (pp. 10839-10849). (Proceedings of the IEEE International Conference on Computer Vision). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV51070.2023.00998

@inproceedings{a91ef1a6ad304b59b2fb1fd80138f687,

title = "Dreamwalker: Mental Planning for Continuous Vision-Language Navigation",

abstract = "VLN-CE is a recently released embodied task, where AI agents need to navigate a freely traversable environment to reach a distant target location, given language instructions. It poses great challenges due to the huge space of possible strategies. Driven by the belief that the ability to anticipate the consequences of future actions is crucial for the emergence of intelligent and interpretable planning behavior, we propose Dreamwalker - a world model based VLN-CE agent. The world model is built to summarize the visual, topological, and dynamic properties of the complicated continuous environment into a discrete, structured, and compact representation. Dreamwalker can simulate and evaluate possible plans entirely in such internal abstract world, before executing costly actions. As opposed to existing model-free VLN-CE agents simply making greedy decisions in the real world, which easily results in shortsighted behaviors, Dreamwalker is able to make strategic planning through large amounts of {"}mental experiments.{"} Moreover, the imagined future scenarios reflect our agent's intention, making its decision-making process more transparent. Extensive experiments and ablation studies on VLN-CE dataset confirm the effectiveness of the proposed approach and outline fruitful directions for future work.",

author = "Hanqing Wang and Wei Liang and {Van Gool}, Luc and Wenguan Wang",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 ; Conference date: 02-10-2023 Through 06-10-2023",

year = "2023",

doi = "10.1109/ICCV51070.2023.00998",

language = "English",

series = "Proceedings of the IEEE International Conference on Computer Vision",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "10839--10849",

booktitle = "Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023",

address = "United States",

}

Wang, H, Liang, W, Van Gool, L & Wang, W 2023, Dreamwalker: Mental Planning for Continuous Vision-Language Navigation. in Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Proceedings of the IEEE International Conference on Computer Vision, Institute of Electrical and Electronics Engineers Inc., pp. 10839-10849, 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, 2/10/23. https://doi.org/10.1109/ICCV51070.2023.00998

Dreamwalker: Mental Planning for Continuous Vision-Language Navigation. / Wang, Hanqing; Liang, Wei; Van Gool, Luc et al.
Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Institute of Electrical and Electronics Engineers Inc., 2023. p. 10839-10849 (Proceedings of the IEEE International Conference on Computer Vision).

TY - GEN

T1 - Dreamwalker

T2 - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023

AU - Wang, Hanqing

AU - Liang, Wei

AU - Van Gool, Luc

AU - Wang, Wenguan

PY - 2023

Y1 - 2023

AB - VLN-CE is a recently released embodied task, where AI agents need to navigate a freely traversable environment to reach a distant target location, given language instructions. It poses great challenges due to the huge space of possible strategies. Driven by the belief that the ability to anticipate the consequences of future actions is crucial for the emergence of intelligent and interpretable planning behavior, we propose Dreamwalker - a world model based VLN-CE agent. The world model is built to summarize the visual, topological, and dynamic properties of the complicated continuous environment into a discrete, structured, and compact representation. Dreamwalker can simulate and evaluate possible plans entirely in such internal abstract world, before executing costly actions. As opposed to existing model-free VLN-CE agents simply making greedy decisions in the real world, which easily results in shortsighted behaviors, Dreamwalker is able to make strategic planning through large amounts of "mental experiments." Moreover, the imagined future scenarios reflect our agent's intention, making its decision-making process more transparent. Extensive experiments and ablation studies on VLN-CE dataset confirm the effectiveness of the proposed approach and outline fruitful directions for future work.

UR - http://www.scopus.com/inward/record.url?scp=85185473961&partnerID=8YFLogxK

U2 - 10.1109/ICCV51070.2023.00998

DO - 10.1109/ICCV51070.2023.00998

M3 - Conference contribution

AN - SCOPUS:85185473961

T3 - Proceedings of the IEEE International Conference on Computer Vision

SP - 10839

EP - 10849

BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 2 October 2023 through 6 October 2023

ER -

Wang H, Liang W, Van Gool L, Wang W. Dreamwalker: Mental Planning for Continuous Vision-Language Navigation. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 10839-10849. (Proceedings of the IEEE International Conference on Computer Vision). doi: 10.1109/ICCV51070.2023.00998
