Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction

Li Wu Tsao; Yan Kai Wang; Hao Siang Lin; Hong Han Shuai; Lai Kuan Wong; Wen Huang Cheng

doi:10.1007/978-3-031-20047-2_14

Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction

Li Wu Tsao^*, Yan Kai Wang, Hao Siang Lin, Hong Han Shuai, Lai Kuan Wong, Wen Huang Cheng

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

17 Citations (Scopus)

Abstract

Earlier trajectory prediction approaches focus on ways of capturing sequential structures among pedestrians by using recurrent networks, which is known to have some limitations in capturing long sequence structures. To address this limitation, some recent works proposed Transformer-based architectures, which are built with attention mechanisms. However, these Transformer-based networks are trained end-to-end without capitalizing on the value of pre-training. In this work, we propose Social-SSL that captures cross-sequence trajectory structures via self-supervised pre-training, which plays a crucial role in improving both data efficiency and generalizability of Transformer networks for trajectory prediction. Specifically, Social-SSL models the interaction and motion patterns with three pretext tasks: interaction type prediction, closeness prediction, and masked cross-sequence to sequence pre-training. Comprehensive experiments show that Social-SSL outperforms the state-of-the-art methods by at least 12% and 20% on ETH/UCY and SDD datasets in terms of Average Displacement Error and Final Displacement Error (code available at https://github.com/Sigta678/Social-SSL.

Original language	English
Title of host publication	Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
Editors	Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	234-250
Number of pages	17
ISBN (Print)	9783031200465
DOIs	https://doi.org/10.1007/978-3-031-20047-2_14
Publication status	Published - 2022
Externally published	Yes
Event	17th European Conference on Computer Vision, ECCV 2022 - Tel Aviv, Israel Duration: 23 Oct 2022 → 27 Oct 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13682 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	17th European Conference on Computer Vision, ECCV 2022
Country/Territory	Israel
City	Tel Aviv
Period	23/10/22 → 27/10/22

Keywords

Representation learning
Self-supervised learning
Trajectory prediction
Transformer

Access to Document

10.1007/978-3-031-20047-2_14

Cite this

Tsao, L. W., Wang, Y. K., Lin, H. S., Shuai, H. H., Wong, L. K., & Cheng, W. H. (2022). Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction. In S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, & T. Hassner (Eds.), Computer Vision – ECCV 2022 - 17th European Conference, Proceedings (pp. 234-250). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13682 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20047-2_14

Tsao, Li Wu ; Wang, Yan Kai ; Lin, Hao Siang et al. / Social-SSL : Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction. Computer Vision – ECCV 2022 - 17th European Conference, Proceedings. editor / Shai Avidan ; Gabriel Brostow ; Moustapha Cissé ; Giovanni Maria Farinella ; Tal Hassner. Springer Science and Business Media Deutschland GmbH, 2022. pp. 234-250 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{0e3f084d90594939a676eebef1e77c0a,

title = "Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction",

abstract = "Earlier trajectory prediction approaches focus on ways of capturing sequential structures among pedestrians by using recurrent networks, which is known to have some limitations in capturing long sequence structures. To address this limitation, some recent works proposed Transformer-based architectures, which are built with attention mechanisms. However, these Transformer-based networks are trained end-to-end without capitalizing on the value of pre-training. In this work, we propose Social-SSL that captures cross-sequence trajectory structures via self-supervised pre-training, which plays a crucial role in improving both data efficiency and generalizability of Transformer networks for trajectory prediction. Specifically, Social-SSL models the interaction and motion patterns with three pretext tasks: interaction type prediction, closeness prediction, and masked cross-sequence to sequence pre-training. Comprehensive experiments show that Social-SSL outperforms the state-of-the-art methods by at least 12% and 20% on ETH/UCY and SDD datasets in terms of Average Displacement Error and Final Displacement Error (code available at https://github.com/Sigta678/Social-SSL.",

keywords = "Representation learning, Self-supervised learning, Trajectory prediction, Transformer",

author = "Tsao, {Li Wu} and Wang, {Yan Kai} and Lin, {Hao Siang} and Shuai, {Hong Han} and Wong, {Lai Kuan} and Cheng, {Wen Huang}",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 17th European Conference on Computer Vision, ECCV 2022 ; Conference date: 23-10-2022 Through 27-10-2022",

year = "2022",

doi = "10.1007/978-3-031-20047-2_14",

language = "English",

isbn = "9783031200465",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "234--250",

editor = "Shai Avidan and Gabriel Brostow and Moustapha Ciss{\'e} and Farinella, {Giovanni Maria} and Tal Hassner",

booktitle = "Computer Vision – ECCV 2022 - 17th European Conference, Proceedings",

address = "Germany",

}

Tsao, LW, Wang, YK, Lin, HS, Shuai, HH, Wong, LK & Cheng, WH 2022, Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction. in S Avidan, G Brostow, M Cissé, GM Farinella & T Hassner (eds), Computer Vision – ECCV 2022 - 17th European Conference, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13682 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 234-250, 17th European Conference on Computer Vision, ECCV 2022, Tel Aviv, Israel, 23/10/22. https://doi.org/10.1007/978-3-031-20047-2_14

Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction. / Tsao, Li Wu; Wang, Yan Kai; Lin, Hao Siang et al.
Computer Vision – ECCV 2022 - 17th European Conference, Proceedings. ed. / Shai Avidan; Gabriel Brostow; Moustapha Cissé; Giovanni Maria Farinella; Tal Hassner. Springer Science and Business Media Deutschland GmbH, 2022. p. 234-250 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13682 LNCS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

TY - GEN

T1 - Social-SSL

T2 - 17th European Conference on Computer Vision, ECCV 2022

AU - Tsao, Li Wu

AU - Wang, Yan Kai

AU - Lin, Hao Siang

AU - Shuai, Hong Han

AU - Wong, Lai Kuan

AU - Cheng, Wen Huang

PY - 2022

Y1 - 2022

N2 - Earlier trajectory prediction approaches focus on ways of capturing sequential structures among pedestrians by using recurrent networks, which is known to have some limitations in capturing long sequence structures. To address this limitation, some recent works proposed Transformer-based architectures, which are built with attention mechanisms. However, these Transformer-based networks are trained end-to-end without capitalizing on the value of pre-training. In this work, we propose Social-SSL that captures cross-sequence trajectory structures via self-supervised pre-training, which plays a crucial role in improving both data efficiency and generalizability of Transformer networks for trajectory prediction. Specifically, Social-SSL models the interaction and motion patterns with three pretext tasks: interaction type prediction, closeness prediction, and masked cross-sequence to sequence pre-training. Comprehensive experiments show that Social-SSL outperforms the state-of-the-art methods by at least 12% and 20% on ETH/UCY and SDD datasets in terms of Average Displacement Error and Final Displacement Error (code available at https://github.com/Sigta678/Social-SSL.

AB - Earlier trajectory prediction approaches focus on ways of capturing sequential structures among pedestrians by using recurrent networks, which is known to have some limitations in capturing long sequence structures. To address this limitation, some recent works proposed Transformer-based architectures, which are built with attention mechanisms. However, these Transformer-based networks are trained end-to-end without capitalizing on the value of pre-training. In this work, we propose Social-SSL that captures cross-sequence trajectory structures via self-supervised pre-training, which plays a crucial role in improving both data efficiency and generalizability of Transformer networks for trajectory prediction. Specifically, Social-SSL models the interaction and motion patterns with three pretext tasks: interaction type prediction, closeness prediction, and masked cross-sequence to sequence pre-training. Comprehensive experiments show that Social-SSL outperforms the state-of-the-art methods by at least 12% and 20% on ETH/UCY and SDD datasets in terms of Average Displacement Error and Final Displacement Error (code available at https://github.com/Sigta678/Social-SSL.

KW - Representation learning

KW - Self-supervised learning

KW - Trajectory prediction

KW - Transformer

UR - http://www.scopus.com/inward/record.url?scp=85142701898&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-20047-2_14

DO - 10.1007/978-3-031-20047-2_14

M3 - Conference contribution

AN - SCOPUS:85142701898

SN - 9783031200465

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 234

EP - 250

BT - Computer Vision – ECCV 2022 - 17th European Conference, Proceedings

A2 - Avidan, Shai

A2 - Brostow, Gabriel

A2 - Cissé, Moustapha

A2 - Farinella, Giovanni Maria

A2 - Hassner, Tal

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 23 October 2022 through 27 October 2022

ER -

Tsao LW, Wang YK, Lin HS, Shuai HH, Wong LK, Cheng WH. Social-SSL: Self-supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction. In Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T, editors, Computer Vision – ECCV 2022 - 17th European Conference, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 234-250. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-20047-2_14