An Empirical Analysis of Vision Transformer and CNN in Resource-Constrained Federated Learning

Xiaojiang Zuo, Qinglong Zhang, Rui Han*

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Federated learning (FL) is an emerging distributed machine learning method that collaboratively trains a universal model among clients while maintaining their data privacy. Recently, several efforts have attempted to introduce vision transformer (ViT) models into FL training. However, deploying and training such ViT models from scratch in practice is not trivial: existing works overlook the existence of clients with low resources (e.g., mobile phones), which is a common and practical FL setting. In this paper, we use low-resolution images as model input to satisfy the resource constraints and investigate several ViT models to explore whether ViT models still outperform CNN models in this setting. Our experiments were performed on CIFAR10 and Fashion MNIST with their IID and non-IID versions, and the results demonstrate that ViT models can achieve better global test accuracy than CNN models at a comparable training cost, suggesting that they are well suited to FL training with resource-constrained devices.
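The FL setting the abstract describes — clients training locally and contributing to a shared global model — is typically realized with FedAvg-style aggregation, where the server averages client weights in proportion to each client's local dataset size. A minimal sketch of that aggregation step (illustrative only; the paper's exact training pipeline is not reproduced here):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average client model parameters, weighted by local dataset size.

    client_weights: list of flat parameter vectors, one per client
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    stacked = np.stack(client_weights)        # (num_clients, num_params)
    coeffs = np.array(client_sizes) / total   # per-client weighting
    return coeffs @ stacked                   # weighted average of parameters

# Example: three clients with unequal data shares (non-IID-like imbalance)
weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 20, 70]
global_w = fedavg(weights, sizes)  # -> array([4.2, 5.2])
```

The weighting ensures clients with more data influence the global model proportionally, which matters under the non-IID partitions used in the experiments.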

Original language: English
Title of host publication: Proceedings of MLMI 2022 - 2022 5th International Conference on Machine Learning and Machine Intelligence
Publisher: Association for Computing Machinery
Pages: 8-13
Number of pages: 6
ISBN (electronic): 9781450397551
DOI
Publication status: Published - 23 Sep 2022
Event: 5th International Conference on Machine Learning and Machine Intelligence, MLMI 2022 - Virtual, Online, China
Duration: 23 Sep 2022 - 25 Sep 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 5th International Conference on Machine Learning and Machine Intelligence, MLMI 2022
Country/Territory: China
City: Virtual, Online
Period: 23/09/22 - 25/09/22
