An Empirical Analysis of Vision Transformer and CNN in Resource-Constrained Federated Learning

Xiaojiang Zuo, Qinglong Zhang, Rui Han*

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Federated learning (FL) is an emerging distributed machine learning method that collaboratively trains a universal model among clients while maintaining their data privacy. Recently, several efforts have attempted to introduce vision transformer (ViT) models into FL training. However, deploying and training such ViT models from scratch in practice is not trivial: existing works overlook the existence of clients with low resources (e.g., mobile phones), which is a common and practical FL setting. In this paper, we use low-resolution images as model input to satisfy the resource constraints and investigate several ViT models to explore whether ViT models still outperform CNN models in this setting. Our experiments were performed on CIFAR10 and Fashion MNIST with their IID and non-IID versions, and the results demonstrate that ViT models can achieve better global test accuracy than CNN models at a comparable training cost, suggesting that they are well suited to FL training with resource-constrained devices.
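The FL setting the abstract describes — clients training locally and contributing to a shared global model — is typically realized with FedAvg-style aggregation, where the server averages client weights in proportion to each client's local dataset size. A minimal sketch of that aggregation step (illustrative only; the paper's exact training pipeline is not reproduced here):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average client model parameters, weighted by local dataset size.

    client_weights: list of flat parameter vectors, one per client
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    stacked = np.stack(client_weights)        # (num_clients, num_params)
    coeffs = np.array(client_sizes) / total   # per-client weighting
    return coeffs @ stacked                   # weighted average of parameters

# Example: three clients with unequal data shares (non-IID-like imbalance)
weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 20, 70]
global_w = fedavg(weights, sizes)  # -> array([4.2, 5.2])
```

The weighting ensures clients with more data influence the global model proportionally, which matters under the non-IID partitions used in the experiments.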

Original language: English
Title of host publication: Proceedings of MLMI 2022 - 2022 5th International Conference on Machine Learning and Machine Intelligence
Publisher: Association for Computing Machinery
Pages: 8-13
Number of pages: 6
ISBN (electronic): 9781450397551
DOI
Publication status: Published - 23 Sep 2022
Event: 5th International Conference on Machine Learning and Machine Intelligence, MLMI 2022 - Virtual, Online, China
Duration: 23 Sep 2022 - 25 Sep 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 5th International Conference on Machine Learning and Machine Intelligence, MLMI 2022
Country/Territory: China
City: Virtual, Online
Period: 23/09/22 - 25/09/22
