An Empirical Analysis of Vision Transformer and CNN in Resource-Constrained Federated Learning

Xiaojiang Zuo, Qinglong Zhang, Rui Han*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Federated learning (FL) is an emerging distributed machine learning method that collaboratively trains a universal model among clients while preserving their data privacy. Recently, several efforts have attempted to introduce vision transformer (ViT) models into FL training. However, deploying and training such ViT models from scratch is not trivial in practice, and existing works overlook clients with low resources (e.g., mobile phones), which are common in practical FL settings. In this paper, we use low-resolution images as model input to satisfy the resource constraints and investigate several ViT models to explore whether they still outperform CNN models in this setting. Our experiments were performed on CIFAR10 and Fashion MNIST in both their IID and non-IID versions, and the results demonstrate that ViT models can achieve better global test accuracy than CNN models at a comparable training cost, suggesting that they are well suited to FL training with resource-constrained devices.
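
To illustrate why low-resolution input helps meet a resource budget, the sketch below counts the patch tokens a ViT would process at different input sizes. It is a minimal illustration, not the authors' setup: the patch sizes and the 224x224 reference row are assumptions chosen for the example, and the helper names `vit_token_count` and `attention_pairs` are hypothetical. Only the native dataset resolutions (32x32 for CIFAR10, 28x28 for Fashion MNIST) are standard facts.

```python
def vit_token_count(image_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square image (class token not counted)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


def attention_pairs(num_tokens: int) -> int:
    """Token-pair interactions per self-attention head (quadratic in tokens)."""
    return num_tokens * num_tokens


# (name, image size, patch size); patch sizes are illustrative assumptions,
# not values reported in the paper.
settings = [
    ("CIFAR10 (native)", 32, 4),
    ("Fashion MNIST (native)", 28, 4),
    ("224x224 reference", 224, 16),
]

for name, image_size, patch_size in settings:
    tokens = vit_token_count(image_size, patch_size)
    print(f"{name}: {tokens} tokens, {attention_pairs(tokens)} attention pairs")
```

Because self-attention cost grows with the square of the token count, a smaller input (at a fixed patch size) reduces compute and memory on each client, which is the kind of saving the low-resolution setting relies on.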

Original language: English
Title of host publication: Proceedings of MLMI 2022 - 2022 5th International Conference on Machine Learning and Machine Intelligence
Publisher: Association for Computing Machinery
Pages: 8-13
Number of pages: 6
ISBN (Electronic): 9781450397551
Publication status: Published - 23 Sept 2022
Event: 5th International Conference on Machine Learning and Machine Intelligence, MLMI 2022 - Virtual, Online, China
Duration: 23 Sept 2022 to 25 Sept 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 5th International Conference on Machine Learning and Machine Intelligence, MLMI 2022
Country/Territory: China
City: Virtual, Online
Period: 23/09/22 to 25/09/22

Keywords

  • CNN
  • Deep Learning
  • Federated Learning
  • Vision Transformer
