Abstract
Federated Learning (FL) has emerged as a promising paradigm for decentralized machine learning, in which a central server coordinates distributed clients to collaboratively train a global model without direct access to raw data. Despite its advantages, heterogeneous and long-tail data distributions across clients remain a major bottleneck, particularly in IoT scenarios with diverse devices and sensing modalities. To address these challenges, we propose FedSM, a novel framework that integrates multimodal semantic knowledge with balanced pseudo features to enhance global model optimization. Unlike conventional approaches that rely on single-modal information, FedSM leverages CLIP’s cross-modal representations and open-vocabulary priors to guide semantic-aware data augmentation. A probabilistic selection mechanism further refines local features by mixing them with global prototypes, ensuring that the pseudo features are semantically reliable and reducing the bias caused by skewed client distributions. Almost all computation is performed on the client side, alleviating server overhead and improving scalability in resource-constrained IoT environments. Extensive experiments on long-tail benchmarks, including CIFAR-10-LT, CIFAR-100-LT, and ImageNet-LT, demonstrate that FedSM outperforms state-of-the-art baselines, highlighting its potential for robust, communication-efficient FL in IoT networks.
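The prototype-mixing step described above could be sketched roughly as follows. This is a minimal illustrative sketch only, not the paper's actual procedure: the function name `mixup_with_prototypes`, the Beta-distributed mixing coefficient, and all parameter choices are assumptions introduced here for illustration.

```python
import numpy as np

def mixup_with_prototypes(local_feats, labels, prototypes, alpha=0.5, rng=None):
    """Blend each local feature with the global prototype of its class.

    A Beta(alpha, alpha)-distributed coefficient (hypothetical choice)
    decides how much of the global prototype is mixed in; features whose
    class has no prototype yet are kept unchanged. Illustrative sketch,
    not FedSM's exact mechanism.
    """
    if rng is None:
        rng = np.random.default_rng()
    mixed = local_feats.copy()
    for i, y in enumerate(labels):
        if y not in prototypes:
            continue  # no global prototype available for this class
        lam = rng.beta(alpha, alpha)  # probabilistic mixing weight in [0, 1]
        # convex combination of local feature and global class prototype
        mixed[i] = lam * local_feats[i] + (1.0 - lam) * prototypes[y]
    return mixed

# toy usage: three 2-D features, two classes with known global prototypes
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
labels = [0, 1, 0]
protos = {0: np.array([2.0, 0.0]), 1: np.array([0.0, 2.0])}
out = mixup_with_prototypes(feats, labels, protos)
print(out.shape)  # (3, 2)
```

Because the mixing weight is sampled per feature, each pseudo feature lands somewhere on the segment between the client's local feature and the corresponding global prototype, which is one plausible way to keep augmented features semantically anchored.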
| Original language | English |
|---|---|
| Journal | IEEE Internet of Things Journal |
| DOIs | |
| Publication status | Accepted/In press - 2026 |
| Externally published | Yes |
Keywords
- Federated Learning
- Internet of Things
- Long-tail Distribution
- Semantic-Guided Data Augmentation
Title
FedSM: Semantic-Guided Feature Mixup for Bias Reduction in Federated Learning with Long-Tail Data