Abstract
The emerging In-Network Computing (INC) technique provides a new opportunity to improve application performance by exploiting the network programmability, computational capability, and storage capacity of programmable switches. One typical application is Distributed Machine Learning (DML), which accelerates machine learning training by employing multiple workers to train a model in parallel. This paper introduces INC-based DML systems, analyzes the performance improvements that INC enables, and surveys current studies of INC-based DML systems. We also propose potential research directions for applying INC to DML systems.
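To make the DML setting concrete, a minimal sketch (not from the paper) of the gradient aggregation step that INC offloads: in data-parallel training, each worker sends its gradients each step, and an in-network aggregator on a programmable switch can sum them in flight instead of burdening a parameter server. All names here are illustrative.

```python
def aggregate_gradients(worker_grads):
    """Element-wise mean of per-worker gradients, mimicking what an
    in-network aggregator computes as packets traverse the switch."""
    n = len(worker_grads)
    dim = len(worker_grads[0])
    # The switch keeps a running sum per gradient slot (its register array).
    totals = [0.0] * dim
    for grad in worker_grads:
        for i, g in enumerate(grad):
            totals[i] += g
    # Workers receive the averaged gradient and apply the same update.
    return [t / n for t in totals]

# Three hypothetical workers produce gradients for a 4-parameter model.
grads = [
    [0.1, 0.2, 0.3, 0.4],
    [0.3, 0.2, 0.1, 0.0],
    [0.2, 0.2, 0.2, 0.2],
]
avg = aggregate_gradients(grads)
```

Offloading this reduction to the switch cuts the traffic reaching the aggregation endpoint from one gradient stream per worker to a single aggregated stream, which is the bandwidth saving INC-based DML systems exploit.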
| Original language | English |
|---|---|
| Pages (from-to) | 1 |
| Number of pages | 1 |
| Journal | IEEE Network |
| DOIs | https://doi.org/10.1109/MNET.2024.3368138 |
| Publication status | Accepted/In press - 2024 |
Keywords
- Computational modeling
- Data models
- Distributed Machine Learning
- In-Network Computing
- Machine Learning
- Performance evaluation
- Programmable Switch
- Servers
- Synchronization
- Training
Cite this
Zhu, H., Jiang, W., Hong, Q., & Guo, Z. (Accepted/In press). When In-Network Computing Meets Distributed Machine Learning. IEEE Network, 1. https://doi.org/10.1109/MNET.2024.3368138