TY - JOUR
T1 - Digital twin-driven multi-agent collaborative online optimization of production regulation for smart reconfigurable manufacturing systems with human-robot collaboration
AU - Huang, Ming
AU - Huang, Sihan
AU - Gao, Liang
AU - Dong, Wei
AU - Zhang, Yonghui
AU - Gu, Xi
AU - Gao, Zenggui
N1 - Publisher Copyright: © 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
PY - 2026/10
Y1 - 2026/10
N2 - In the Industry 4.0 era, the synchronous development of high technologies and advanced manufacturing technologies has significantly narrowed the gap between demand and production, and smart reconfigurable manufacturing systems comprising a growing number of robots are among the best solutions for handling dynamic, uncertain, and individualized demands. The emerging Industry 5.0 provides a new perspective for enhancing production flexibility by emphasizing human centricity, which has popularized human-robot collaboration. Accordingly, smart reconfigurable manufacturing systems with human-robot collaboration (SRMS-HRC), which combine the advantages of Industry 4.0 and Industry 5.0, show strong promise for future production and are characterized by high flexibility and smartness simultaneously. However, the ultrahigh flexibility of SRMS-HRC brings extra challenges for production organization, including changeable operation sequences, dynamic position assignment, and human-robot configuration. Meanwhile, traditional offline optimization, which plans first and executes later, cannot meet the rapid-response requirements of SRMS-HRC. Therefore, this paper proposes a digital twin-driven multi-agent collaborative online optimization method for production regulation in SRMS-HRC based on multi-objective deep reinforcement learning, where the digital twin is adopted to enhance responsiveness to changes and deep reinforcement learning is used to generate online optimization schemes more intelligently. A Markov decision process with order size-independent state representations and a rule-based multi-agent action set with explicit semantics is constructed. A normalized Tchebycheff reward aggregation method enables simultaneous optimization of enterprise-level and human-centered objectives. Furthermore, a multi-objective multi-agent twin delayed deep deterministic policy gradient (MO-MATD3) algorithm is developed to train collaborating agents. Numerical experiments and simulation scenario validations on industrial cases of varying scales demonstrate that the proposed approach effectively generates high-performance online optimization schemes, outperforming single-agent architectures, heuristic rules, and other deep reinforcement learning methods.
AB - In the Industry 4.0 era, the synchronous development of high technologies and advanced manufacturing technologies has significantly narrowed the gap between demand and production, and smart reconfigurable manufacturing systems comprising a growing number of robots are among the best solutions for handling dynamic, uncertain, and individualized demands. The emerging Industry 5.0 provides a new perspective for enhancing production flexibility by emphasizing human centricity, which has popularized human-robot collaboration. Accordingly, smart reconfigurable manufacturing systems with human-robot collaboration (SRMS-HRC), which combine the advantages of Industry 4.0 and Industry 5.0, show strong promise for future production and are characterized by high flexibility and smartness simultaneously. However, the ultrahigh flexibility of SRMS-HRC brings extra challenges for production organization, including changeable operation sequences, dynamic position assignment, and human-robot configuration. Meanwhile, traditional offline optimization, which plans first and executes later, cannot meet the rapid-response requirements of SRMS-HRC. Therefore, this paper proposes a digital twin-driven multi-agent collaborative online optimization method for production regulation in SRMS-HRC based on multi-objective deep reinforcement learning, where the digital twin is adopted to enhance responsiveness to changes and deep reinforcement learning is used to generate online optimization schemes more intelligently. A Markov decision process with order size-independent state representations and a rule-based multi-agent action set with explicit semantics is constructed. A normalized Tchebycheff reward aggregation method enables simultaneous optimization of enterprise-level and human-centered objectives. Furthermore, a multi-objective multi-agent twin delayed deep deterministic policy gradient (MO-MATD3) algorithm is developed to train collaborating agents. Numerical experiments and simulation scenario validations on industrial cases of varying scales demonstrate that the proposed approach effectively generates high-performance online optimization schemes, outperforming single-agent architectures, heuristic rules, and other deep reinforcement learning methods.
KW - Industry 4.0
KW - Industry 5.0
KW - Multi-agent collaborative
KW - Multi-objective deep reinforcement learning
KW - Online optimization
KW - Production regulation
KW - Smart reconfigurable manufacturing systems with human-robot collaboration (SRMS-HRC)
UR - https://www.scopus.com/pages/publications/105037448276
U2 - 10.1016/j.rcim.2026.103322
DO - 10.1016/j.rcim.2026.103322
M3 - Article
AN - SCOPUS:105037448276
SN - 0736-5845
VL - 101
JO - Robotics and Computer-Integrated Manufacturing
JF - Robotics and Computer-Integrated Manufacturing
M1 - 103322
ER -