Attention-based neural network for end-to-end music separation

Jing Wang; Hanyue Liu; Haorong Ying; Chuhan Qiu; Jingxin Li; Muhammad Shahid Anwar

doi:10.1049/cit2.12163

Jing Wang^*, Hanyue Liu, Haorong Ying, Chuhan Qiu, Jingxin Li, Muhammad Shahid Anwar^*

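^*Corresponding author for this work

School of Information and Electronics

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract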

The end-to-end separation algorithm with superior performance in the field of speech separation has not been effectively used in music separation. Moreover, since music signals are often dual channel data with a high sampling rate, how to model long-sequence data and make rational use of the relevant information between channels is also an urgent problem to be solved. In order to solve the above problems, the performance of the end-to-end music separation algorithm is enhanced by improving the network structure. Our main contributions include the following: (1) A more reasonable densely connected U-Net is designed to capture the long-term characteristics of music, such as main melody, tone and so on. (2) On this basis, the multi-head attention and dual-path transformer are introduced in the separation module. Channel attention units are applied recursively on the feature map of each layer of the network, enabling the network to perform long-sequence separation. Experimental results show that after the introduction of the channel attention, the performance of the proposed algorithm has a stable improvement compared with the baseline system. On the MUSDB18 dataset, the average score of the separated audio exceeds that of the current best-performing music separation algorithm based on the time-frequency domain (T-F domain).

Original language	English
Pages (from-to)	355-363
Number of pages	9
Journal	CAAI Transactions on Intelligence Technology
Volume	8
Issue number	2
DOI	https://doi.org/10.1049/cit2.12163
Publication status	Published - Jun 2023

Cite this

@article{50588ee3352a420a9e5da2d2476494e2,

title = "Attention-based neural network for end-to-end music separation",

abstract = "The end-to-end separation algorithm with superior performance in the field of speech separation has not been effectively used in music separation. Moreover, since music signals are often dual channel data with a high sampling rate, how to model long-sequence data and make rational use of the relevant information between channels is also an urgent problem to be solved. In order to solve the above problems, the performance of the end-to-end music separation algorithm is enhanced by improving the network structure. Our main contributions include the following: (1) A more reasonable densely connected U-Net is designed to capture the long-term characteristics of music, such as main melody, tone and so on. (2) On this basis, the multi-head attention and dual-path transformer are introduced in the separation module. Channel attention units are applied recursively on the feature map of each layer of the network, enabling the network to perform long-sequence separation. Experimental results show that after the introduction of the channel attention, the performance of the proposed algorithm has a stable improvement compared with the baseline system. On the MUSDB18 dataset, the average score of the separated audio exceeds that of the current best-performing music separation algorithm based on the time-frequency domain (T-F domain).",

keywords = "channel attention, densely connected network, end-to-end music separation",

author = "Jing Wang and Hanyue Liu and Haorong Ying and Chuhan Qiu and Jingxin Li and Anwar, {Muhammad Shahid}",

note = "Publisher Copyright: {\textcopyright} 2023 The Authors. CAAI Transactions on Intelligence Technology published by John Wiley \& Sons Ltd on behalf of The Institution of Engineering and Technology and Chongqing University of Technology.",

year = "2023",

month = jun,

doi = "10.1049/cit2.12163",

language = "English",

volume = "8",

pages = "355--363",

journal = "CAAI Transactions on Intelligence Technology",

issn = "2468-6557",

publisher = "John Wiley \& Sons Inc.",

number = "2",

}

TY - JOUR

T1 - Attention-based neural network for end-to-end music separation

AU - Wang, Jing

AU - Liu, Hanyue

AU - Ying, Haorong

AU - Qiu, Chuhan

AU - Li, Jingxin

AU - Anwar, Muhammad Shahid

N1 - Publisher Copyright: © 2023 The Authors. CAAI Transactions on Intelligence Technology published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology and Chongqing University of Technology.

PY - 2023/6

Y1 - 2023/6

AB - The end-to-end separation algorithm with superior performance in the field of speech separation has not been effectively used in music separation. Moreover, since music signals are often dual channel data with a high sampling rate, how to model long-sequence data and make rational use of the relevant information between channels is also an urgent problem to be solved. In order to solve the above problems, the performance of the end-to-end music separation algorithm is enhanced by improving the network structure. Our main contributions include the following: (1) A more reasonable densely connected U-Net is designed to capture the long-term characteristics of music, such as main melody, tone and so on. (2) On this basis, the multi-head attention and dual-path transformer are introduced in the separation module. Channel attention units are applied recursively on the feature map of each layer of the network, enabling the network to perform long-sequence separation. Experimental results show that after the introduction of the channel attention, the performance of the proposed algorithm has a stable improvement compared with the baseline system. On the MUSDB18 dataset, the average score of the separated audio exceeds that of the current best-performing music separation algorithm based on the time-frequency domain (T-F domain).

KW - channel attention

KW - densely connected network

KW - end-to-end music separation

UR - http://www.scopus.com/inward/record.url?scp=85147000869&partnerID=8YFLogxK

U2 - 10.1049/cit2.12163

DO - 10.1049/cit2.12163

M3 - Article

AN - SCOPUS:85147000869

SN - 2468-6557

VL - 8

SP - 355

EP - 363

JO - CAAI Transactions on Intelligence Technology

JF - CAAI Transactions on Intelligence Technology

IS - 2

ER -
