Speech recognition based on deep tensor neural network and multifactor feature

Yahui Shan; Min Liu; Qingran Zhan; Shixuan Du; Jing Wang; Xiang Xie

doi:10.1109/APSIPAASC47483.2019.9023251

School of Information and Electronics

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

This paper presents a speech recognition system based on a deep tensor neural network that uses multifactor features as the input to the acoustic model. First, a deep neural network is trained on the MOCHA database [1] to estimate articulatory features from input speech. Mel-frequency cepstral coefficients are then combined with the articulatory features to form the multifactor feature. A deep tensor neural network, which involves tensor interactions among neurons, serves as the acoustic model. Speech recognition results indicate that the multifactor feature improves recognition performance under both clean and noisy background conditions, and that the deep tensor neural network, because of its tensor interactions, models multifactor features better than a deep neural network.
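The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: frame-wise concatenation of MFCC and articulatory vectors into a multifactor feature, and a layer whose output comes from multiplicative (tensor) interactions between two projections of the same input. The double-projection form, the feature dimensions (13 and 14), the layer sizes, and all function names are assumptions chosen for illustration, not the authors' configuration.

```python
import math
import random


def concat_features(mfcc_frames, artic_frames):
    """Frame-wise concatenation of MFCC and articulatory vectors."""
    if len(mfcc_frames) != len(artic_frames):
        raise ValueError("feature streams must have the same number of frames")
    return [list(m) + list(a) for m, a in zip(mfcc_frames, artic_frames)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def project(weights, bias, x):
    """Affine projection of x followed by an element-wise sigmoid."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def double_projection_layer(x, w1, b1, w2, b2):
    """Project x into two subspaces and return their flattened outer product.

    Each output unit is the product of one unit from each projection, which
    is the kind of multiplicative interaction a tensor layer models.
    """
    h1 = project(w1, b1, x)
    h2 = project(w2, b2, x)
    return [a * b for a in h1 for b in h2]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical dimensions: 13 MFCCs and 14 articulatory channels per frame.
rng = random.Random(0)
mfcc = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(4)]
artic = [[rng.gauss(0, 1) for _ in range(14)] for _ in range(4)]

frames = concat_features(mfcc, artic)  # 4 frames, 27 dimensions each
w1, b1 = random_matrix(3, 27, rng), [0.0] * 3
w2, b2 = random_matrix(5, 27, rng), [0.0] * 5
out = double_projection_layer(frames[0], w1, b1, w2, b2)  # 3 * 5 = 15 units
```

In a full system the flattened output would feed the next layer of the acoustic model, and the articulatory vectors would come from the separately trained estimator rather than from random numbers.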

Original language	English
Title of host publication	2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	650-654
Number of pages	5
ISBN (Electronic)	9781728132488
DOIs	https://doi.org/10.1109/APSIPAASC47483.2019.9023251
Publication status	Published - Nov 2019
Event	2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, Lanzhou, China (18 Nov 2019 → 21 Nov 2019)

Publication series

Name	2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Conference

Conference	2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Country/Territory	China
City	Lanzhou
Period	18/11/19 → 21/11/19

Cite this

Shan, Y., Liu, M., Zhan, Q., Du, S., Wang, J., & Xie, X. (2019). Speech recognition based on deep tensor neural network and multifactor feature. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 (pp. 650-654). Article 9023251 (2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPAASC47483.2019.9023251

Shan, Yahui ; Liu, Min ; Zhan, Qingran et al. / Speech recognition based on deep tensor neural network and multifactor feature. 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 650-654 (2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019).

@inproceedings{26bde6338bd543ba9c7b561e03dc9a5b,

title = "Speech recognition based on deep tensor neural network and multifactor feature",

abstract = "This paper presents a speech recognition system based on deep tensor neural network which uses multifactor feature as input feature of acoustic model. First, a deep neural network is trained to estimate articulatory feature from input speech, where the training data is MOCHA database[1]. Mel frequency cepstrum coefficients in conjunction with articulatory feature are used as multifactor feature. Deep tensor neural network which involves tensor interactions among neurons is used as the acoustic model in this system. Speech recognition results indicate that the multifactor feature helps in improving speech recognition performance not only under clean conditions but also under noisy background conditions; deep tensor neural network is more capable of modeling multifactor features because of its tensor interactions than deep neural network.",

author = "Yahui Shan and Min Liu and Qingran Zhan and Shixuan Du and Jing Wang and Xiang Xie",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 ; Conference date: 18-11-2019 Through 21-11-2019",

year = "2019",

month = nov,

doi = "10.1109/APSIPAASC47483.2019.9023251",

language = "English",

series = "2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "650--654",

booktitle = "2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019",

address = "United States",

}

Shan, Y, Liu, M, Zhan, Q, Du, S, Wang, J & Xie, X 2019, Speech recognition based on deep tensor neural network and multifactor feature. in 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019., 9023251, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, Institute of Electrical and Electronics Engineers Inc., pp. 650-654, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, Lanzhou, China, 18/11/19. https://doi.org/10.1109/APSIPAASC47483.2019.9023251

Speech recognition based on deep tensor neural network and multifactor feature. / Shan, Yahui; Liu, Min; Zhan, Qingran et al.
2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019. Institute of Electrical and Electronics Engineers Inc., 2019. p. 650-654 9023251 (2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019).

TY - GEN

T1 - Speech recognition based on deep tensor neural network and multifactor feature

AU - Shan, Yahui

AU - Liu, Min

AU - Zhan, Qingran

AU - Du, Shixuan

AU - Wang, Jing

AU - Xie, Xiang

PY - 2019/11

Y1 - 2019/11

N2 - This paper presents a speech recognition system based on deep tensor neural network which uses multifactor feature as input feature of acoustic model. First, a deep neural network is trained to estimate articulatory feature from input speech, where the training data is MOCHA database[1]. Mel frequency cepstrum coefficients in conjunction with articulatory feature are used as multifactor feature. Deep tensor neural network which involves tensor interactions among neurons is used as the acoustic model in this system. Speech recognition results indicate that the multifactor feature helps in improving speech recognition performance not only under clean conditions but also under noisy background conditions; deep tensor neural network is more capable of modeling multifactor features because of its tensor interactions than deep neural network.

AB - This paper presents a speech recognition system based on deep tensor neural network which uses multifactor feature as input feature of acoustic model. First, a deep neural network is trained to estimate articulatory feature from input speech, where the training data is MOCHA database[1]. Mel frequency cepstrum coefficients in conjunction with articulatory feature are used as multifactor feature. Deep tensor neural network which involves tensor interactions among neurons is used as the acoustic model in this system. Speech recognition results indicate that the multifactor feature helps in improving speech recognition performance not only under clean conditions but also under noisy background conditions; deep tensor neural network is more capable of modeling multifactor features because of its tensor interactions than deep neural network.

UR - http://www.scopus.com/inward/record.url?scp=85082390237&partnerID=8YFLogxK

U2 - 10.1109/APSIPAASC47483.2019.9023251

DO - 10.1109/APSIPAASC47483.2019.9023251

M3 - Conference contribution

AN - SCOPUS:85082390237

T3 - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

SP - 650

EP - 654

BT - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Y2 - 18 November 2019 through 21 November 2019

ER -

Shan Y, Liu M, Zhan Q, Du S, Wang J, Xie X. Speech recognition based on deep tensor neural network and multifactor feature. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 650-654. 9023251. (2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019). doi: 10.1109/APSIPAASC47483.2019.9023251
