Prior Guided Transformer for Accurate Radiology Reports Generation

Bin Yan; Mingtao Pei; Meng Zhao; Caifeng Shan; Zhaoxing Tian

doi:10.1109/JBHI.2022.3197162


Bin Yan, Mingtao Pei, Meng Zhao^*, Caifeng Shan^*, Zhaoxing Tian

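^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract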

In this paper, we propose a prior guided transformer for accurate radiology reports generation. In the encoder part, a radiograph is firstly represented by a set of patch features, which is obtained through a convolutional neural network and a traditional transformer encoder. Then an Additive Gaussian model is applied to represent the prior knowledge based on unsupervised clustering and sparse attention. In the decoder part, prior embeddings are acquired by probabilistically sampling from the radiograph prior. Then the visual features, language embeddings, and prior embeddings are fused by our proposed Prior Guided Attention to generate accurate radiology reports. Experiment results show that our method achieves better performance than state-of-the-art methods on two public radiology datasets, which proves the effectiveness of our prior guided transformer.

Original language	English
Pages (from-to)	5631-5640
Number of pages	10
Journal	IEEE Journal of Biomedical and Health Informatics
Volume	26
Issue number	11
DOI	https://doi.org/10.1109/JBHI.2022.3197162
Publication status	Published - 1 Nov 2022


Cite this

@article{1dfe99621b164c27b784c9099e0f34ce,

title = "Prior Guided Transformer for Accurate Radiology Reports Generation",

abstract = "In this paper, we propose a prior guided transformer for accurate radiology reports generation. In the encoder part, a radiograph is firstly represented by a set of patch features, which is obtained through a convolutional neural network and a traditional transformer encoder. Then an Additive Gaussian model is applied to represent the prior knowledge based on unsupervised clustering and sparse attention. In the decoder part, prior embeddings are acquired by probabilistically sampling from the radiograph prior. Then the visual features, language embeddings, and prior embeddings are fused by our proposed Prior Guided Attention to generate accurate radiology reports. Experiment results show that our method achieves better performance than state-of-the-art methods on two public radiology datasets, which proves the effectiveness of our prior guided transformer.",

keywords = "Transformer, prior knowledge, radiology reports generation, sparse attention",

author = "Bin Yan and Mingtao Pei and Meng Zhao and Caifeng Shan and Zhaoxing Tian",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2022",

month = nov,

day = "1",

doi = "10.1109/JBHI.2022.3197162",

language = "English",

volume = "26",

pages = "5631--5640",

journal = "IEEE Journal of Biomedical and Health Informatics",

issn = "2168-2194",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "11",

}

TY - JOUR

T1 - Prior Guided Transformer for Accurate Radiology Reports Generation

AU - Yan, Bin

AU - Pei, Mingtao

AU - Zhao, Meng

AU - Shan, Caifeng

AU - Tian, Zhaoxing

PY - 2022/11/1

Y1 - 2022/11/1

N2 - In this paper, we propose a prior guided transformer for accurate radiology reports generation. In the encoder part, a radiograph is firstly represented by a set of patch features, which is obtained through a convolutional neural network and a traditional transformer encoder. Then an Additive Gaussian model is applied to represent the prior knowledge based on unsupervised clustering and sparse attention. In the decoder part, prior embeddings are acquired by probabilistically sampling from the radiograph prior. Then the visual features, language embeddings, and prior embeddings are fused by our proposed Prior Guided Attention to generate accurate radiology reports. Experiment results show that our method achieves better performance than state-of-the-art methods on two public radiology datasets, which proves the effectiveness of our prior guided transformer.

AB - In this paper, we propose a prior guided transformer for accurate radiology reports generation. In the encoder part, a radiograph is firstly represented by a set of patch features, which is obtained through a convolutional neural network and a traditional transformer encoder. Then an Additive Gaussian model is applied to represent the prior knowledge based on unsupervised clustering and sparse attention. In the decoder part, prior embeddings are acquired by probabilistically sampling from the radiograph prior. Then the visual features, language embeddings, and prior embeddings are fused by our proposed Prior Guided Attention to generate accurate radiology reports. Experiment results show that our method achieves better performance than state-of-the-art methods on two public radiology datasets, which proves the effectiveness of our prior guided transformer.

KW - Transformer

KW - prior knowledge

KW - radiology reports generation

KW - sparse attention

UR - http://www.scopus.com/inward/record.url?scp=85136145965&partnerID=8YFLogxK

U2 - 10.1109/JBHI.2022.3197162

DO - 10.1109/JBHI.2022.3197162

M3 - Article

C2 - 35939478

AN - SCOPUS:85136145965

SN - 2168-2194

VL - 26

SP - 5631

EP - 5640

JO - IEEE Journal of Biomedical and Health Informatics

JF - IEEE Journal of Biomedical and Health Informatics

IS - 11

ER -
