ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition

Nada Boudjellal; Huaping Zhang; Asif Khan; Arshad Ahmad; Rashid Naseem; Jianyun Shang; Lin Dai

doi:10.1155/2021/6633213

Nada Boudjellal, Huaping Zhang^*, Asif Khan, Arshad Ahmad, Rashid Naseem, Jianyun Shang, Lin Dai

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

37 Citations (Scopus)

Abstract

The web is loaded daily with a huge volume of data, mainly unstructured text, which significantly increases the need for information extraction and NLP systems. Named-entity recognition is a key step towards efficiently understanding text data and saving time and effort. Because English is so widely used globally, it dominates the research conducted in this field, especially in the biomedical domain. Unlike other languages, Arabic suffers from a lack of resources. This work presents a BERT-based model that identifies biomedical named entities in Arabic text (specifically disease and treatment named entities) and investigates whether pretraining a monolingual BERT model on a small-scale biomedical dataset improves the model's understanding of Arabic biomedical text. The model was compared with two state-of-the-art models (AraBERT and multilingual BERT cased) and outperformed both, with an F1-score of 85%.

Original language	English
Article number	6633213
Journal	Complexity
Volume	2021
DOIs	https://doi.org/10.1155/2021/6633213
Publication status	Published - 2021

Cite this

@article{2f6c61f01ef3485cba6fbe5a40ee60b3,
  title = "ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition",
  abstract = "The web is being loaded daily with a huge volume of data, mainly unstructured textual data, which increases the need for information extraction and NLP systems significantly. Named-entity recognition task is a key step towards efficiently understanding text data and saving time and effort. Being a widely used language globally, English is taking over most of the research conducted in this field, especially in the biomedical domain. Unlike other languages, Arabic suffers from lack of resources. This work presents a BERT-based model to identify biomedical named entities in the Arabic text data (specifically disease and treatment named entities) that investigates the effectiveness of pretraining a monolingual BERT model with a small-scale biomedical dataset on enhancing the model understanding of Arabic biomedical text. The model performance was compared with two state-of-the-art models (namely, AraBERT and multilingual BERT cased), and it outperformed both models with 85% F1-score.",
  author = "Nada Boudjellal and Huaping Zhang and Asif Khan and Arshad Ahmad and Rashid Naseem and Jianyun Shang and Lin Dai",
  note = "Publisher Copyright: {\textcopyright} 2021 Nada Boudjellal et al.",
  year = "2021",
  doi = "10.1155/2021/6633213",
  language = "English",
  volume = "2021",
  journal = "Complexity",
  issn = "1076-2787",
  publisher = "Hindawi Limited",
}

TY  - JOUR
T1  - ABioNER
T2  - A BERT-Based Model for Arabic Biomedical Named-Entity Recognition
AU  - Boudjellal, Nada
AU  - Zhang, Huaping
AU  - Khan, Asif
AU  - Ahmad, Arshad
AU  - Naseem, Rashid
AU  - Shang, Jianyun
AU  - Dai, Lin
PY  - 2021
Y1  - 2021
N2  - The web is being loaded daily with a huge volume of data, mainly unstructured textual data, which increases the need for information extraction and NLP systems significantly. Named-entity recognition task is a key step towards efficiently understanding text data and saving time and effort. Being a widely used language globally, English is taking over most of the research conducted in this field, especially in the biomedical domain. Unlike other languages, Arabic suffers from lack of resources. This work presents a BERT-based model to identify biomedical named entities in the Arabic text data (specifically disease and treatment named entities) that investigates the effectiveness of pretraining a monolingual BERT model with a small-scale biomedical dataset on enhancing the model understanding of Arabic biomedical text. The model performance was compared with two state-of-the-art models (namely, AraBERT and multilingual BERT cased), and it outperformed both models with 85% F1-score.
AB  - The web is being loaded daily with a huge volume of data, mainly unstructured textual data, which increases the need for information extraction and NLP systems significantly. Named-entity recognition task is a key step towards efficiently understanding text data and saving time and effort. Being a widely used language globally, English is taking over most of the research conducted in this field, especially in the biomedical domain. Unlike other languages, Arabic suffers from lack of resources. This work presents a BERT-based model to identify biomedical named entities in the Arabic text data (specifically disease and treatment named entities) that investigates the effectiveness of pretraining a monolingual BERT model with a small-scale biomedical dataset on enhancing the model understanding of Arabic biomedical text. The model performance was compared with two state-of-the-art models (namely, AraBERT and multilingual BERT cased), and it outperformed both models with 85% F1-score.
UR  - http://www.scopus.com/inward/record.url?scp=85103673081&partnerID=8YFLogxK
U2  - 10.1155/2021/6633213
DO  - 10.1155/2021/6633213
M3  - Article
AN  - SCOPUS:85103673081
SN  - 1076-2787
VL  - 2021
JO  - Complexity
JF  - Complexity
M1  - 6633213
ER  - 
