TY - JOUR
T1 - OreFormer
T2 - Ore Sorting Transformer Based on ConvNet and Visual Attention
AU - Liu, Yang
AU - Wang, Xueyi
AU - Zhang, Zelin
AU - Deng, Fang
N1 - Publisher Copyright: © International Association for Mathematical Geosciences 2024.
PY - 2024/4
Y1 - 2024/4
N2 - Intelligent ore sorting is a pivotal technology in contemporary mining and production. To establish a more efficient general framework for mineral image recognition, this paper proposes combining the inductive biases of convolutional neural networks (i.e., locality and translation invariance) with the non-local properties of the transformer architecture (i.e., globality and long-range dependencies) and introducing the self-attention mechanism, which leads to a new series of models called OreFormer. In fine-grained coal sorting, OreFormer demonstrated higher performance across gas coal, coking coal, and anthracite than pure ConvNet or vision transformer models. At the same time, the OreFormer variants achieved a better tradeoff between classification performance and efficiency (i.e., they simultaneously maintain higher classification accuracy, lower computational complexity, and smaller model size). In addition, OreFormer showed excellent discriminative and feature representation capability, and can accurately distinguish mineral particles with only minor apparent differences between categories.
AB - Intelligent ore sorting is a pivotal technology in contemporary mining and production. To establish a more efficient general framework for mineral image recognition, this paper proposes combining the inductive biases of convolutional neural networks (i.e., locality and translation invariance) with the non-local properties of the transformer architecture (i.e., globality and long-range dependencies) and introducing the self-attention mechanism, which leads to a new series of models called OreFormer. In fine-grained coal sorting, OreFormer demonstrated higher performance across gas coal, coking coal, and anthracite than pure ConvNet or vision transformer models. At the same time, the OreFormer variants achieved a better tradeoff between classification performance and efficiency (i.e., they simultaneously maintain higher classification accuracy, lower computational complexity, and smaller model size). In addition, OreFormer showed excellent discriminative and feature representation capability, and can accurately distinguish mineral particles with only minor apparent differences between categories.
KW - Fine-grained multi-type and multi-class
KW - Locality
KW - Long-range dependencies
KW - Ore sorting
KW - Self-attention
UR - http://www.scopus.com/inward/record.url?scp=85182188227&partnerID=8YFLogxK
U2 - 10.1007/s11053-023-10298-x
DO - 10.1007/s11053-023-10298-x
M3 - Article
AN - SCOPUS:85182188227
SN - 1520-7439
VL - 33
SP - 521
EP - 538
JO - Natural Resources Research
JF - Natural Resources Research
IS - 2
ER -