TY - JOUR
T1 - FP-Zernike
T2 - An open-source structural database construction toolkit for fast structure retrieval
AU - Qi, Junhai
AU - Feng, Chenjie
AU - Shi, Yulin
AU - Yang, Jianyi
AU - Zhang, Fa
AU - Li, Guojun
AU - Han, Renmin
N1 - Publisher Copyright:
© The Author(s) 2024.
PY - 2024/2/1
Y1 - 2024/2/1
N2 - The release of AlphaFold2 has sparked a rapid expansion in protein model databases. Efficient protein structure retrieval is crucial for the analysis of structure models, while measuring the similarity between structures is the key challenge in structural retrieval. Although existing structure alignment algorithms can address this challenge, they are often time-consuming. Currently, the state-of-the-art approach involves converting protein structures into three-dimensional (3D) Zernike descriptors and assessing similarity using Euclidean distance. However, the methods for computing 3D Zernike descriptors mainly rely on structural surfaces and are predominantly web-based, thus limiting their application in studying custom datasets. To overcome this limitation, we developed FP-Zernike, a user-friendly toolkit for computing different types of Zernike descriptors based on feature points. Users simply need to enter a single line of command to calculate the Zernike descriptors of all structures in customized datasets. FP-Zernike outperforms the leading method in terms of retrieval accuracy and binary classification accuracy across diverse benchmark datasets. In addition, we showed the application of FP-Zernike in the construction of the descriptor database and the protocol used for the Protein Data Bank (PDB) dataset to facilitate the local deployment of this tool for interested readers. Our demonstration contained 590,685 structures, and at this scale, our system required only 4-9 s to complete a retrieval. The experiments confirmed that it achieved the state-of-the-art accuracy level. FP-Zernike is an open-source toolkit, with the source code and related data accessible at https://ngdc.cncb.ac.cn/biocode/tools/BT007365/releases/0.1, as well as through a webserver at http://www.structbioinfo.cn/.
AB - The release of AlphaFold2 has sparked a rapid expansion in protein model databases. Efficient protein structure retrieval is crucial for the analysis of structure models, while measuring the similarity between structures is the key challenge in structural retrieval. Although existing structure alignment algorithms can address this challenge, they are often time-consuming. Currently, the state-of-the-art approach involves converting protein structures into three-dimensional (3D) Zernike descriptors and assessing similarity using Euclidean distance. However, the methods for computing 3D Zernike descriptors mainly rely on structural surfaces and are predominantly web-based, thus limiting their application in studying custom datasets. To overcome this limitation, we developed FP-Zernike, a user-friendly toolkit for computing different types of Zernike descriptors based on feature points. Users simply need to enter a single line of command to calculate the Zernike descriptors of all structures in customized datasets. FP-Zernike outperforms the leading method in terms of retrieval accuracy and binary classification accuracy across diverse benchmark datasets. In addition, we showed the application of FP-Zernike in the construction of the descriptor database and the protocol used for the Protein Data Bank (PDB) dataset to facilitate the local deployment of this tool for interested readers. Our demonstration contained 590,685 structures, and at this scale, our system required only 4-9 s to complete a retrieval. The experiments confirmed that it achieved the state-of-the-art accuracy level. FP-Zernike is an open-source toolkit, with the source code and related data accessible at https://ngdc.cncb.ac.cn/biocode/tools/BT007365/releases/0.1, as well as through a webserver at http://www.structbioinfo.cn/.
KW - Open-source
KW - PDB dataset
KW - Retrieval system
KW - Structure alignment
KW - Zernike descriptor
UR - http://www.scopus.com/inward/record.url?scp=85196608997&partnerID=8YFLogxK
U2 - 10.1093/gpbjnl/qzae007
DO - 10.1093/gpbjnl/qzae007
M3 - Article
C2 - 38894604
AN - SCOPUS:85196608997
SN - 1672-0229
VL - 22
JO - Genomics, Proteomics and Bioinformatics
JF - Genomics, Proteomics and Bioinformatics
IS - 1
M1 - qzae007
ER -