TY - JOUR
T1 - MiniDBG
T2 - A Novel and Minimal De Bruijn Graph for Read Mapping
AU - Yu, Changyong
AU - Zhao, Yuhai
AU - Zhao, Chu
AU - Jin, Jianyu
AU - Mao, Keming
AU - Wang, Guoren
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - The De Bruijn graph (DBG) has been widely used in the algorithms for indexing or organizing read and reference sequences in bioinformatics. However, a DBG model that can locate each node, edge and path on sequence has not been proposed so far. Recently, DBG has been used for representing reference sequences in read mapping tasks. In this process, it is not a one-to-one correspondence between the paths of DBG and the substrings of reference sequence. This results in the false path on DBG, which means no substrings of reference producing the path. Moreover, if a candidate path of a read is true, we need to locate it and verify the candidate on sequence. To solve these problems, we proposed a DBG model, called MiniDBG, which stores the position lists of a minimal set of edges. With the position lists, MiniDBG can locate any node, edge and path efficiently. We also proposed algorithms for generating MiniDBG based on an original DBG and algorithms for locating edges or paths on sequence. We designed and ran experiments on real datasets for comparing them with BWT-based and position list-based methods. The experimental results show that MiniDBG can locate the edges and paths efficiently with lower memory costs.
AB - The De Bruijn graph (DBG) has been widely used in the algorithms for indexing or organizing read and reference sequences in bioinformatics. However, a DBG model that can locate each node, edge and path on sequence has not been proposed so far. Recently, DBG has been used for representing reference sequences in read mapping tasks. In this process, it is not a one-to-one correspondence between the paths of DBG and the substrings of reference sequence. This results in the false path on DBG, which means no substrings of reference producing the path. Moreover, if a candidate path of a read is true, we need to locate it and verify the candidate on sequence. To solve these problems, we proposed a DBG model, called MiniDBG, which stores the position lists of a minimal set of edges. With the position lists, MiniDBG can locate any node, edge and path efficiently. We also proposed algorithms for generating MiniDBG based on an original DBG and algorithms for locating edges or paths on sequence. We designed and ran experiments on real datasets for comparing them with BWT-based and position list-based methods. The experimental results show that MiniDBG can locate the edges and paths efficiently with lower memory costs.
KW - Bioinformatics databases
KW - Graph algorithms
KW - Indexing methods
KW - Queries
UR - http://www.scopus.com/inward/record.url?scp=85179820183&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2023.3340251
DO - 10.1109/TCBB.2023.3340251
M3 - Article
C2 - 38060353
AN - SCOPUS:85179820183
SN - 1545-5963
VL - 21
SP - 129
EP - 142
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 1
ER -