dwMLCS: An Efficient MLCS Algorithm based on Dynamic and Weighted Directed Acyclic Graph

Changyong Yu, Dekuan Gao, Xu Guo, Haitao Ma, Yuhai Zhao, Guoren Wang

Research output: Contribution to journalArticlepeer-review

1 Citation (Scopus)

Abstract

The problem of finding the longest common subsequence (MLCS) for multiple sequences is a computationally intensive and challenging problem that has significant applications in various fields such as text comparison, pattern recognition, and gene diagnosis. Currently, the dominant point-based MLCS algorithms have become popular and extensively studied. Generally, they construct the directed acyclic graph (DAG) of matching points and convert the MLCS problem into a search for the longest paths in the DAG. Several improvements have been made, focusing on decreasing model size and reducing redundant computations. These include 1) hash methods for eliminating duplicated nodes, 2) dynamic structures for supporting smaller DAG and 3) path pruning strategy and so on. However, the algorithms are still too limited when facing large-scale MLCS problem due to 1) the dynamic structures are too time-consuming to maintain and 2) the path pruning relies heavily on the tightness of the lower and upper bound of the MLCS. These factors contribute to the large-scale MLCS problem remaining a challenge. We propose a novel algorithm for the large-scale MLCS problem, named dwMLCS. It is based on two models: one is a dynamic DAG model which is both space and time efficient. It can decrease the size of the DAG significantly. The other is a weighted DAG model with new successor strategies. With this model, we design the algorithm for finding a tighter lower bound of the MLCS. Then, the path pruning is conducted to further reduce the size of the DAG and eliminate redundant computation. Additionally, we propose an upper bound method for improving the efficiency of the path pruning strategy. The experimental results demonstrate that the effectiveness and efficiency of the models and algorithms proposed are better than state-of-the-art algorithms. The source codes of dwMLCS can be downloaded from web site <uri>https://github.com/BioLab310/dwMLCS</uri>.

Original languageEnglish
Pages (from-to)1-14
Number of pages14
JournalIEEE/ACM Transactions on Computational Biology and Bioinformatics
DOIs
Publication statusAccepted/In press - 2024

Keywords

  • Approximation algorithms
  • Branch-and-bound algorithm
  • Computational modeling
  • Costs
  • Directed acyclic graph
  • Dynamic programming
  • Heuristic algorithms
  • Longest common subsequence
  • Lower bound
  • Search problems
  • Sorting

Fingerprint

Dive into the research topics of 'dwMLCS: An Efficient MLCS Algorithm based on Dynamic and Weighted Directed Acyclic Graph'. Together they form a unique fingerprint.

Cite this