TY - JOUR
T1 - MSLR: A Self-supervised Representation Learning Method for Tabular Data Based on Multi-scale Ladder Reconstruction
AU - Weng, Xutao
AU - Song, Hong
AU - Lin, Yucong
AU - Zhang, Xi
AU - Liu, Bowen
AU - Wu, You
AU - Yang, Jian
N1 - Publisher Copyright: © 2024 Elsevier Inc.
PY - 2024/3
Y1 - 2024/3
N2 - Tabular data are widely used for prediction tasks, but they often suffer from the curse of dimensionality and noise, leading to degradation in the performance and robustness of prediction models. Self-supervised representation learning has emerged as a promising technique to overcome these challenges, but most existing methods are applicable to images, text, and others rather than tabular data. In this study, we propose a novel self-supervised representation learning method for tabular data based on multi-scale ladder reconstruction (MSLR). The method effectively learns low-dimensional and noise-resistant representations, thereby improving the prediction performance across various tabular datasets. The idea of MSLR is to employ a binning method to generate a sequence of fuzzy data with different noise scales, followed by training a neural network to recover the raw data from the most corrupted data in a circular manner. This process allows MSLR to learn fine-grained changes caused by noise while maintaining consistency and similarity at a coarse granularity. The proposed method is evaluated on five real-world datasets, namely, MIMIC-IV, Thyroid, Heart, Pima, and Adult, and compared with several baselines. The experimental results of downstream prediction tasks show that MSLR is robust to noisy data and performs better than other existing baseline methods.
KW - Binning
KW - Multi-scale
KW - Representation learning
KW - Self-supervised learning
KW - Tabular data
UR - http://www.scopus.com/inward/record.url?scp=85183105678&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2024.120108
DO - 10.1016/j.ins.2024.120108
M3 - Article
AN - SCOPUS:85183105678
SN - 0020-0255
VL - 660
JO - Information Sciences
JF - Information Sciences
M1 - 120108
ER -