MSLR: A Self-supervised Representation Learning Method for Tabular Data Based on Multi-scale Ladder Reconstruction

Xutao Weng; Hong Song; Yucong Lin; Xi Zhang; Bowen Liu; You Wu; Jian Yang

doi:10.1016/j.ins.2024.120108


Corresponding author: Hong Song

Beijing Institute of Technology

Research output: Contribution to journal › Article › Peer-review

1 Citation (Scopus)

Abstract

Tabular data are widely used for prediction tasks, but they often suffer from the curse of dimensionality and noise, leading to degradation in the performance and robustness of prediction models. Self-supervised representation learning has emerged as a promising technique to overcome these challenges, but most existing methods are applicable to images, text, and others rather than tabular data. In this study, we propose a novel self-supervised representation learning method for tabular data based on multi-scale ladder reconstruction (MSLR). The method effectively learns low-dimensional and noise-resistant representations, thereby improving the prediction performance across various tabular datasets. The idea of MSLR is to employ a binning method to generate a sequence of fuzzy data with different noise scales, followed by training a neural network to recover the raw data from the most corrupted data in a circular manner. This process allows MSLR to learn fine-grained changes caused by noise while maintaining consistency and similarity at a coarse granularity. The proposed method is evaluated on five real-world datasets, namely, MIMIC-IV, Thyroid, Heart, Pima, and Adult, and compared with several baselines. The experimental results of downstream prediction tasks show that MSLR is robust to noisy data and performs better than other existing baseline methods.
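The binning-and-reconstruction idea in the abstract can be made concrete with a small data-preparation sketch. This is not the authors' implementation. It assumes equal-width binning in which each value is replaced by the mean of its bin, treats fewer bins as a larger noise scale, and builds (input, target) pairs that step from the most corrupted view back toward the raw data. The function names `bin_column`, `multiscale_views`, and `ladder_pairs` and the bin counts are hypothetical. The neural network and the paper's "circular" training schedule are not shown.

```python
import numpy as np


def bin_column(x, n_bins):
    """Coarsen one numeric column: equal-width bins, each value replaced by its bin mean.

    Hypothetical binning choice for illustration; the paper's binning scheme may differ.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:  # constant column: nothing to coarsen
        return x.copy()
    # Equal-width bin index in [0, n_bins - 1]; the maximum value is clipped into the last bin.
    idx = np.minimum(((x - lo) / (hi - lo) * n_bins).astype(int), n_bins - 1)
    out = np.empty_like(x)
    for b in np.unique(idx):
        out[idx == b] = x[idx == b].mean()
    return out


def multiscale_views(X, bin_counts):
    """Return [raw, mildly coarsened, ..., most coarsened] views of X.

    More bins means a smaller quantization "noise scale"; fewer bins means a larger one.
    """
    X = np.asarray(X, dtype=float)
    views = [X.copy()]
    for n in sorted(bin_counts, reverse=True):
        views.append(np.column_stack([bin_column(X[:, j], n) for j in range(X.shape[1])]))
    return views


def ladder_pairs(views):
    """(input, target) pairs from the most corrupted view down to the raw data.

    A model trained on these pairs would learn to undo one scale of coarsening at a time.
    """
    return [(views[k], views[k - 1]) for k in range(len(views) - 1, 0, -1)]


if __name__ == "__main__":
    X = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0]])
    views = multiscale_views(X, bin_counts=[1, 2, 4])
    for inp, target in ladder_pairs(views):
        print(inp[:, 0], "->", target[:, 0])
```

With bin counts 1, 2 and 4 on this toy matrix, the first column climbs from all 1.5 (one bin), to [0.5, 0.5, 2.5, 2.5] (two bins), to the raw values [0, 1, 2, 3]; four bins leave these evenly spaced values unchanged. Replacing values with bin means keeps the coarse structure of each feature while removing fine-grained detail, which is one plausible reading of "fuzzy data with different noise scales".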

Original language	English
Article number	120108
Journal	Information Sciences
Volume	660
DOI	https://doi.org/10.1016/j.ins.2024.120108
Publication status	Published - Mar 2024


Cite this

Weng, X., Song, H., Lin, Y., Zhang, X., Liu, B., Wu, Y., & Yang, J. (2024). MSLR: A Self-supervised Representation Learning Method for Tabular Data Based on Multi-scale Ladder Reconstruction. Information Sciences, 660, Article 120108. https://doi.org/10.1016/j.ins.2024.120108

@article{499f502c69004e3d8f62d6b6610ec411,

title = "MSLR: A Self-supervised Representation Learning Method for Tabular Data Based on Multi-scale Ladder Reconstruction",

abstract = "Tabular data are widely used for prediction tasks, but they often suffer from the curse of dimensionality and noise, leading to degradation in the performance and robustness of prediction models. Self-supervised representation learning has emerged as a promising technique to overcome these challenges, but most existing methods are applicable to images, text, and others rather than tabular data. In this study, we propose a novel self-supervised representation learning method for tabular data based on multi-scale ladder reconstruction (MSLR). The method effectively learns low-dimensional and noise-resistant representations, thereby improving the prediction performance across various tabular datasets. The idea of MSLR is to employ a binning method to generate a sequence of fuzzy data with different noise scales, followed by training a neural network to recover the raw data from the most corrupted data in a circular manner. This process allows MSLR to learn fine-grained changes caused by noise while maintaining consistency and similarity at a coarse granularity. The proposed method is evaluated on five real-world datasets, namely, MIMIC-IV, Thyroid, Heart, Pima, and Adult, and compared with several baselines. The experimental results of downstream prediction tasks show that MSLR is robust to noisy data and performs better than other existing baseline methods.",

keywords = "Binning, Multi-scale, Representation learning, Self-supervised learning, Tabular data",

author = "Xutao Weng and Hong Song and Yucong Lin and Xi Zhang and Bowen Liu and You Wu and Jian Yang",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier Inc.",

year = "2024",

month = mar,

doi = "10.1016/j.ins.2024.120108",

language = "English",

volume = "660",

pages = "120108",

journal = "Information Sciences",

issn = "0020-0255",

publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - MSLR: A Self-supervised Representation Learning Method for Tabular Data Based on Multi-scale Ladder Reconstruction

AU - Weng, Xutao

AU - Song, Hong

AU - Lin, Yucong

AU - Zhang, Xi

AU - Liu, Bowen

AU - Wu, You

AU - Yang, Jian

PY - 2024/3

Y1 - 2024/3

N2 - Tabular data are widely used for prediction tasks, but they often suffer from the curse of dimensionality and noise, leading to degradation in the performance and robustness of prediction models. Self-supervised representation learning has emerged as a promising technique to overcome these challenges, but most existing methods are applicable to images, text, and others rather than tabular data. In this study, we propose a novel self-supervised representation learning method for tabular data based on multi-scale ladder reconstruction (MSLR). The method effectively learns low-dimensional and noise-resistant representations, thereby improving the prediction performance across various tabular datasets. The idea of MSLR is to employ a binning method to generate a sequence of fuzzy data with different noise scales, followed by training a neural network to recover the raw data from the most corrupted data in a circular manner. This process allows MSLR to learn fine-grained changes caused by noise while maintaining consistency and similarity at a coarse granularity. The proposed method is evaluated on five real-world datasets, namely, MIMIC-IV, Thyroid, Heart, Pima, and Adult, and compared with several baselines. The experimental results of downstream prediction tasks show that MSLR is robust to noisy data and performs better than other existing baseline methods.

AB - Tabular data are widely used for prediction tasks, but they often suffer from the curse of dimensionality and noise, leading to degradation in the performance and robustness of prediction models. Self-supervised representation learning has emerged as a promising technique to overcome these challenges, but most existing methods are applicable to images, text, and others rather than tabular data. In this study, we propose a novel self-supervised representation learning method for tabular data based on multi-scale ladder reconstruction (MSLR). The method effectively learns low-dimensional and noise-resistant representations, thereby improving the prediction performance across various tabular datasets. The idea of MSLR is to employ a binning method to generate a sequence of fuzzy data with different noise scales, followed by training a neural network to recover the raw data from the most corrupted data in a circular manner. This process allows MSLR to learn fine-grained changes caused by noise while maintaining consistency and similarity at a coarse granularity. The proposed method is evaluated on five real-world datasets, namely, MIMIC-IV, Thyroid, Heart, Pima, and Adult, and compared with several baselines. The experimental results of downstream prediction tasks show that MSLR is robust to noisy data and performs better than other existing baseline methods.

KW - Binning

KW - Multi-scale

KW - Representation learning

KW - Self-supervised learning

KW - Tabular data

UR - http://www.scopus.com/inward/record.url?scp=85183105678&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2024.120108

DO - 10.1016/j.ins.2024.120108

M3 - Article

AN - SCOPUS:85183105678

SN - 0020-0255

VL - 660

JO - Information Sciences

JF - Information Sciences

M1 - 120108

ER -
