TY - JOUR
T1 - Self-Supervised Random Forest on Transformed Distribution for Anomaly Detection
AU - Liu, Jiabin
AU - Wang, Huadong
AU - Hang, Hanyuan
AU - Ma, Shumin
AU - Shen, Xin
AU - Shi, Yong
N1 - Publisher Copyright:
IEEE
PY - 2024
Y1 - 2024
N2 - Anomaly detection, the task of differentiating abnormal data points from normal ones, presents a significant challenge in machine learning. Numerous strategies have been proposed to tackle this task, with classification-based methods, specifically those using a self-supervised approach via random affine transformations (RATs), demonstrating remarkable performance on both image and non-image data. However, these methods encounter a notable bottleneck: the overlap of the constructed labeled datasets across categories, which hampers the subsequent classifiers’ ability to detect anomalies. Consequently, the creation of an effective data distribution becomes the pivotal factor for success. In this article, we introduce a model called “self-supervised forest (sForest)”, which leverages the random Fourier transform (RFT) and random orthogonal rotations to craft a controlled data distribution. Our model uses the RFT to map input data into a new feature space. With this transformed data, we create a self-labeled training dataset using random orthogonal rotations. We theoretically prove that the data distribution formulated by our methodology is more stable than that derived from RATs. We then use the self-labeled dataset to train a random forest (RF) classifier to distinguish between normal and anomalous data points. Comprehensive experiments on both real and artificial datasets demonstrate that sForest outperforms other anomaly detection methods, including distance-based, kernel-based, forest-based, and network-based benchmarks.
AB - Anomaly detection, the task of differentiating abnormal data points from normal ones, presents a significant challenge in machine learning. Numerous strategies have been proposed to tackle this task, with classification-based methods, specifically those using a self-supervised approach via random affine transformations (RATs), demonstrating remarkable performance on both image and non-image data. However, these methods encounter a notable bottleneck: the overlap of the constructed labeled datasets across categories, which hampers the subsequent classifiers’ ability to detect anomalies. Consequently, the creation of an effective data distribution becomes the pivotal factor for success. In this article, we introduce a model called “self-supervised forest (sForest)”, which leverages the random Fourier transform (RFT) and random orthogonal rotations to craft a controlled data distribution. Our model uses the RFT to map input data into a new feature space. With this transformed data, we create a self-labeled training dataset using random orthogonal rotations. We theoretically prove that the data distribution formulated by our methodology is more stable than that derived from RATs. We then use the self-labeled dataset to train a random forest (RF) classifier to distinguish between normal and anomalous data points. Comprehensive experiments on both real and artificial datasets demonstrate that sForest outperforms other anomaly detection methods, including distance-based, kernel-based, forest-based, and network-based benchmarks.
KW - Anomaly detection
KW - Classification tree analysis
KW - Forestry
KW - Fourier transforms
KW - Random forests
KW - Self-supervised learning
KW - Task analysis
KW - data distribution
KW - random Fourier transform (RFT)
KW - random forest (RF) classifier
KW - random orthogonal rotations
UR - http://www.scopus.com/inward/record.url?scp=85183941915&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2023.3348833
DO - 10.1109/TNNLS.2023.3348833
M3 - Article
AN - SCOPUS:85183941915
SN - 2162-237X
SP - 1
EP - 15
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -