Self-Supervised Random Forest on Transformed Distribution for Anomaly Detection

Jiabin Liu; Huadong Wang; Hanyuan Hang; Shumin Ma; Xin Shen; Yong Shi

doi:10.1109/TNNLS.2023.3348833

School of Information and Electronics

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Anomaly detection, the task of differentiating abnormal data points from normal ones, presents a significant challenge in the realm of machine learning. Numerous strategies have been proposed to tackle this task, with classification-based methods, specifically those utilizing a self-supervised approach via random affine transformations (RATs), demonstrating remarkable performance on both image and non-image data. However, these methods encounter a notable bottleneck, the overlap of constructed labeled datasets across categories, which hampers the subsequent classifiers’ ability to detect anomalies. Consequently, the creation of an effective data distribution becomes the pivotal factor for success. In this article, we introduce a model called “self-supervised forest (sForest)”, which leverages the random Fourier transform (RFT) and random orthogonal rotations to craft a controlled data distribution. Our model utilizes the RFT to map input data into a new feature space. With this transformed data, we create a self-labeled training dataset using random orthogonal rotations. We theoretically prove that the data distribution formulated by our methodology is more stable compared to one derived from RATs. We then use the self-labeled dataset in a random forest (RF) classifier to distinguish between normal and anomalous data points. Comprehensive experiments conducted on both real and artificial datasets illustrate that sForest outperforms other anomaly detection methods, including distance-based, kernel-based, forest-based, and network-based benchmarks.
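The pipeline the abstract describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not define the RFT, so the sketch substitutes standard random Fourier features (a cosine of a random Gaussian projection). The rotations are drawn by QR-factorizing Gaussian matrices, and the anomaly score (one minus the average probability the forest assigns to the correct rotation label) is borrowed from other transformation-classification detectors. All function names, parameter values, and the toy data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def random_fourier_features(X, W, b):
    """Map X (n, d) to (n, D) via sqrt(2/D) * cos(XW + b).

    Assumption: stands in for the paper's RFT, which the abstract does not specify.
    """
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)


def random_orthogonal_matrices(k, dim, rng):
    """Draw k random orthogonal (dim, dim) matrices via QR of Gaussian matrices."""
    mats = []
    for _ in range(k):
        q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
        # Flip column signs so the result does not depend on the QR sign convention.
        mats.append(q * np.sign(np.diag(r)))
    return mats


def build_self_labeled(Z, rotations):
    """Stack rotated copies of Z; each row's label is the index of its rotation."""
    data = np.vstack([Z @ R for R in rotations])
    labels = np.repeat(np.arange(len(rotations)), Z.shape[0])
    return data, labels


def anomaly_scores(clf, Z, rotations):
    """Assumed score: 1 - mean probability given to the correct rotation label."""
    correct = np.zeros(Z.shape[0])
    for k, R in enumerate(rotations):
        correct += clf.predict_proba(Z @ R)[:, k]
    return 1.0 - correct / len(rotations)


# Toy data (hypothetical): normal points near the origin, shifted points as anomalies.
rng = np.random.default_rng(0)
d, D, K = 5, 16, 4
X_train = rng.standard_normal((200, d))
X_test = np.vstack([rng.standard_normal((20, d)),
                    rng.standard_normal((20, d)) + 6.0])

# Step 1: map inputs into a new feature space.
W = rng.standard_normal((d, D))
b = rng.uniform(0.0, 2.0 * np.pi, D)
Z_train = random_fourier_features(X_train, W, b)

# Step 2: build the self-labeled dataset with random orthogonal rotations.
rotations = random_orthogonal_matrices(K, D, rng)
data, labels = build_self_labeled(Z_train, rotations)

# Step 3: train a random forest to predict which rotation was applied.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(data, labels)

# Score test points with the same feature map and rotations.
scores = anomaly_scores(clf, random_fourier_features(X_test, W, b), rotations)
```

In this sketch, a higher score means the forest was less able to recognize which rotation had been applied to a point, which is treated here as evidence that the point is anomalous.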

Original language	English
Pages (from-to)	1-15
Number of pages	15
Journal	IEEE Transactions on Neural Networks and Learning Systems
DOIs	https://doi.org/10.1109/TNNLS.2023.3348833
Publication status	Accepted/In press - 2024

Keywords

Anomaly detection
Classification tree analysis
Forestry
Fourier transforms
Random forests
Self-supervised learning
Task analysis
data distribution
random Fourier transform (RFT)
random forest (RF) classifier
random orthogonal rotations
self-supervised learning

Cite this

@article{a6e92b31cae84cd3881504df13dd6ee2,

title = "Self-Supervised Random Forest on Transformed Distribution for Anomaly Detection",

abstract = "Anomaly detection, the task of differentiating abnormal data points from normal ones, presents a significant challenge in the realm of machine learning. Numerous strategies have been proposed to tackle this task, with classification-based methods, specifically those utilizing a self-supervised approach via random affine transformations (RATs), demonstrating remarkable performance on both image and non-image data. However, these methods encounter a notable bottleneck, the overlap of constructed labeled datasets across categories, which hampers the subsequent classifiers{\textquoteright} ability to detect anomalies. Consequently, the creation of an effective data distribution becomes the pivotal factor for success. In this article, we introduce a model called ``self-supervised forest (sForest)'', which leverages the random Fourier transform (RFT) and random orthogonal rotations to craft a controlled data distribution. Our model utilizes the RFT to map input data into a new feature space. With this transformed data, we create a self-labeled training dataset using random orthogonal rotations. We theoretically prove that the data distribution formulated by our methodology is more stable compared to one derived from RATs. We then use the self-labeled dataset in a random forest (RF) classifier to distinguish between normal and anomalous data points. Comprehensive experiments conducted on both real and artificial datasets illustrate that sForest outperforms other anomaly detection methods, including distance-based, kernel-based, forest-based, and network-based benchmarks.",

keywords = "Anomaly detection, Classification tree analysis, Forestry, Fourier transforms, Random forests, Self-supervised learning, Task analysis, data distribution, random Fourier transform (RFT), random forest (RF) classifier, random orthogonal rotations, self-supervised learning",

author = "Jiabin Liu and Huadong Wang and Hanyuan Hang and Shumin Ma and Xin Shen and Yong Shi",

note = "Publisher Copyright: IEEE",

year = "2024",

doi = "10.1109/TNNLS.2023.3348833",

language = "English",

pages = "1--15",

journal = "IEEE Transactions on Neural Networks and Learning Systems",

issn = "2162-237X",

publisher = "IEEE Computational Intelligence Society",

}

TY - JOUR

T1 - Self-Supervised Random Forest on Transformed Distribution for Anomaly Detection

AU - Liu, Jiabin

AU - Wang, Huadong

AU - Hang, Hanyuan

AU - Ma, Shumin

AU - Shen, Xin

AU - Shi, Yong

N1 - Publisher Copyright: IEEE

PY - 2024

Y1 - 2024

AB - Anomaly detection, the task of differentiating abnormal data points from normal ones, presents a significant challenge in the realm of machine learning. Numerous strategies have been proposed to tackle this task, with classification-based methods, specifically those utilizing a self-supervised approach via random affine transformations (RATs), demonstrating remarkable performance on both image and non-image data. However, these methods encounter a notable bottleneck, the overlap of constructed labeled datasets across categories, which hampers the subsequent classifiers’ ability to detect anomalies. Consequently, the creation of an effective data distribution becomes the pivotal factor for success. In this article, we introduce a model called “self-supervised forest (sForest)”, which leverages the random Fourier transform (RFT) and random orthogonal rotations to craft a controlled data distribution. Our model utilizes the RFT to map input data into a new feature space. With this transformed data, we create a self-labeled training dataset using random orthogonal rotations. We theoretically prove that the data distribution formulated by our methodology is more stable compared to one derived from RATs. We then use the self-labeled dataset in a random forest (RF) classifier to distinguish between normal and anomalous data points. Comprehensive experiments conducted on both real and artificial datasets illustrate that sForest outperforms other anomaly detection methods, including distance-based, kernel-based, forest-based, and network-based benchmarks.

KW - Anomaly detection

KW - Classification tree analysis

KW - Forestry

KW - Fourier transforms

KW - Random forests

KW - Self-supervised learning

KW - Task analysis

KW - data distribution

KW - random Fourier transform (RFT)

KW - random forest (RF) classifier

KW - random orthogonal rotations

KW - self-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85183941915&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2023.3348833

DO - 10.1109/TNNLS.2023.3348833

M3 - Article

AN - SCOPUS:85183941915

SN - 2162-237X

SP - 1

EP - 15

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

ER -
