Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data

Shenwang Jiang, Jianan Li^*, Ying Wang, Bo Huang, Zhang Zhang, Tingfa Xu

^*Corresponding author for this work

School of Optics and Photonics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Citations (Scopus)

Abstract

Corrupted labels and class imbalance are commonly encountered in practically collected training data, and both easily lead to over-fitting of deep neural networks (DNNs). Existing approaches alleviate these issues with sample re-weighting, in which a designed weighting function assigns a weight to each sample. However, such approaches apply only to training data containing one type of data bias. In practice, biased samples with corrupted labels and samples from tail classes commonly co-exist in training data. How to handle them simultaneously is a key but under-explored problem. In this paper, we find that these two types of biased samples, though they have similar transient loss, show distinguishable trends and characteristics in their loss curves, which could provide valuable priors for sample weight assignment. Motivated by this, we delve into the loss curves and propose a novel probe-and-allocate training strategy. In the probing stage, we train the network on the whole biased training data without intervention and record the loss curve of each sample as an additional attribute. In the allocating stage, we feed the resulting attribute to a newly designed curve-perception network, named CurveNet, which learns to identify the bias type of each sample and adaptively assigns proper weights through meta-learning. Extensive experiments on synthetic and real data validate the proposed method, which achieves state-of-the-art performance on multiple challenging benchmarks.
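The probing stage described in the abstract (train without intervention, record each sample's loss curve) can be illustrated with a minimal sketch. This is not the authors' implementation and does not include CurveNet or the meta-learning step: the tiny logistic model, the function name `record_loss_curves`, the toy data, and all hyperparameters are assumptions chosen only to show what "a loss curve per sample" looks like as a data structure.

```python
import math
import random


def record_loss_curves(features, labels, epochs=20, lr=0.5, seed=0):
    """Train a logistic model with plain SGD and return per-sample loss curves.

    Hypothetical stand-in for the probing stage: no re-weighting is applied,
    and curves[i][t] holds the cross-entropy loss of sample i after epoch t.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    curves = [[] for _ in features]

    def predict(x):
        # Sigmoid of the linear score, clamped to avoid overflow in exp().
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        z = max(-30.0, min(30.0, z))
        return 1.0 / (1.0 + math.exp(-z))

    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Gradient of binary cross-entropy w.r.t. the linear score.
            grad = predict(features[i]) - labels[i]
            for j in range(dim):
                weights[j] -= lr * grad * features[i][j]
            bias -= lr * grad
        # After each epoch, log the loss of every sample (the "probe").
        for i, (x, y) in enumerate(zip(features, labels)):
            p = min(max(predict(x), 1e-12), 1.0 - 1e-12)
            curves[i].append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return curves


# Toy usage: six clean 1-D samples plus one sample whose label is flipped.
toy_x = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [1.8]]
toy_y = [0, 0, 0, 1, 1, 1, 0]  # the last label is deliberately corrupted
toy_curves = record_loss_curves(toy_x, toy_y)
for x, y, curve in zip(toy_x, toy_y, toy_curves):
    print(x, y, [round(v, 3) for v in curve[-3:]])
```

In the paper's setting, curves of this kind would be passed on as an additional per-sample attribute to the allocating stage; how that stage consumes them is not specified beyond the abstract, so it is left out here.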

Original language	English
Title of host publication	AAAI-22 Technical Tracks 6
Publisher	Association for the Advancement of Artificial Intelligence
Pages	7024-7032
Number of pages	9
ISBN (Electronic)	1577358767, 9781577358763
Publication status	Published - 30 Jun 2022
Event	36th AAAI Conference on Artificial Intelligence, AAAI 2022 - Virtual, Online (Duration: 22 Feb 2022 → 1 Mar 2022)

Publication series

Name	Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
Volume	36

Conference

Conference	36th AAAI Conference on Artificial Intelligence, AAAI 2022
City	Virtual, Online
Period	22/02/22 → 01/03/22

Cite this

@inproceedings{5279ced6c411430d89a584e96a627062,

title = "Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data",

abstract = "Corrupted labels and class imbalance are commonly encountered in practically collected training data, which easily leads to over-fitting of deep neural networks (DNNs). Existing approaches alleviate these issues by adopting a sample re-weighting strategy, which is to re-weight sample by designing weighting function. However, it is only applicable for training data containing only either one type of data biases. In practice, however, biased samples with corrupted labels and of tailed classes commonly co-exist in training data. How to handle them simultaneously is a key but under-explored problem. In this paper, we find that these two types of biased samples, though have similar transient loss, have distinguishable trend and characteristics in loss curves, which could provide valuable priors for sample weight assignment. Motivated by this, we delve into the loss curves and propose a novel probe-and-allocate training strategy: In the probing stage, we train the network on the whole biased training data without intervention, and record the loss curve of each sample as an additional attribute; In the allocating stage, we feed the resulting attribute to a newly designed curve-perception network, named CurveNet, to learn to identify the bias type of each sample and assign proper weights through meta-learning adaptively. Extensive synthetic and real experiments well validate the proposed method, which achieves state-of-the-art performance on multiple challenging benchmarks.",

author = "Shenwang Jiang and Jianan Li and Ying Wang and Bo Huang and Zhang Zhang and Tingfa Xu",

note = "Publisher Copyright: Copyright {\textcopyright} 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 36th AAAI Conference on Artificial Intelligence, AAAI 2022 ; Conference date: 22-02-2022 Through 01-03-2022",

year = "2022",

month = jun,

day = "30",

language = "English",

series = "Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022",

publisher = "Association for the Advancement of Artificial Intelligence",

pages = "7024--7032",

booktitle = "AAAI-22 Technical Tracks 6",

}

Jiang, S, Li, J, Wang, Y, Huang, B, Zhang, Z & Xu, T 2022, Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data. in AAAI-22 Technical Tracks 6. Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022, vol. 36, Association for the Advancement of Artificial Intelligence, pp. 7024-7032, 36th AAAI Conference on Artificial Intelligence, AAAI 2022, Virtual, Online, 22/02/22.

Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data. / Jiang, Shenwang; Li, Jianan; Wang, Ying et al.
AAAI-22 Technical Tracks 6. Association for the Advancement of Artificial Intelligence, 2022. p. 7024-7032 (Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022; Vol. 36).

TY - GEN

T1 - Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data

AU - Jiang, Shenwang

AU - Li, Jianan

AU - Wang, Ying

AU - Huang, Bo

AU - Zhang, Zhang

AU - Xu, Tingfa

PY - 2022/6/30

Y1 - 2022/6/30

AB - Corrupted labels and class imbalance are commonly encountered in practically collected training data, which easily leads to over-fitting of deep neural networks (DNNs). Existing approaches alleviate these issues by adopting a sample re-weighting strategy, which is to re-weight sample by designing weighting function. However, it is only applicable for training data containing only either one type of data biases. In practice, however, biased samples with corrupted labels and of tailed classes commonly co-exist in training data. How to handle them simultaneously is a key but under-explored problem. In this paper, we find that these two types of biased samples, though have similar transient loss, have distinguishable trend and characteristics in loss curves, which could provide valuable priors for sample weight assignment. Motivated by this, we delve into the loss curves and propose a novel probe-and-allocate training strategy: In the probing stage, we train the network on the whole biased training data without intervention, and record the loss curve of each sample as an additional attribute; In the allocating stage, we feed the resulting attribute to a newly designed curve-perception network, named CurveNet, to learn to identify the bias type of each sample and assign proper weights through meta-learning adaptively. Extensive synthetic and real experiments well validate the proposed method, which achieves state-of-the-art performance on multiple challenging benchmarks.

UR - http://www.scopus.com/inward/record.url?scp=85136485686&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85136485686

T3 - Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022

SP - 7024

EP - 7032

BT - AAAI-22 Technical Tracks 6

PB - Association for the Advancement of Artificial Intelligence

T2 - 36th AAAI Conference on Artificial Intelligence, AAAI 2022

Y2 - 22 February 2022 through 1 March 2022

ER -
