One-Shot Neural Architecture Search: Maximising Diversity to Overcome Catastrophic Forgetting

Miao Zhang; Huiqi Li; Shirui Pan; Xiaojun Chang; Chuan Zhou; Zongyuan Ge; Steven Su

doi:10.1109/TPAMI.2020.3035351

Miao Zhang, Huiqi Li^*, Shirui Pan^*, Xiaojun Chang, Chuan Zhou, Zongyuan Ge, Steven Su

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

42 Citations (Scopus)

Abstract

One-shot neural architecture search (NAS) has recently become mainstream in the NAS community because it significantly improves computational efficiency through weight sharing. However, the supernet training paradigm in one-shot NAS introduces catastrophic forgetting, where each step of the training can deteriorate the performance of other architectures that contain partially shared weights with the current architecture. To overcome this problem of catastrophic forgetting, we formulate supernet training for one-shot NAS as a constrained continual learning optimization problem such that learning the current architecture does not degrade the validation accuracy of previous architectures. The key to solving this constrained optimization problem is a novelty search based architecture selection (NSAS) loss function that regularizes the supernet training by using a greedy novelty search method to find the most representative subset. We applied the NSAS loss function to two one-shot NAS baselines and extensively tested them on both a common search space and a NAS benchmark dataset. We further derive three variants based on the NSAS loss function: NSAS with a depth constraint (NSAS-C) to improve transferability, and NSAS-G and NSAS-LG to handle the situation with a limited number of constraints. The experiments on the common NAS search space demonstrate that NSAS and its variants improve the predictive ability of supernet training in one-shot NAS with remarkable and efficient performance on the CIFAR-10, CIFAR-100, and ImageNet datasets. The results with the NAS benchmark dataset also confirm the significant improvements these one-shot NAS baselines can make.
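
The greedy novelty search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the architecture encoding (one operation index per edge), the Hamming distance, the mean-distance novelty score, and the names `hamming`, `novelty`, and `greedy_diverse_subset` are all assumptions made for illustration. The abstract does not specify how novelty is measured or how the subset enters the loss.

```python
from typing import List, Sequence

# Hypothetical encoding: one operation index per edge of the cell.
Arch = Sequence[int]


def hamming(a: Arch, b: Arch) -> int:
    """Count the positions at which two encodings differ."""
    return sum(x != y for x, y in zip(a, b))


def novelty(candidate: Arch, selected: List[Arch]) -> float:
    """Mean distance from the candidate to the architectures picked so far."""
    return sum(hamming(candidate, s) for s in selected) / len(selected)


def greedy_diverse_subset(pool: List[Arch], k: int) -> List[Arch]:
    """Greedily pick up to k architectures, each maximising novelty
    with respect to the ones already selected (assumed criterion)."""
    if k <= 0 or not pool:
        return []
    selected = [pool[0]]          # seed with the first architecture in the pool
    remaining = list(pool[1:])
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda a: novelty(a, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    pool = [(0, 0, 0), (0, 0, 1), (1, 1, 1), (2, 2, 2)]
    print(greedy_diverse_subset(pool, 3))
```

In a setup like the one the abstract describes, the selected subset would act as the set of constraint architectures whose performance should not degrade while the current architecture is trained. How those constraints are weighted is left out here because the abstract does not state it.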

Original language	English
Article number	9247292
Pages (from-to)	2921-2935
Number of pages	15
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	43
Issue number	9
DOIs	https://doi.org/10.1109/TPAMI.2020.3035351
Publication status	Published - 1 Sept 2021
Externally published	Yes

Keywords

AutoML
catastrophic forgetting
continual learning
neural architecture search
novelty search

Cite this

@article{53dca8b6a7be40c99b7214bf1ff8e8ee,

title = "One-Shot Neural Architecture Search: Maximising Diversity to Overcome Catastrophic Forgetting",

abstract = "One-shot neural architecture search (NAS) has recently become mainstream in the NAS community because it significantly improves computational efficiency through weight sharing. However, the supernet training paradigm in one-shot NAS introduces catastrophic forgetting, where each step of the training can deteriorate the performance of other architectures that contain partially shared weights with the current architecture. To overcome this problem of catastrophic forgetting, we formulate supernet training for one-shot NAS as a constrained continual learning optimization problem such that learning the current architecture does not degrade the validation accuracy of previous architectures. The key to solving this constrained optimization problem is a novelty search based architecture selection (NSAS) loss function that regularizes the supernet training by using a greedy novelty search method to find the most representative subset. We applied the NSAS loss function to two one-shot NAS baselines and extensively tested them on both a common search space and a NAS benchmark dataset. We further derive three variants based on the NSAS loss function: NSAS with a depth constraint (NSAS-C) to improve transferability, and NSAS-G and NSAS-LG to handle the situation with a limited number of constraints. The experiments on the common NAS search space demonstrate that NSAS and its variants improve the predictive ability of supernet training in one-shot NAS with remarkable and efficient performance on the CIFAR-10, CIFAR-100, and ImageNet datasets. The results with the NAS benchmark dataset also confirm the significant improvements these one-shot NAS baselines can make.",

keywords = "AutoML, catastrophic forgetting, continual learning, neural architecture search, novelty search",

author = "Miao Zhang and Huiqi Li and Shirui Pan and Xiaojun Chang and Chuan Zhou and Zongyuan Ge and Steven Su",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2021",

month = sep,

day = "1",

doi = "10.1109/TPAMI.2020.3035351",

language = "English",

volume = "43",

pages = "2921--2935",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "9",

}

TY - JOUR

T1 - One-Shot Neural Architecture Search

T2 - Maximising Diversity to Overcome Catastrophic Forgetting

AU - Zhang, Miao

AU - Li, Huiqi

AU - Pan, Shirui

AU - Chang, Xiaojun

AU - Zhou, Chuan

AU - Ge, Zongyuan

AU - Su, Steven

PY - 2021/9/1

Y1 - 2021/9/1

N2 - One-shot neural architecture search (NAS) has recently become mainstream in the NAS community because it significantly improves computational efficiency through weight sharing. However, the supernet training paradigm in one-shot NAS introduces catastrophic forgetting, where each step of the training can deteriorate the performance of other architectures that contain partially shared weights with the current architecture. To overcome this problem of catastrophic forgetting, we formulate supernet training for one-shot NAS as a constrained continual learning optimization problem such that learning the current architecture does not degrade the validation accuracy of previous architectures. The key to solving this constrained optimization problem is a novelty search based architecture selection (NSAS) loss function that regularizes the supernet training by using a greedy novelty search method to find the most representative subset. We applied the NSAS loss function to two one-shot NAS baselines and extensively tested them on both a common search space and a NAS benchmark dataset. We further derive three variants based on the NSAS loss function: NSAS with a depth constraint (NSAS-C) to improve transferability, and NSAS-G and NSAS-LG to handle the situation with a limited number of constraints. The experiments on the common NAS search space demonstrate that NSAS and its variants improve the predictive ability of supernet training in one-shot NAS with remarkable and efficient performance on the CIFAR-10, CIFAR-100, and ImageNet datasets. The results with the NAS benchmark dataset also confirm the significant improvements these one-shot NAS baselines can make.

KW - AutoML

KW - catastrophic forgetting

KW - continual learning

KW - neural architecture search

KW - novelty search

UR - http://www.scopus.com/inward/record.url?scp=85096110886&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2020.3035351

DO - 10.1109/TPAMI.2020.3035351

M3 - Article

C2 - 33147140

AN - SCOPUS:85096110886

SN - 0162-8828

VL - 43

SP - 2921

EP - 2935

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 9

M1 - 9247292

ER -
