Feature Augmentation with Reinforcement Learning

Jiabin Liu, Chengliang Chai^*, Yuyu Luo, Yin Lou, Jianhua Feng, Nan Tang

doi:10.1109/ICDE53745.2022.00317

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

27 Citations (Scopus)

Abstract

Sufficiently good features are indispensable for training well-performing machine learning models. However, the available features are often not good enough, and feature augmentation is necessary to enrich them with high-quality features by joining with other tables. The problem poses two main challenges. First, given a set of tables from which features can be augmented, there are many ways of joining multiple tables and of deciding which features (or attributes) to use, so selecting the best set of features to augment is hard. Second, different join options may require materializing their join results, and full materialization can be time-consuming, so efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, that augments features following an exploration-exploitation strategy. AutoFeature keeps exploiting the features in tables that have led to performance improvement. At the same time, AutoFeature also explores the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature uses sampling techniques to achieve high efficiency. We implement two algorithms to realize the AutoFeature framework, one based on a multi-armed bandit and the other on branch Deep Q Networks (branch DQN). We conducted experiments on three real-world datasets, School, XuetangE and Air, using 16, 23 and 34 candidate tables with 695, 204 and 338 candidate features, respectively. Extensive results show that, in terms of model performance, AutoFeature outperforms other methods by 12.4% and 9.8% in AUC on the two classification datasets (School and XuetangE) and by 0.113 in MSE on Air.
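The exploration-exploitation idea described in the abstract can be illustrated with a generic multi-armed bandit in which each arm is one candidate table. This is a minimal sketch under stated assumptions, not the authors' AutoFeature implementation: UCB1 is swapped in as a standard bandit policy, and the table names, their "true" gains, and the helper `sampled_gain` (which stands in for training a model on a row sample of the join, in the spirit of the sampling mentioned above) are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def ucb1_select_tables(
    tables: List[str],
    evaluate: Callable[[str], float],
    rounds: int,
    c: float = 2.0,
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Generic UCB1 bandit in which each arm is one candidate table.

    Pulling an arm means augmenting the base table with that candidate and
    observing a reward in [0, 1] from `evaluate`.
    Returns (pull count per table, mean observed reward per table).
    """
    counts = {t: 0 for t in tables}
    sums = {t: 0.0 for t in tables}
    for step in range(1, rounds + 1):
        untried = [t for t in tables if counts[t] == 0]
        if untried:
            # Initialisation: try every candidate table once.
            arm = untried[0]
        else:
            # The mean reward favours tables that already helped (exploitation);
            # the confidence bonus favours rarely selected tables (exploration).
            arm = max(
                tables,
                key=lambda t: sums[t] / counts[t]
                + math.sqrt(c * math.log(step) / counts[t]),
            )
        counts[arm] += 1
        sums[arm] += evaluate(arm)
    means = {t: sums[t] / counts[t] for t in tables if counts[t] > 0}
    return counts, means


# Hypothetical candidate tables and their gains (unknown to the bandit).
TRUE_GAIN = {"t_demographics": 0.2, "t_activity": 0.8, "t_region": 0.4}
rng = random.Random(0)


def sampled_gain(table: str) -> float:
    # Hypothetical stand-in for evaluating a model trained on a row sample of
    # the join with `table`; Gaussian noise mimics sample-estimate error.
    return min(1.0, max(0.0, TRUE_GAIN[table] + rng.gauss(0.0, 0.05)))


counts, means = ucb1_select_tables(list(TRUE_GAIN), sampled_gain, rounds=300)
print(counts)
```

With these made-up gains, most of the 300 pulls should go to `t_activity`, while the `sqrt(c * ln t / n)` bonus still forces occasional pulls of the rarely selected tables.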

Original language	English
Title of host publication	Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
Publisher	IEEE Computer Society
Pages	3360-3372
Number of pages	13
ISBN (Electronic)	9781665408837
DOIs	https://doi.org/10.1109/ICDE53745.2022.00317
Publication status	Published - 2022
Externally published	Yes
Event	38th IEEE International Conference on Data Engineering, ICDE 2022 - Virtual, Online, Malaysia. Duration: 9 May 2022 → 12 May 2022

Publication series

Name	Proceedings - International Conference on Data Engineering
Volume	2022-May
ISSN (Print)	1084-4627

Conference

Conference	38th IEEE International Conference on Data Engineering, ICDE 2022
Country/Territory	Malaysia
City	Virtual, Online
Period	9/05/22 → 12/05/22

Keywords

Feature Augmentation
Machine Learning

Cite this

Liu, J., Chai, C., Luo, Y., Lou, Y., Feng, J., & Tang, N. (2022). Feature Augmentation with Reinforcement Learning. In Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022 (pp. 3360-3372). (Proceedings - International Conference on Data Engineering; Vol. 2022-May). IEEE Computer Society. https://doi.org/10.1109/ICDE53745.2022.00317

@inproceedings{a8c4c03d6f6b4ee4bfb3b52688402370,

title = "Feature Augmentation with Reinforcement Learning",

abstract = "Sufficiently good features are indispensable for training well-performing machine learning models. However, the available features are often not good enough, and feature augmentation is necessary to enrich them with high-quality features by joining with other tables. The problem poses two main challenges. First, given a set of tables from which features can be augmented, there are many ways of joining multiple tables and of deciding which features (or attributes) to use, so selecting the best set of features to augment is hard. Second, different join options may require materializing their join results, and full materialization can be time-consuming, so efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, that augments features following an exploration-exploitation strategy. AutoFeature keeps exploiting the features in tables that have led to performance improvement. At the same time, AutoFeature also explores the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature uses sampling techniques to achieve high efficiency. We implement two algorithms to realize the AutoFeature framework, one based on a multi-armed bandit and the other on branch Deep Q Networks (branch DQN). We conducted experiments on three real-world datasets, School, XuetangE and Air, using 16, 23 and 34 candidate tables with 695, 204 and 338 candidate features, respectively. Extensive results show that, in terms of model performance, AutoFeature outperforms other methods by 12.4% and 9.8% in AUC on the two classification datasets (School and XuetangE) and by 0.113 in MSE on Air.",

keywords = "Feature Augmentation, Machine Learning",

author = "Jiabin Liu and Chengliang Chai and Yuyu Luo and Yin Lou and Jianhua Feng and Nan Tang",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 38th IEEE International Conference on Data Engineering, ICDE 2022 ; Conference date: 09-05-2022 Through 12-05-2022",

year = "2022",

doi = "10.1109/ICDE53745.2022.00317",

language = "English",

series = "Proceedings - International Conference on Data Engineering",

publisher = "IEEE Computer Society",

pages = "3360--3372",

booktitle = "Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022",

address = "United States",

}

Liu, J, Chai, C, Luo, Y, Lou, Y, Feng, J & Tang, N 2022, Feature Augmentation with Reinforcement Learning. in Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022. Proceedings - International Conference on Data Engineering, vol. 2022-May, IEEE Computer Society, pp. 3360-3372, 38th IEEE International Conference on Data Engineering, ICDE 2022, Virtual, Online, Malaysia, 9/05/22. https://doi.org/10.1109/ICDE53745.2022.00317

Feature Augmentation with Reinforcement Learning. / Liu, Jiabin; Chai, Chengliang; Luo, Yuyu et al.
Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022. IEEE Computer Society, 2022. p. 3360-3372 (Proceedings - International Conference on Data Engineering; Vol. 2022-May).

TY - GEN

T1 - Feature Augmentation with Reinforcement Learning

AU - Liu, Jiabin

AU - Chai, Chengliang

AU - Luo, Yuyu

AU - Lou, Yin

AU - Feng, Jianhua

AU - Tang, Nan

PY - 2022

Y1 - 2022

N2 - Sufficiently good features are indispensable for training well-performing machine learning models. However, the available features are often not good enough, and feature augmentation is necessary to enrich them with high-quality features by joining with other tables. The problem poses two main challenges. First, given a set of tables from which features can be augmented, there are many ways of joining multiple tables and of deciding which features (or attributes) to use, so selecting the best set of features to augment is hard. Second, different join options may require materializing their join results, and full materialization can be time-consuming, so efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, that augments features following an exploration-exploitation strategy. AutoFeature keeps exploiting the features in tables that have led to performance improvement. At the same time, AutoFeature also explores the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature uses sampling techniques to achieve high efficiency. We implement two algorithms to realize the AutoFeature framework, one based on a multi-armed bandit and the other on branch Deep Q Networks (branch DQN). We conducted experiments on three real-world datasets, School, XuetangE and Air, using 16, 23 and 34 candidate tables with 695, 204 and 338 candidate features, respectively. Extensive results show that, in terms of model performance, AutoFeature outperforms other methods by 12.4% and 9.8% in AUC on the two classification datasets (School and XuetangE) and by 0.113 in MSE on Air.

AB - Sufficiently good features are indispensable for training well-performing machine learning models. However, the available features are often not good enough, and feature augmentation is necessary to enrich them with high-quality features by joining with other tables. The problem poses two main challenges. First, given a set of tables from which features can be augmented, there are many ways of joining multiple tables and of deciding which features (or attributes) to use, so selecting the best set of features to augment is hard. Second, different join options may require materializing their join results, and full materialization can be time-consuming, so efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, that augments features following an exploration-exploitation strategy. AutoFeature keeps exploiting the features in tables that have led to performance improvement. At the same time, AutoFeature also explores the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature uses sampling techniques to achieve high efficiency. We implement two algorithms to realize the AutoFeature framework, one based on a multi-armed bandit and the other on branch Deep Q Networks (branch DQN). We conducted experiments on three real-world datasets, School, XuetangE and Air, using 16, 23 and 34 candidate tables with 695, 204 and 338 candidate features, respectively. Extensive results show that, in terms of model performance, AutoFeature outperforms other methods by 12.4% and 9.8% in AUC on the two classification datasets (School and XuetangE) and by 0.113 in MSE on Air.

KW - Feature Augmentation

KW - Machine Learning

UR - http://www.scopus.com/inward/record.url?scp=85136435706&partnerID=8YFLogxK

U2 - 10.1109/ICDE53745.2022.00317

DO - 10.1109/ICDE53745.2022.00317

M3 - Conference contribution

AN - SCOPUS:85136435706

T3 - Proceedings - International Conference on Data Engineering

SP - 3360

EP - 3372

BT - Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022

PB - IEEE Computer Society

T2 - 38th IEEE International Conference on Data Engineering, ICDE 2022

Y2 - 9 May 2022 through 12 May 2022

ER -
