Feature Augmentation with Reinforcement Learning

Jiabin Liu, Chengliang Chai*, Yuyu Luo, Yin Lou, Jianhua Feng, Nan Tang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

24 Citations (Scopus)

Abstract

Sufficient good features are indispensable to train well-performed machine learning models. However, it is com-mon that good features are not always enough, where feature augmentation is necessary to enrich high-quality features by joining with other tables. There are two main challenges for the problem. Given a set of tables where we can augment features from, the first challenge is that there are a lot of ways of joining multiple tables and deciding which features (or attributes) to use - selecting the best set of features to augment is hard. Moreover, we may need to materialize the join results for different join options, doing full materialization might be time consuming - efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, to augment the features following an exploration-exploitation strategy. AutoFeature keeps exploring the features in tables that have led to performance improvement. At the same time, AutoFeature also exploits the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature utilizes sampling techniques to achieve high efficiency. We implement two algorithms, one with multi-arm bandit and the other with branch Deep Q Networks (branch DQN), to realize the framework of AutoFeature. We conducted experiments on three real-world datasets School/XuetangE/Air using 16/23/34 candidate tables with 695/204/338 candidate features. Extensive results show that AutoFeature outperforms other methods by 12.4% and 9.8% on AUC values on two classification datasets (School and XuetangE) and by 0.113 on the MSE value on Air in terms of the model performance.

Original languageEnglish
Title of host publicationProceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
PublisherIEEE Computer Society
Pages3360-3372
Number of pages13
ISBN (Electronic)9781665408837
DOIs
Publication statusPublished - 2022
Externally publishedYes
Event38th IEEE International Conference on Data Engineering, ICDE 2022 - Virtual, Online, Malaysia
Duration: 9 May 202212 May 2022

Publication series

NameProceedings - International Conference on Data Engineering
Volume2022-May
ISSN (Print)1084-4627

Conference

Conference38th IEEE International Conference on Data Engineering, ICDE 2022
Country/TerritoryMalaysia
CityVirtual, Online
Period9/05/2212/05/22

Keywords

  • Feature Augmentation
  • Machine Learning

Fingerprint

Dive into the research topics of 'Feature Augmentation with Reinforcement Learning'. Together they form a unique fingerprint.

Cite this