Feature Augmentation with Reinforcement Learning

Jiabin Liu, Chengliang Chai*, Yuyu Luo, Yin Lou, Jianhua Feng, Nan Tang

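*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

27 Citations (Scopus)

Abstract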

Sufficient good features are indispensable for training well-performing machine learning models. However, good features are often insufficient, in which case feature augmentation is needed to enrich the feature set with high-quality features by joining with other tables. The problem poses two main challenges. First, given a set of tables from which features can be augmented, there are many ways to join multiple tables and to decide which features (or attributes) to use, so selecting the best set of features to augment is hard. Second, the join results may need to be materialized for different join options, and full materialization can be time-consuming, so efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, to augment the features following an exploration-exploitation strategy. AutoFeature keeps exploiting the features in tables that have led to performance improvement. At the same time, AutoFeature also explores the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature utilizes sampling techniques to achieve high efficiency. We implement two algorithms, one with multi-armed bandit and the other with branch Deep Q Networks (branch DQN), to realize the framework of AutoFeature. We conducted experiments on three real-world datasets, School, XuetangE, and Air, using 16, 23, and 34 candidate tables with 695, 204, and 338 candidate features, respectively. Extensive results show that, in terms of model performance, AutoFeature outperforms other methods by 12.4% and 9.8% in AUC on the two classification datasets (School and XuetangE) and by 0.113 in MSE on Air.
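
The exploration-exploitation idea in the abstract can be illustrated with a small multi-armed bandit sketch. This is not the AutoFeature implementation: it assumes the standard UCB1 selection rule, treats each arm as one candidate table, and replaces the real reward (the model improvement measured after joining a table) with fixed toy values. The names `ucb1_select`, `run_bandit`, `TOY_GAIN`, and `toy_reward` are hypothetical and exist only for this illustration.

```python
import math


def ucb1_select(counts, totals, t, c=2.0):
    """Return the index of the arm (candidate table) with the highest UCB1 score."""
    # Try every arm once before applying the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        # Exploitation term (mean reward so far) plus an exploration bonus
        # that is larger for arms that have rarely been selected.
        totals[arm] / counts[arm] + math.sqrt(c * math.log(t) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(reward_fn, n_arms, n_rounds):
    """Run UCB1 for n_rounds and return how often each arm was selected."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        arm = ucb1_select(counts, totals, t)
        counts[arm] += 1
        totals[arm] += reward_fn(arm)
    return counts


# Assumption: fixed toy rewards stand in for the measured performance gain
# of augmenting with each candidate table; real rewards would be noisy.
TOY_GAIN = [0.1, 0.5, 0.8, 0.2]


def toy_reward(arm):
    return TOY_GAIN[arm]


counts = run_bandit(toy_reward, n_arms=len(TOY_GAIN), n_rounds=2000)
print(counts)
```

In this toy run the table with the largest gain receives most of the selections, while the exploration bonus ensures that every table is still tried occasionally.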

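Original language: English
Title of host publication: Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
Publisher: IEEE Computer Society
Pages: 3360-3372
Number of pages: 13
ISBN (Electronic): 9781665408837
DOI: https://doi.org/10.1109/ICDE53745.2022.00317
Publication status: Published - 2022
Externally published: Yes
Event: 38th IEEE International Conference on Data Engineering, ICDE 2022 - Virtual, Online, Malaysia
Duration: 9 May 2022 to 12 May 2022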

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2022-May
ISSN (Print): 1084-4627

Conference

Conference: 38th IEEE International Conference on Data Engineering, ICDE 2022
Country/Territory: Malaysia
City: Virtual, Online
Period: 9/05/22 to 12/05/22

Cite this

Liu, J., Chai, C., Luo, Y., Lou, Y., Feng, J., & Tang, N. (2022). Feature Augmentation with Reinforcement Learning. In Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022 (pp. 3360-3372). (Proceedings - International Conference on Data Engineering; Vol. 2022-May). IEEE Computer Society. https://doi.org/10.1109/ICDE53745.2022.00317