TY - GEN
T1 - Feature Augmentation with Reinforcement Learning
AU - Liu, Jiabin
AU - Chai, Chengliang
AU - Luo, Yuyu
AU - Lou, Yin
AU - Feng, Jianhua
AU - Tang, Nan
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Sufficient good features are indispensable to train well-performed machine learning models. However, it is com-mon that good features are not always enough, where feature augmentation is necessary to enrich high-quality features by joining with other tables. There are two main challenges for the problem. Given a set of tables where we can augment features from, the first challenge is that there are a lot of ways of joining multiple tables and deciding which features (or attributes) to use - selecting the best set of features to augment is hard. Moreover, we may need to materialize the join results for different join options, doing full materialization might be time consuming - efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, to augment the features following an exploration-exploitation strategy. AutoFeature keeps exploring the features in tables that have led to performance improvement. At the same time, AutoFeature also exploits the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature utilizes sampling techniques to achieve high efficiency. We implement two algorithms, one with multi-arm bandit and the other with branch Deep Q Networks (branch DQN), to realize the framework of AutoFeature. We conducted experiments on three real-world datasets School/XuetangE/Air using 16/23/34 candidate tables with 695/204/338 candidate features. Extensive results show that AutoFeature outperforms other methods by 12.4% and 9.8% on AUC values on two classification datasets (School and XuetangE) and by 0.113 on the MSE value on Air in terms of the model performance.
AB - Sufficient good features are indispensable to train well-performed machine learning models. However, it is com-mon that good features are not always enough, where feature augmentation is necessary to enrich high-quality features by joining with other tables. There are two main challenges for the problem. Given a set of tables where we can augment features from, the first challenge is that there are a lot of ways of joining multiple tables and deciding which features (or attributes) to use - selecting the best set of features to augment is hard. Moreover, we may need to materialize the join results for different join options, doing full materialization might be time consuming - efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, to augment the features following an exploration-exploitation strategy. AutoFeature keeps exploring the features in tables that have led to performance improvement. At the same time, AutoFeature also exploits the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature utilizes sampling techniques to achieve high efficiency. We implement two algorithms, one with multi-arm bandit and the other with branch Deep Q Networks (branch DQN), to realize the framework of AutoFeature. We conducted experiments on three real-world datasets School/XuetangE/Air using 16/23/34 candidate tables with 695/204/338 candidate features. Extensive results show that AutoFeature outperforms other methods by 12.4% and 9.8% on AUC values on two classification datasets (School and XuetangE) and by 0.113 on the MSE value on Air in terms of the model performance.
KW - Feature Augmentation
KW - Machine Learning
UR - http://www.scopus.com/inward/record.url?scp=85136435706&partnerID=8YFLogxK
U2 - 10.1109/ICDE53745.2022.00317
DO - 10.1109/ICDE53745.2022.00317
M3 - Conference contribution
AN - SCOPUS:85136435706
T3 - Proceedings - International Conference on Data Engineering
SP - 3360
EP - 3372
BT - Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
PB - IEEE Computer Society
T2 - 38th IEEE International Conference on Data Engineering, ICDE 2022
Y2 - 9 May 2022 through 12 May 2022
ER -