Feature Augmentation with Reinforcement Learning

Jiabin Liu; Chengliang Chai; Yuyu Luo; Yin Lou; Jianhua Feng; Nan Tang

doi:10.1109/ICDE53745.2022.00317

Jiabin Liu, Chengliang Chai^*, Yuyu Luo, Yin Lou, Jianhua Feng, Nan Tang

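^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

27 Citations (Scopus)

Abstract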

Sufficient good features are indispensable to train well-performed machine learning models. However, it is common that good features are not always enough, where feature augmentation is necessary to enrich high-quality features by joining with other tables. There are two main challenges for the problem. Given a set of tables where we can augment features from, the first challenge is that there are a lot of ways of joining multiple tables and deciding which features (or attributes) to use - selecting the best set of features to augment is hard. Moreover, we may need to materialize the join results for different join options, doing full materialization might be time consuming - efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, to augment the features following an exploration-exploitation strategy. AutoFeature keeps exploring the features in tables that have led to performance improvement. At the same time, AutoFeature also exploits the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature utilizes sampling techniques to achieve high efficiency. We implement two algorithms, one with multi-arm bandit and the other with branch Deep Q Networks (branch DQN), to realize the framework of AutoFeature. We conducted experiments on three real-world datasets School/XuetangE/Air using 16/23/34 candidate tables with 695/204/338 candidate features. Extensive results show that AutoFeature outperforms other methods by 12.4% and 9.8% on AUC values on two classification datasets (School and XuetangE) and by 0.113 on the MSE value on Air in terms of the model performance.
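
The exploration-exploitation trade-off mentioned in the abstract can be illustrated with a generic multi-armed bandit. The sketch below is not the AutoFeature algorithm from the paper; it is a plain UCB1 loop, assuming each candidate table is one arm and that pulling an arm returns a reward standing in for the model improvement obtained after joining that table. The table names, the reward values, and the function name `ucb1_select_tables` are all hypothetical.

```python
import math
from typing import Callable, Dict, List


def ucb1_select_tables(
    tables: List[str],
    reward_fn: Callable[[str], float],
    rounds: int,
    c: float = 2.0,
) -> Dict[str, int]:
    """Run a generic UCB1 loop and return how often each table was chosen."""
    counts = {t: 0 for t in tables}    # number of times each table was tried
    totals = {t: 0.0 for t in tables}  # summed reward per table

    for step in range(1, rounds + 1):
        # Try every table once before applying the UCB rule.
        untried = [t for t in tables if counts[t] == 0]
        if untried:
            choice = untried[0]
        else:
            # Mean reward favours tables that helped so far;
            # the square-root bonus favours rarely selected tables.
            choice = max(
                tables,
                key=lambda t: totals[t] / counts[t]
                + math.sqrt(c * math.log(step) / counts[t]),
            )
        totals[choice] += reward_fn(choice)
        counts[choice] += 1
    return counts


# Hypothetical, fixed rewards in [0, 1]; a real system would instead train
# and evaluate a model on (a sample of) the joined data.
gains = {"grades": 0.9, "attendance": 0.4, "weather": 0.1}
pulls = ucb1_select_tables(list(gains), lambda t: gains[t], rounds=200)
print(pulls)
```

With these made-up rewards, the table with the highest reward ends up selected most often, while the other tables are still revisited occasionally because their uncertainty bonus grows as they go unselected.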

Original language	English
Title of host publication	Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
Publisher	IEEE Computer Society
Pages	3360-3372
Number of pages	13
ISBN (Electronic)	9781665408837
DOI	https://doi.org/10.1109/ICDE53745.2022.00317
Publication status	Published - 2022
Externally published	Yes
Event	38th IEEE International Conference on Data Engineering, ICDE 2022 - Virtual, Online, Malaysia. Duration: 9 May 2022 → 12 May 2022

Publication series

Name	Proceedings - International Conference on Data Engineering
Volume	2022-May
ISSN (Print)	1084-4627

Conference

Conference	38th IEEE International Conference on Data Engineering, ICDE 2022
Country/Territory	Malaysia
City	Virtual, Online
Period	9/05/22 → 12/05/22

Cite this

Liu, J., Chai, C., Luo, Y., Lou, Y., Feng, J., & Tang, N. (2022). Feature Augmentation with Reinforcement Learning. In Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022 (pp. 3360-3372). (Proceedings - International Conference on Data Engineering; Vol. 2022-May). IEEE Computer Society. https://doi.org/10.1109/ICDE53745.2022.00317

@inproceedings{a8c4c03d6f6b4ee4bfb3b52688402370,

title = "Feature Augmentation with Reinforcement Learning",

abstract = "Sufficient good features are indispensable to train well-performed machine learning models. However, it is common that good features are not always enough, where feature augmentation is necessary to enrich high-quality features by joining with other tables. There are two main challenges for the problem. Given a set of tables where we can augment features from, the first challenge is that there are a lot of ways of joining multiple tables and deciding which features (or attributes) to use - selecting the best set of features to augment is hard. Moreover, we may need to materialize the join results for different join options, doing full materialization might be time consuming - efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, to augment the features following an exploration-exploitation strategy. AutoFeature keeps exploring the features in tables that have led to performance improvement. At the same time, AutoFeature also exploits the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature utilizes sampling techniques to achieve high efficiency. We implement two algorithms, one with multi-arm bandit and the other with branch Deep Q Networks (branch DQN), to realize the framework of AutoFeature. We conducted experiments on three real-world datasets School/XuetangE/Air using 16/23/34 candidate tables with 695/204/338 candidate features. Extensive results show that AutoFeature outperforms other methods by 12.4% and 9.8% on AUC values on two classification datasets (School and XuetangE) and by 0.113 on the MSE value on Air in terms of the model performance.",

keywords = "Feature Augmentation, Machine Learning",

author = "Jiabin Liu and Chengliang Chai and Yuyu Luo and Yin Lou and Jianhua Feng and Nan Tang",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 38th IEEE International Conference on Data Engineering, ICDE 2022 ; Conference date: 09-05-2022 Through 12-05-2022",

year = "2022",

doi = "10.1109/ICDE53745.2022.00317",

language = "English",

series = "Proceedings - International Conference on Data Engineering",

publisher = "IEEE Computer Society",

pages = "3360--3372",

booktitle = "Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022",

address = "United States",

}

Liu, J, Chai, C, Luo, Y, Lou, Y, Feng, J & Tang, N 2022, Feature Augmentation with Reinforcement Learning. in Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022. Proceedings - International Conference on Data Engineering, vol. 2022-May, IEEE Computer Society, pp. 3360-3372, 38th IEEE International Conference on Data Engineering, ICDE 2022, Virtual, Online, Malaysia, 9/05/22. https://doi.org/10.1109/ICDE53745.2022.00317

Feature Augmentation with Reinforcement Learning. / Liu, Jiabin; Chai, Chengliang; Luo, Yuyu et al.
Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022. IEEE Computer Society, 2022. p. 3360-3372 (Proceedings - International Conference on Data Engineering; Vol. 2022-May).

TY - GEN

T1 - Feature Augmentation with Reinforcement Learning

AU - Liu, Jiabin

AU - Chai, Chengliang

AU - Luo, Yuyu

AU - Lou, Yin

AU - Feng, Jianhua

AU - Tang, Nan

PY - 2022

Y1 - 2022

N2 - Sufficient good features are indispensable to train well-performed machine learning models. However, it is common that good features are not always enough, where feature augmentation is necessary to enrich high-quality features by joining with other tables. There are two main challenges for the problem. Given a set of tables where we can augment features from, the first challenge is that there are a lot of ways of joining multiple tables and deciding which features (or attributes) to use - selecting the best set of features to augment is hard. Moreover, we may need to materialize the join results for different join options, doing full materialization might be time consuming - efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, to augment the features following an exploration-exploitation strategy. AutoFeature keeps exploring the features in tables that have led to performance improvement. At the same time, AutoFeature also exploits the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature utilizes sampling techniques to achieve high efficiency. We implement two algorithms, one with multi-arm bandit and the other with branch Deep Q Networks (branch DQN), to realize the framework of AutoFeature. We conducted experiments on three real-world datasets School/XuetangE/Air using 16/23/34 candidate tables with 695/204/338 candidate features. Extensive results show that AutoFeature outperforms other methods by 12.4% and 9.8% on AUC values on two classification datasets (School and XuetangE) and by 0.113 on the MSE value on Air in terms of the model performance.

AB - Sufficient good features are indispensable to train well-performed machine learning models. However, it is common that good features are not always enough, where feature augmentation is necessary to enrich high-quality features by joining with other tables. There are two main challenges for the problem. Given a set of tables where we can augment features from, the first challenge is that there are a lot of ways of joining multiple tables and deciding which features (or attributes) to use - selecting the best set of features to augment is hard. Moreover, we may need to materialize the join results for different join options, doing full materialization might be time consuming - efficient but approximate methods are needed. In this paper, we first introduce the design space of the feature augmentation problem. Then, to address the above challenges, we propose a reinforcement learning based framework, namely AutoFeature, to augment the features following an exploration-exploitation strategy. AutoFeature keeps exploring the features in tables that have led to performance improvement. At the same time, AutoFeature also exploits the tables (features) that are rarely selected. In this way, the search space of tables (features) to be augmented can be well explored and a subset of good features can be selected. AutoFeature utilizes sampling techniques to achieve high efficiency. We implement two algorithms, one with multi-arm bandit and the other with branch Deep Q Networks (branch DQN), to realize the framework of AutoFeature. We conducted experiments on three real-world datasets School/XuetangE/Air using 16/23/34 candidate tables with 695/204/338 candidate features. Extensive results show that AutoFeature outperforms other methods by 12.4% and 9.8% on AUC values on two classification datasets (School and XuetangE) and by 0.113 on the MSE value on Air in terms of the model performance.

KW - Feature Augmentation

KW - Machine Learning

UR - http://www.scopus.com/inward/record.url?scp=85136435706&partnerID=8YFLogxK

U2 - 10.1109/ICDE53745.2022.00317

DO - 10.1109/ICDE53745.2022.00317

M3 - Conference contribution

AN - SCOPUS:85136435706

T3 - Proceedings - International Conference on Data Engineering

SP - 3360

EP - 3372

BT - Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022

PB - IEEE Computer Society

T2 - 38th IEEE International Conference on Data Engineering, ICDE 2022

Y2 - 9 May 2022 through 12 May 2022

ER -
