Abstract
Learning from label proportions (LLP), in which the training data come in bags and only the proportion of each class in each bag is available, has attracted wide interest in machine learning. However, solving high-dimensional LLP problems remains challenging. In this paper, we propose a novel algorithm called learning from label proportions based on random forests (LLP-RF), which is designed to handle high-dimensional LLP problems. First, by treating the hidden class labels inside the target bags as random variables, we formulate a robust loss function based on random forests and incorporate the proportion information into LLP-RF by penalizing the difference between the ground-truth and estimated label proportions. Second, a simple but efficient alternating annealing method is employed to solve the resulting optimization model. Finally, experiments demonstrate that our algorithm achieves the best accuracy on high-dimensional data among the several recently developed methods it is compared with.
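The proportion penalty mentioned in the abstract can be illustrated with a minimal sketch. This is not the LLP-RF loss itself, which the abstract does not spell out; it assumes binary labels and an absolute-difference penalty, and the function names `estimated_proportion` and `proportion_penalty` are hypothetical.

```python
from typing import Sequence


def estimated_proportion(labels: Sequence[int]) -> float:
    """Fraction of instances in one bag currently labeled positive (1)."""
    return sum(labels) / len(labels)


def proportion_penalty(
    bag_labels: Sequence[Sequence[int]],
    true_proportions: Sequence[float],
) -> float:
    """Sum over bags of |estimated - ground-truth| positive-class proportion.

    Assumption: binary labels and an absolute-difference penalty, chosen
    only for illustration; the paper's actual loss may differ.
    """
    return sum(
        abs(estimated_proportion(labels) - p)
        for labels, p in zip(bag_labels, true_proportions)
    )


# Two bags with candidate hidden labels and their known proportions.
bags = [[1, 0, 0, 1], [1, 1, 1, 0]]
known = [0.5, 0.5]
penalty = proportion_penalty(bags, known)  # first bag matches, second is off by 0.25
```

An alternating scheme of the kind the abstract describes would, under these assumptions, switch between updating the hidden labels to reduce such a penalty and refitting the classifier on the updated labels.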
Original language | English |
---|---|
Pages (from-to) | 9-18 |
Number of pages | 10 |
Journal | Neural Networks |
Volume | 103 |
DOI | |
Publication status | Published - Jul 2018 |
Externally published | Yes |