Learning from label proportions on high-dimensional data

Yong Shi, Jiabin Liu, Zhiquan Qi*, Bo Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

20 Citations (Scopus)

Abstract

Learning from label proportions (LLP), in which the training data are grouped into bags and only the proportion of each class in each bag is available, has attracted wide interest in machine learning. However, solving high-dimensional LLP problems remains challenging. In this paper, we propose a novel algorithm, learning from label proportions based on random forests (LLP-RF), which is well suited to high-dimensional LLP problems. First, by treating the hidden class labels inside the target bags as random variables, we formulate a robust loss function based on random forests and incorporate the proportion information into LLP-RF by penalizing the difference between the ground-truth and estimated label proportions. Second, a simple but efficient alternating annealing method is employed to solve the resulting optimization model. Finally, various experiments demonstrate that our algorithm achieves the best accuracy on high-dimensional data compared with several recently developed methods.
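The abstract does not spell out the exact objective or annealing schedule, so the following is only a minimal sketch of the general alternating idea for the binary case, not the LLP-RF algorithm itself. It assumes binary labels, uses the log-loss of a classifier's predicted probability as the per-instance loss (the paper's random-forest-based loss is not reproduced), models the proportion penalty as C times the absolute difference between the estimated and given bag proportions, and models "annealing" as multiplying C by a growth factor up to C_max. The names `bag_loss`, `update_bag_labels`, `alternate_annealing`, and `fit_predict`, and all default parameter values, are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: fit a classifier on (instances, labels) and return
# P(y = 1 | x) for every training instance, in the same order.
FitPredict = Callable[[List[object], List[int]], List[float]]


def bag_loss(scores: Sequence[float], labels: Sequence[int],
             target_prop: float, C: float) -> float:
    """Log-loss of the labels under the scores plus C * |estimated - target proportion|."""
    eps = 1e-12  # guards against log(0)
    data_term = sum(-math.log(max(s, eps)) if y == 1 else -math.log(max(1.0 - s, eps))
                    for s, y in zip(scores, labels))
    est_prop = sum(labels) / len(labels)  # assumes a non-empty bag
    return data_term + C * abs(est_prop - target_prop)


def update_bag_labels(scores: Sequence[float], target_prop: float, C: float) -> List[int]:
    """Exact minimiser of bag_loss over binary labels for a single bag.

    For a fixed number k of positives the log-loss is smallest when the k
    highest-scoring instances are positive, so only k = 0..n must be checked.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    best_labels: List[int] = [0] * n
    best_val = math.inf
    for k in range(n + 1):
        labels = [0] * n
        for i in order[:k]:
            labels[i] = 1
        val = bag_loss(scores, labels, target_prop, C)
        if val < best_val:
            best_labels, best_val = labels, val
    return best_labels


def alternate_annealing(bags: List[List[object]], props: List[float],
                        fit_predict: FitPredict, C_start: float = 0.1,
                        C_max: float = 100.0, growth: float = 2.0,
                        inner_iters: int = 5) -> List[List[int]]:
    """Alternate classifier fitting and per-bag label updates while C grows."""
    # Start from labels that only respect the proportions (uninformative scores).
    labels = [update_bag_labels([0.5] * len(b), p, C_max) for b, p in zip(bags, props)]
    C = C_start
    while True:
        for _ in range(inner_iters):
            X = [x for b in bags for x in b]
            y = [l for bag_labels in labels for l in bag_labels]
            scores = fit_predict(X, y)
            # Split the flat score list back into bags and re-solve each bag.
            new_labels, pos = [], 0
            for b, p in zip(bags, props):
                new_labels.append(update_bag_labels(scores[pos:pos + len(b)], p, C))
                pos += len(b)
            if new_labels == labels:
                break  # labels are stable at this value of C
            labels = new_labels
        if C >= C_max:
            return labels
        C = min(C * growth, C_max)
```

Here `fit_predict` is left pluggable so the sketch runs without external packages; a random forest is the natural choice in this setting. If one is used, in-sample probabilities from fully grown trees tend to echo the current labels, so out-of-bag probabilities (for example scikit-learn's `RandomForestClassifier(oob_score=True)` and its `oob_decision_function_` attribute) are one plausible source of scores. Small C early on lets the classifier shape the labels; the large final C forces each bag's label count toward its given proportion.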

Original language: English
Pages (from-to): 9-18
Number of pages: 10
Journal: Neural Networks
Volume: 103
Publication status: Published - Jul 2018
Externally published: Yes

Keywords

  • High-dimensional data
  • Learning from label proportions (LLP)
  • Optimization
  • Random forests
