Active learning for cross language text categorization

Yue Liu; Lin Dai; Weitao Zhou; Heyan Huang

doi:10.1007/978-3-642-30217-6_17

Yue Liu, Lin Dai^*, Weitao Zhou, Heyan Huang

^*Corresponding author for this work

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Cross Language Text Categorization (CLTC) is the task of assigning class labels to documents written in a target language (e.g. Chinese) while the system is trained using labeled examples in a source language (e.g. English). With CLTC, we can build classifiers for multiple languages using existing training data in only one language, thereby avoiding the cost of preparing training data for each individual language. One challenge for CLTC is the cultural differences between languages, which cause a classifier trained on the source language to perform poorly on the target language. In this paper, we propose an active learning algorithm for CLTC that takes full advantage of both labeled data in the source language and unlabeled data in the target language. The classifier first learns the classification knowledge from the source language and then learns the culture-dependent knowledge from the target language. In addition, we extend our algorithm to a double-viewed form by considering the source and target languages as two views of the classification problem. Experiments show that our algorithm can effectively improve cross language classification performance.
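The abstract does not name the classifier or the query strategy, so the following is only a minimal sketch of the general idea (train on source-language labels, then query labels for the target-language documents the classifier is least sure about), not the authors' algorithm. It assumes a multinomial Naive Bayes classifier, margin-based uncertainty sampling, and that source and target documents already share one token space (for example, after machine translation). The function names, the `oracle` callback, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (counts only; smoothing is applied at prediction time)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def posterior(model, tokens):
    """Return normalized class probabilities for one tokenized document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in vocab:  # words never seen in training carry no evidence
                # add-one (Laplace) smoothing
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}

def predict(model, tokens):
    probs = posterior(model, tokens)
    return max(probs, key=probs.get)

def margin(model, tokens):
    """Gap between the two most probable classes; a small gap means high uncertainty."""
    ranked = sorted(posterior(model, tokens).values(), reverse=True)
    return ranked[0] - ranked[1] if len(ranked) > 1 else 1.0

def active_learn(source_docs, source_labels, target_pool, oracle, budget):
    """Start from source-language labels, then query the most uncertain target documents."""
    docs, labels = list(source_docs), list(source_labels)
    pool = list(target_pool)
    model = train_nb(docs, labels)
    for _ in range(min(budget, len(pool))):
        query = min(pool, key=lambda d: margin(model, d))  # least confident document
        pool.remove(query)
        docs.append(query)
        labels.append(oracle(query))  # a human annotator in practice
        model = train_nb(docs, labels)
    return model

# Hypothetical toy data: "pingpong" and "yuan" stand in for culture-specific
# target-language terms that never occur in the source-language training set.
source_docs = [("football", "match", "goal"), ("stock", "market", "price"),
               ("team", "goal", "win"), ("bank", "price", "loan")]
source_labels = ["sports", "finance", "sports", "finance"]
target_pool = [("pingpong", "match", "win"), ("pingpong", "champion"),
               ("yuan", "market", "loan"), ("yuan", "rate")]
truth = {("pingpong", "match", "win"): "sports", ("pingpong", "champion"): "sports",
         ("yuan", "market", "loan"): "finance", ("yuan", "rate"): "finance"}

model = active_learn(source_docs, source_labels, target_pool, truth.__getitem__, budget=2)
print(predict(model, ("yuan",)), predict(model, ("pingpong",)))
```

In this toy run the two queried documents are the ones made up entirely of words unseen in the source data, which is where a source-only classifier has no evidence at all. The double-viewed extension mentioned in the abstract is not sketched here.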

Original language	English
Title of host publication	Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD 2012, Proceedings
Pages	195-206
Number of pages	12
Edition	PART 1
DOIs	https://doi.org/10.1007/978-3-642-30217-6_17
Publication status	Published - 2012
Event	16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012 - Kuala Lumpur, Malaysia. Duration: 29 May 2012 → 1 Jun 2012

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number	PART 1
Volume	7301 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012
Country/Territory	Malaysia
City	Kuala Lumpur
Period	29/05/12 → 1/06/12

Keywords

Active Learning
Cross Language Text Categorization

Access to Document

10.1007/978-3-642-30217-6_17

Cite this

Liu, Y., Dai, L., Zhou, W., & Huang, H. (2012). Active learning for cross language text categorization. In Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD 2012, Proceedings (PART 1 ed., pp. 195-206). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7301 LNAI, No. PART 1). https://doi.org/10.1007/978-3-642-30217-6_17

Liu, Yue ; Dai, Lin ; Zhou, Weitao et al. / Active learning for cross language text categorization. Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD 2012, Proceedings. PART 1. ed. 2012. pp. 195-206 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 1).

@inproceedings{e0a86f7d20e2440bbaa8c0b8e553297b,

title = "Active learning for cross language text categorization",

abstract = "Cross Language Text Categorization (CLTC) is the task of assigning class labels to documents written in a target language (e.g. Chinese) while the system is trained using labeled examples in a source language (e.g. English). With CLTC, we can build classifiers for multiple languages using existing training data in only one language, thereby avoiding the cost of preparing training data for each individual language. One challenge for CLTC is the cultural differences between languages, which cause a classifier trained on the source language to perform poorly on the target language. In this paper, we propose an active learning algorithm for CLTC that takes full advantage of both labeled data in the source language and unlabeled data in the target language. The classifier first learns the classification knowledge from the source language and then learns the culture-dependent knowledge from the target language. In addition, we extend our algorithm to a double-viewed form by considering the source and target languages as two views of the classification problem. Experiments show that our algorithm can effectively improve cross language classification performance.",

keywords = "Active Learning, Cross Language Text Categorization",

author = "Yue Liu and Lin Dai and Weitao Zhou and Heyan Huang",

year = "2012",

doi = "10.1007/978-3-642-30217-6_17",

language = "English",

isbn = "9783642302169",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

number = "PART 1",

pages = "195--206",

booktitle = "Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD 2012, Proceedings",

edition = "PART 1",

note = "16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012 ; Conference date: 29-05-2012 Through 01-06-2012",

}

Liu, Y, Dai, L, Zhou, W & Huang, H 2012, Active learning for cross language text categorization. in Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD 2012, Proceedings. PART 1 edn, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 1, vol. 7301 LNAI, pp. 195-206, 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012, Kuala Lumpur, Malaysia, 29/05/12. https://doi.org/10.1007/978-3-642-30217-6_17

Active learning for cross language text categorization. / Liu, Yue; Dai, Lin; Zhou, Weitao et al.
Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD 2012, Proceedings. PART 1. ed. 2012. p. 195-206 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7301 LNAI, No. PART 1).

TY - GEN

T1 - Active learning for cross language text categorization

AU - Liu, Yue

AU - Dai, Lin

AU - Zhou, Weitao

AU - Huang, Heyan

PY - 2012

Y1 - 2012

AB - Cross Language Text Categorization (CLTC) is the task of assigning class labels to documents written in a target language (e.g. Chinese) while the system is trained using labeled examples in a source language (e.g. English). With CLTC, we can build classifiers for multiple languages using existing training data in only one language, thereby avoiding the cost of preparing training data for each individual language. One challenge for CLTC is the cultural differences between languages, which cause a classifier trained on the source language to perform poorly on the target language. In this paper, we propose an active learning algorithm for CLTC that takes full advantage of both labeled data in the source language and unlabeled data in the target language. The classifier first learns the classification knowledge from the source language and then learns the culture-dependent knowledge from the target language. In addition, we extend our algorithm to a double-viewed form by considering the source and target languages as two views of the classification problem. Experiments show that our algorithm can effectively improve cross language classification performance.

KW - Active Learning

KW - Cross Language Text Categorization

UR - http://www.scopus.com/inward/record.url?scp=84861421272&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-30217-6_17

DO - 10.1007/978-3-642-30217-6_17

M3 - Conference contribution

AN - SCOPUS:84861421272

SN - 9783642302169

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 195

EP - 206

BT - Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD 2012, Proceedings

T2 - 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2012

Y2 - 29 May 2012 through 1 June 2012

ER -

Liu Y, Dai L, Zhou W, Huang H. Active learning for cross language text categorization. In Advances in Knowledge Discovery and Data Mining - 16th Pacific-Asia Conference, PAKDD 2012, Proceedings. PART 1 ed. 2012. p. 195-206. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 1). doi: 10.1007/978-3-642-30217-6_17
