A Two-Step Resume Information Extraction Algorithm

Jie Chen; Chunxia Zhang; Zhendong Niu

doi:10.1155/2018/5761287

Corresponding author: Zhendong Niu

School of Computer Science

Research output: Contribution to journal › Article › peer-review

45 Citations (Scopus)

Abstract

With the rapid growth of Internet-based recruiting, there are a great number of personal resumes among recruiting systems. To gain more attention from the recruiters, most resumes are written in diverse formats, including varying font size, font colour, and table cells. However, the diversity of format is harmful to data mining, such as resume information extraction, automatic job matching, and candidates ranking. Supervised methods and rule-based methods have been proposed to extract facts from resumes, but they strongly rely on hierarchical structure information and large amounts of labelled data, which are hard to collect in reality. In this paper, we propose a two-step resume information extraction approach. In the first step, raw text of resume is identified as different resume blocks. To achieve the goal, we design a novel feature, Writing Style, to model sentence syntax information. Besides word index and punctuation index, word lexical attribute and prediction results of classifiers are included in Writing Style. In the second step, multiple classifiers are employed to identify different attributes of fact information in resumes. Experimental results on a real-world dataset show that the algorithm is feasible and effective.
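The two steps named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the heading keywords in `BLOCK_HEADINGS`, the coarse `writing_style` signature (word, number, punctuation), and the regex-based year extractor all stand in for the trained classifiers and the full Writing Style feature that the paper describes.

```python
import re

# Hypothetical heading keywords; the paper relies on trained classifiers instead.
BLOCK_HEADINGS = {
    "education": ("education",),
    "work": ("work experience", "employment"),
}


def writing_style(line):
    """Coarse signature of a line: W = word, N = number, P = punctuation."""
    tokens = re.findall(r"[A-Za-z]+|\d+|[^\w\s]", line)
    signature = []
    for token in tokens:
        if token.isalpha():
            signature.append("W")
        elif token.isdigit():
            signature.append("N")
        else:
            signature.append("P")
    return "".join(signature)


def segment_blocks(lines):
    """Step 1 (sketch): group raw resume lines into labelled blocks."""
    blocks = {}
    current = "basic"
    for line in lines:
        lowered = line.lower()
        for label, keywords in BLOCK_HEADINGS.items():
            # A short line containing a heading keyword starts a new block.
            if len(line.split()) <= 3 and any(k in lowered for k in keywords):
                current = label
                break
        else:
            # Not a heading: the line belongs to the current block.
            blocks.setdefault(current, []).append(line)
    return blocks


def extract_attributes(lines):
    """Step 2 (sketch): pull simple attributes out of one block's lines."""
    facts = []
    for line in lines:
        years = re.findall(r"\b(?:19|20)\d{2}\b", line)
        facts.append({"text": line, "years": years, "style": writing_style(line)})
    return facts


resume = [
    "Jane Doe",
    "Education",
    "2010-2014, Example University",
    "Work Experience",
    "2014-2018, Example Corp",
]
blocks = segment_blocks(resume)
education_facts = extract_attributes(blocks["education"])
```

In this sketch, lines with the same signature (for example, two date-range entries) would look alike regardless of font or table layout, which is the intuition behind using a style-based feature on raw text.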

Original language	English
Article number	5761287
Journal	Mathematical Problems in Engineering
Volume	2018
DOI	https://doi.org/10.1155/2018/5761287
Publication status	Published - 2018


Cite this

Chen, J., Zhang, C., & Niu, Z. (2018). A Two-Step Resume Information Extraction Algorithm. Mathematical Problems in Engineering, 2018, Article 5761287. https://doi.org/10.1155/2018/5761287

@article{aa0d0ba73dcb44e7a67863dafbca5754,

title = "A Two-Step Resume Information Extraction Algorithm",

abstract = "With the rapid growth of Internet-based recruiting, there are a great number of personal resumes among recruiting systems. To gain more attention from the recruiters, most resumes are written in diverse formats, including varying font size, font colour, and table cells. However, the diversity of format is harmful to data mining, such as resume information extraction, automatic job matching, and candidates ranking. Supervised methods and rule-based methods have been proposed to extract facts from resumes, but they strongly rely on hierarchical structure information and large amounts of labelled data, which are hard to collect in reality. In this paper, we propose a two-step resume information extraction approach. In the first step, raw text of resume is identified as different resume blocks. To achieve the goal, we design a novel feature, Writing Style, to model sentence syntax information. Besides word index and punctuation index, word lexical attribute and prediction results of classifiers are included in Writing Style. In the second step, multiple classifiers are employed to identify different attributes of fact information in resumes. Experimental results on a real-world dataset show that the algorithm is feasible and effective.",

author = "Jie Chen and Chunxia Zhang and Zhendong Niu",

note = "Publisher Copyright: {\textcopyright} 2018 Jie Chen et al.",

year = "2018",

doi = "10.1155/2018/5761287",

language = "English",

volume = "2018",

journal = "Mathematical Problems in Engineering",

issn = "1024-123X",

publisher = "John Wiley and Sons Ltd",

}

TY - JOUR

T1 - A Two-Step Resume Information Extraction Algorithm

AU - Chen, Jie

AU - Zhang, Chunxia

AU - Niu, Zhendong

PY - 2018

Y1 - 2018

N2 - With the rapid growth of Internet-based recruiting, there are a great number of personal resumes among recruiting systems. To gain more attention from the recruiters, most resumes are written in diverse formats, including varying font size, font colour, and table cells. However, the diversity of format is harmful to data mining, such as resume information extraction, automatic job matching, and candidates ranking. Supervised methods and rule-based methods have been proposed to extract facts from resumes, but they strongly rely on hierarchical structure information and large amounts of labelled data, which are hard to collect in reality. In this paper, we propose a two-step resume information extraction approach. In the first step, raw text of resume is identified as different resume blocks. To achieve the goal, we design a novel feature, Writing Style, to model sentence syntax information. Besides word index and punctuation index, word lexical attribute and prediction results of classifiers are included in Writing Style. In the second step, multiple classifiers are employed to identify different attributes of fact information in resumes. Experimental results on a real-world dataset show that the algorithm is feasible and effective.

UR - http://www.scopus.com/inward/record.url?scp=85047631459&partnerID=8YFLogxK

U2 - 10.1155/2018/5761287

DO - 10.1155/2018/5761287

M3 - Article

AN - SCOPUS:85047631459

SN - 1024-123X

VL - 2018

JO - Mathematical Problems in Engineering

JF - Mathematical Problems in Engineering

M1 - 5761287

ER -
