Highly Parallel SPARQL Engine for RDF

Fan Feng; Weikang Zhou; Ding Zhang; Jinhui Pang

doi:10.1007/978-981-15-7981-3_5

Fan Feng, Weikang Zhou, Ding Zhang, Jinhui Pang*

*Corresponding author for this work

School of Computer Science and Technology

Tianjin University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

In this paper, a highly parallel batch processing engine is designed for SPARQL queries. Machine learning algorithms were applied to make time predictions of queries and reasonably group them, and further make reasonable estimates of the memory footprint of the queries to arrange the order of each group of queries. Finally, the query is processed in parallel by introducing pthreads. Based on the above three points, a SPARQL time prediction algorithm was proposed, including data processing, to better deal with batch SPARQL queries, and the introduction of pthread can make our query processing faster. Since data processing was added to query time prediction, the method can be implemented in any set of data-queries. Experiments show that the engine can optimize time and maximize the use of memory when processing batch SPARQL queries.

Original language	English
Title of host publication	Data Science - 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020, Proceedings
Editors	Jianchao Zeng, Weipeng Jing, Xianhua Song, Zeguang Lu
Publisher	Springer
Pages	61-71
Number of pages	11
ISBN (Print)	9789811579806
DOIs	https://doi.org/10.1007/978-981-15-7981-3_5
Publication status	Published - 2020
Event	6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020 - Taiyuan, China Duration: 18 Sept 2020 → 21 Sept 2020

Publication series

Name	Communications in Computer and Information Science
Volume	1257 CCIS
ISSN (Print)	1865-0929
ISSN (Electronic)	1865-0937

Conference

Conference	6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020
Country/Territory	China
City	Taiyuan
Period	18/09/20 → 21/09/20

Keywords

Multithreading
Performance prediction
Pthread
SPARQL

Cite this

Feng, F., Zhou, W., Zhang, D., & Pang, J. (2020). Highly Parallel SPARQL Engine for RDF. In J. Zeng, W. Jing, X. Song, & Z. Lu (Eds.), Data Science - 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020, Proceedings (pp. 61-71). (Communications in Computer and Information Science; Vol. 1257 CCIS). Springer. https://doi.org/10.1007/978-981-15-7981-3_5

@inproceedings{7b886e82401241d79c0a347c0d6ef12e,

title = "Highly Parallel SPARQL Engine for RDF",

abstract = "In this paper, a highly parallel batch processing engine is designed for SPARQL queries. Machine learning algorithms were applied to make time predictions of queries and reasonably group them, and further make reasonable estimates of the memory footprint of the queries to arrange the order of each group of queries. Finally, the query is processed in parallel by introducing pthreads. Based on the above three points, a SPARQL time prediction algorithm was proposed, including data processing, to better deal with batch SPARQL queries, and the introduction of pthread can make our query processing faster. Since data processing was added to query time prediction, the method can be implemented in any set of data-queries. Experiments show that the engine can optimize time and maximize the use of memory when processing batch SPARQL queries.",

keywords = "Multithreading, Performance prediction, Pthread, SPARQL",

author = "Fan Feng and Weikang Zhou and Ding Zhang and Jinhui Pang",

note = "Publisher Copyright: {\textcopyright} 2020, The Author(s).; 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020 ; Conference date: 18-09-2020 Through 21-09-2020",

year = "2020",

doi = "10.1007/978-981-15-7981-3_5",

language = "English",

isbn = "9789811579806",

series = "Communications in Computer and Information Science",

publisher = "Springer",

pages = "61--71",

editor = "Jianchao Zeng and Weipeng Jing and Xianhua Song and Zeguang Lu",

booktitle = "Data Science - 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020, Proceedings",

address = "Germany",

}

Feng, F, Zhou, W, Zhang, D & Pang, J 2020, Highly Parallel SPARQL Engine for RDF. in J Zeng, W Jing, X Song & Z Lu (eds), Data Science - 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020, Proceedings. Communications in Computer and Information Science, vol. 1257 CCIS, Springer, pp. 61-71, 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020, Taiyuan, China, 18/09/20. https://doi.org/10.1007/978-981-15-7981-3_5

Highly Parallel SPARQL Engine for RDF. / Feng, Fan; Zhou, Weikang; Zhang, Ding et al.
Data Science - 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020, Proceedings. ed. / Jianchao Zeng; Weipeng Jing; Xianhua Song; Zeguang Lu. Springer, 2020. p. 61-71 (Communications in Computer and Information Science; Vol. 1257 CCIS).

TY - GEN

T1 - Highly Parallel SPARQL Engine for RDF

AU - Feng, Fan

AU - Zhou, Weikang

AU - Zhang, Ding

AU - Pang, Jinhui

PY - 2020

Y1 - 2020

N2 - In this paper, a highly parallel batch processing engine is designed for SPARQL queries. Machine learning algorithms were applied to make time predictions of queries and reasonably group them, and further make reasonable estimates of the memory footprint of the queries to arrange the order of each group of queries. Finally, the query is processed in parallel by introducing pthreads. Based on the above three points, a SPARQL time prediction algorithm was proposed, including data processing, to better deal with batch SPARQL queries, and the introduction of pthread can make our query processing faster. Since data processing was added to query time prediction, the method can be implemented in any set of data-queries. Experiments show that the engine can optimize time and maximize the use of memory when processing batch SPARQL queries.

AB - In this paper, a highly parallel batch processing engine is designed for SPARQL queries. Machine learning algorithms were applied to make time predictions of queries and reasonably group them, and further make reasonable estimates of the memory footprint of the queries to arrange the order of each group of queries. Finally, the query is processed in parallel by introducing pthreads. Based on the above three points, a SPARQL time prediction algorithm was proposed, including data processing, to better deal with batch SPARQL queries, and the introduction of pthread can make our query processing faster. Since data processing was added to query time prediction, the method can be implemented in any set of data-queries. Experiments show that the engine can optimize time and maximize the use of memory when processing batch SPARQL queries.

KW - Multithreading

KW - Performance prediction

KW - Pthread

KW - SPARQL

UR - http://www.scopus.com/inward/record.url?scp=85090041965&partnerID=8YFLogxK

U2 - 10.1007/978-981-15-7981-3_5

DO - 10.1007/978-981-15-7981-3_5

M3 - Conference contribution

AN - SCOPUS:85090041965

SN - 9789811579806

T3 - Communications in Computer and Information Science

SP - 61

EP - 71

BT - Data Science - 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020, Proceedings

A2 - Zeng, Jianchao

A2 - Jing, Weipeng

A2 - Song, Xianhua

A2 - Lu, Zeguang

PB - Springer

T2 - 6th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2020

Y2 - 18 September 2020 through 21 September 2020

ER -
