Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks

Yuyu Luo; Nan Tang; Guoliang Li; Chengliang Chai; Wenbo Li; Xuedi Qin

doi:10.1145/3448016.3457261

Research output: Contribution to journal › Conference article › Peer-reviewed

57 citations (Scopus)

Abstract

Natural language (NL) is a promising interaction paradigm for data visualization (VIS). However, there are not any NL to VIS (NL2VIS) benchmarks available. Our goal is to provide the first NL2VIS benchmark to enable and push the field of NL2VIS, especially with deep learning technologies. In this paper, we propose a NL2VIS synthesizer (NL2SQL-to-NL2VIS) that synthesizes NL2VIS benchmarks by piggybacking NL2SQL benchmarks. The intuition is based on the semantic connection between SQL queries and VIS queries: SQL queries specify what data is needed and VIS queries additionally need to specify how to visualize. However, different from SQL that has well-defined syntax, VIS languages (e.g., Vega-Lite, VizQL, ggplot2) are syntactically very different. To provide NL2VIS benchmarks that can support many VIS languages, we use a unified intermediate representation, abstract syntax trees (ASTs), for both SQL and VIS queries. We can synthesize multiple VIS trees through adding/deleting nodes to/from an SQL tree. Each VIS tree can then be converted to (any) VIS language. The NL for VIS will be modified based on the NL for SQL to reflect corresponding tree edits. We produce the first NL2VIS benchmark (nvBench), by applying NL2SQL-to-NL2VIS on a popular NL2SQL benchmark Spider, which covers 105 domains, supports seven common types of visualizations, and contains 25,750 (NL, VIS) pairs. Our method reduces the man-hour to 5.7% of developing a NL2VIS benchmark from scratch (or building a NL2VIS benchmark from scratch takes 17.5× man-hours of our method). Extensive human validation, through 23 experts and 312 crowd workers, demonstrates the high-quality of nvBench. In order to verify that nvBench can enable learning-based approaches, we develop a SEQ2VIS model. Our experimental results show that SEQ2VIS works well and significantly outperforms the state-of-the-art methods of the NL2VIS task.
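As a rough illustration of the tree-edit idea described in the abstract, the following Python sketch builds a toy SQL tree, derives several VIS trees by inserting a `Visualize` node (and deleting a column the chart does not use), and renders each one as a Vega-Lite-style dictionary. The `Node` class, the node labels, and the functions `synthesize_vis_trees` and `to_vega_lite_like` are all hypothetical names chosen for illustration; they are not the paper's actual grammar or implementation.

```python
from dataclasses import dataclass, field
from typing import List
import copy


@dataclass
class Node:
    """A generic AST node: a label plus ordered children (hypothetical design)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def sql_tree() -> Node:
    # Toy tree for: SELECT dept, COUNT(*), budget FROM staff GROUP BY dept
    return Node("Query", [
        Node("Select", [Node("dept"), Node("COUNT(*)"), Node("budget")]),
        Node("From", [Node("staff")]),
        Node("GroupBy", [Node("dept")]),
    ])


def synthesize_vis_trees(sql: Node, chart_types: List[str]) -> List[Node]:
    """Derive one VIS tree per chart type via add/delete edits on the SQL tree."""
    trees = []
    for chart in chart_types:
        vis = copy.deepcopy(sql)  # leave the original SQL tree untouched
        select = vis.children[0]
        # Delete edit: keep only two columns (x and y) for a simple chart.
        select.children = select.children[:2]
        # Add edit: insert a node saying how to visualize the selected data.
        vis.children.insert(0, Node("Visualize", [Node(chart)]))
        trees.append(vis)
    return trees


def to_vega_lite_like(vis: Node) -> dict:
    """Convert a VIS tree to a Vega-Lite-style dict (shape is illustrative only)."""
    by_label = {child.label: child for child in vis.children}
    x, y = (c.label for c in by_label["Select"].children)
    return {
        "mark": by_label["Visualize"].children[0].label,
        "data": {"name": by_label["From"].children[0].label},
        "encoding": {"x": {"field": x}, "y": {"field": y}},
    }


if __name__ == "__main__":
    for tree in synthesize_vis_trees(sql_tree(), ["bar", "arc"]):
        print(to_vega_lite_like(tree))
```

Keeping the VIS query as a tree until the last step is what would let a single synthesized example be emitted in more than one target language; only a converter like `to_vega_lite_like` would need to change.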

Original language	English
Pages (from-to)	1235-1247
Number of pages	13
Journal	Proceedings of the ACM SIGMOD International Conference on Management of Data
DOI	https://doi.org/10.1145/3448016.3457261
Publication status	Published - 2021
Externally published	Yes
Event	2021 International Conference on Management of Data, SIGMOD 2021 - Virtual, Online, China; Duration: 20 Jun 2021 → 25 Jun 2021

Cite this

Luo, Y., Tang, N., Li, G., Chai, C., Li, W., & Qin, X. (2021). Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks. Proceedings of the ACM SIGMOD International Conference on Management of Data, 1235-1247. https://doi.org/10.1145/3448016.3457261

@article{c78de5bdbb1f4a62b27364a6ddc70a85,
title = "Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks",
abstract = "Natural language (NL) is a promising interaction paradigm for data visualization (VIS). However, there are not any NL to VIS (NL2VIS) benchmarks available. Our goal is to provide the first NL2VIS benchmark to enable and push the field of NL2VIS, especially with deep learning technologies. In this paper, we propose a NL2VIS synthesizer (NL2SQL-to-NL2VIS) that synthesizes NL2VIS benchmarks by piggybacking NL2SQL benchmarks. The intuition is based on the semantic connection between SQL queries and VIS queries: SQL queries specify what data is needed and VIS queries additionally need to specify how to visualize. However, different from SQL that has well-defined syntax, VIS languages (e.g., Vega-Lite, VizQL, ggplot2) are syntactically very different. To provide NL2VIS benchmarks that can support many VIS languages, we use a unified intermediate representation, abstract syntax trees (ASTs), for both SQL and VIS queries. We can synthesize multiple VIS trees through adding/deleting nodes to/from an SQL tree. Each VIS tree can then be converted to (any) VIS language. The NL for VIS will be modified based on the NL for SQL to reflect corresponding tree edits. We produce the first NL2VIS benchmark (nvBench), by applying NL2SQL-to-NL2VIS on a popular NL2SQL benchmark Spider, which covers 105 domains, supports seven common types of visualizations, and contains 25,750 (NL, VIS) pairs. Our method reduces the man-hour to 5.7% of developing a NL2VIS benchmark from scratch (or building a NL2VIS benchmark from scratch takes 17.5× man-hours of our method). Extensive human validation, through 23 experts and 312 crowd workers, demonstrates the high-quality of nvBench. In order to verify that nvBench can enable learning-based approaches, we develop a SEQ2VIS model. Our experimental results show that SEQ2VIS works well and significantly outperforms the state-of-the-art methods of the NL2VIS task.",

keywords = "benchmark, natural language interface, natural language to visualization, visualization",
author = "Yuyu Luo and Nan Tang and Guoliang Li and Chengliang Chai and Wenbo Li and Xuedi Qin",
note = "Publisher Copyright: {\textcopyright} 2021 ACM.; 2021 International Conference on Management of Data, SIGMOD 2021 ; Conference date: 20-06-2021 Through 25-06-2021",
year = "2021",
doi = "10.1145/3448016.3457261",
language = "English",
pages = "1235--1247",
journal = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
issn = "0730-8078",
}

TY - JOUR
T1 - Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks
AU - Luo, Yuyu
AU - Tang, Nan
AU - Li, Guoliang
AU - Chai, Chengliang
AU - Li, Wenbo
AU - Qin, Xuedi
PY - 2021
Y1 - 2021
N2 - Natural language (NL) is a promising interaction paradigm for data visualization (VIS). However, there are not any NL to VIS (NL2VIS) benchmarks available. Our goal is to provide the first NL2VIS benchmark to enable and push the field of NL2VIS, especially with deep learning technologies. In this paper, we propose a NL2VIS synthesizer (NL2SQL-to-NL2VIS) that synthesizes NL2VIS benchmarks by piggybacking NL2SQL benchmarks. The intuition is based on the semantic connection between SQL queries and VIS queries: SQL queries specify what data is needed and VIS queries additionally need to specify how to visualize. However, different from SQL that has well-defined syntax, VIS languages (e.g., Vega-Lite, VizQL, ggplot2) are syntactically very different. To provide NL2VIS benchmarks that can support many VIS languages, we use a unified intermediate representation, abstract syntax trees (ASTs), for both SQL and VIS queries. We can synthesize multiple VIS trees through adding/deleting nodes to/from an SQL tree. Each VIS tree can then be converted to (any) VIS language. The NL for VIS will be modified based on the NL for SQL to reflect corresponding tree edits. We produce the first NL2VIS benchmark (nvBench), by applying NL2SQL-to-NL2VIS on a popular NL2SQL benchmark Spider, which covers 105 domains, supports seven common types of visualizations, and contains 25,750 (NL, VIS) pairs. Our method reduces the man-hour to 5.7% of developing a NL2VIS benchmark from scratch (or building a NL2VIS benchmark from scratch takes 17.5× man-hours of our method). Extensive human validation, through 23 experts and 312 crowd workers, demonstrates the high-quality of nvBench. In order to verify that nvBench can enable learning-based approaches, we develop a SEQ2VIS model. Our experimental results show that SEQ2VIS works well and significantly outperforms the state-of-the-art methods of the NL2VIS task.

AB - Natural language (NL) is a promising interaction paradigm for data visualization (VIS). However, there are not any NL to VIS (NL2VIS) benchmarks available. Our goal is to provide the first NL2VIS benchmark to enable and push the field of NL2VIS, especially with deep learning technologies. In this paper, we propose a NL2VIS synthesizer (NL2SQL-to-NL2VIS) that synthesizes NL2VIS benchmarks by piggybacking NL2SQL benchmarks. The intuition is based on the semantic connection between SQL queries and VIS queries: SQL queries specify what data is needed and VIS queries additionally need to specify how to visualize. However, different from SQL that has well-defined syntax, VIS languages (e.g., Vega-Lite, VizQL, ggplot2) are syntactically very different. To provide NL2VIS benchmarks that can support many VIS languages, we use a unified intermediate representation, abstract syntax trees (ASTs), for both SQL and VIS queries. We can synthesize multiple VIS trees through adding/deleting nodes to/from an SQL tree. Each VIS tree can then be converted to (any) VIS language. The NL for VIS will be modified based on the NL for SQL to reflect corresponding tree edits. We produce the first NL2VIS benchmark (nvBench), by applying NL2SQL-to-NL2VIS on a popular NL2SQL benchmark Spider, which covers 105 domains, supports seven common types of visualizations, and contains 25,750 (NL, VIS) pairs. Our method reduces the man-hour to 5.7% of developing a NL2VIS benchmark from scratch (or building a NL2VIS benchmark from scratch takes 17.5× man-hours of our method). Extensive human validation, through 23 experts and 312 crowd workers, demonstrates the high-quality of nvBench. In order to verify that nvBench can enable learning-based approaches, we develop a SEQ2VIS model. Our experimental results show that SEQ2VIS works well and significantly outperforms the state-of-the-art methods of the NL2VIS task.

KW - benchmark
KW - natural language interface
KW - natural language to visualization
KW - visualization
UR - http://www.scopus.com/inward/record.url?scp=85108968403&partnerID=8YFLogxK
U2 - 10.1145/3448016.3457261
DO - 10.1145/3448016.3457261
M3 - Conference article
AN - SCOPUS:85108968403
SN - 0730-8078
SP - 1235
EP - 1247
JO - Proceedings of the ACM SIGMOD International Conference on Management of Data
JF - Proceedings of the ACM SIGMOD International Conference on Management of Data
T2 - 2021 International Conference on Management of Data, SIGMOD 2021
Y2 - 20 June 2021 through 25 June 2021
ER -
