Taste: Towards Practical Deep Learning-based Approaches for Semantic Type Detection in the Cloud

Tao Li*, Feng Liang, Jinqi Quan, Chuang Huang, Teng Wang, Runhuai Huang, Jie Wu, Xiping Hu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In recent years, we have witnessed more and more data management, preparation, and wrangling services appearing in the cloud. Semantic type detection is important for these services, which rely on semantic types to interpret data and provide useful functions accordingly. Meanwhile, deep learning (DL) has been introduced for semantic type detection and is transforming the field. However, existing DL-based approaches, despite achieving high F1 scores, are not practical in real cloud environments because they suffer from low efficiency and high intrusiveness to user data sources. To address these issues, we present Taste, a novel semantic type detection framework with two phases, each associated with a DL-driven detection task. The intuition behind this framework is that metadata (e.g., column names, statistics) contain rich technical and business information, which can be leveraged to detect semantic types effectively while imposing only a lightweight impact on user data sources. Thus, we design the detection task in the first phase to use only native metadata from user data sources as input. In contrast, the second phase is optional and is activated only when the first task's result has high uncertainty; it then retrieves both metadata and column content to derive semantic types more reliably. Furthermore, we adopt multi-task learning and develop a novel DL model, called Asymmetric Double-Tower Detection (ADTD), to support the two tasks simultaneously. This design enables caching and reuse of the latent representations from the first task to reduce inference time. In the implementation, we further introduce a pipelined execution mechanism to boost performance when processing massive numbers of user tables. Evaluation results show that Taste achieves state-of-the-art performance in execution time and F1 score and is more robust under different data privacy settings, demonstrating its potential for real applications in cloud environments.
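The two-phase flow described above can be illustrated with a small sketch. This is not the authors' implementation: the class `TwoPhaseDetector`, its method names, the use of prediction entropy against a `max_entropy` threshold as the uncertainty test, and the toy encoders and heads are all hypothetical stand-ins (the abstract does not specify how uncertainty is measured, and in Taste both tasks are served by the ADTD model). What the sketch shows is the control flow: phase 1 reads only metadata and caches its latent representation; column content is fetched only when phase 1 is uncertain, and phase 2 reuses the cached latent instead of re-encoding the metadata.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)


class TwoPhaseDetector:
    """Hypothetical metadata-first detector with an optional content phase.

    The encoders and heads are plain callables standing in for the two
    towers of a model such as ADTD (an illustrative assumption).
    """

    def __init__(self, labels: List[str],
                 encode_metadata: Callable[[dict], Vector],
                 encode_content: Callable[[List[str]], Vector],
                 head_phase1: Callable[[Vector], Vector],
                 head_phase2: Callable[[Vector, Vector], Vector],
                 max_entropy: float) -> None:
        self.labels = labels
        self.encode_metadata = encode_metadata
        self.encode_content = encode_content
        self.head_phase1 = head_phase1
        self.head_phase2 = head_phase2
        self.max_entropy = max_entropy
        self._latent_cache: Dict[str, Vector] = {}  # column id -> metadata latent

    def phase1(self, column_id: str, metadata: dict) -> Optional[str]:
        """Predict from metadata only; return None if the result is too uncertain."""
        latent = self.encode_metadata(metadata)
        self._latent_cache[column_id] = latent  # cached for reuse in phase 2
        probs = softmax(self.head_phase1(latent))
        if entropy(probs) > self.max_entropy:
            return None  # uncertain: defer to phase 2
        return self.labels[probs.index(max(probs))]

    def phase2(self, column_id: str, content: List[str]) -> str:
        """Combine the cached metadata latent with an encoding of column content."""
        meta_latent = self._latent_cache[column_id]  # reused, not recomputed
        content_latent = self.encode_content(content)
        probs = softmax(self.head_phase2(meta_latent, content_latent))
        return self.labels[probs.index(max(probs))]

    def detect(self, column_id: str, metadata: dict,
               fetch_content: Callable[[], List[str]]) -> str:
        label = self.phase1(column_id, metadata)
        if label is not None:
            return label
        # Only now is the user data source read for column content.
        return self.phase2(column_id, fetch_content())


# Toy usage with hand-written "encoders" (illustrative only).
det = TwoPhaseDetector(
    labels=["email", "date"],
    encode_metadata=lambda md: [1.0 if "mail" in md["name"] else 0.0,
                                1.0 if "date" in md["name"] else 0.0],
    encode_content=lambda vals: [sum("@" in v for v in vals) / len(vals),
                                 sum("-" in v for v in vals) / len(vals)],
    head_phase1=lambda z: [5.0 * z[0], 5.0 * z[1]],
    head_phase2=lambda m, c: [5.0 * (m[0] + c[0]), 5.0 * (m[1] + c[1])],
    max_entropy=0.3,
)
sample = lambda: ["2024-01-01", "2023-12-31"]
print(det.detect("c1", {"name": "user_email"}, sample))  # → email (phase 1 only)
print(det.detect("c2", {"name": "col_7"}, sample))       # → date (phase 2)
```

Keying the cache by column identifier, rather than passing the latent directly between calls, is one way to let phase 2 run later or in a separate stage (for example, as part of a pipelined batch) without re-encoding metadata; whether Taste organizes its cache this way is not stated in the abstract.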

Original language: English
Title of host publication: Advances in Database Technology - EDBT
Publisher: OpenProceedings.org
Pages: 324-326
Number of pages: 3
Edition: 2
ISBN (Electronic): 9783893180981, 9783893180998
Publication status: Published - 11 Nov 2024
Externally published: Yes
Event: 28th International Conference on Extending Database Technology, EDBT 2025 - Barcelona, Spain
Duration: 25 Mar 2025 to 28 Mar 2025

Publication series

Name: Advances in Database Technology - EDBT
Number: 2
Volume: 28
ISSN (Electronic): 2367-2005

Conference

Conference: 28th International Conference on Extending Database Technology, EDBT 2025
Country/Territory: Spain
City: Barcelona
Period: 25/03/25 to 28/03/25
