TY - GEN
T1 - Taste
T2 - 28th International Conference on Extending Database Technology, EDBT 2025
AU - Li, Tao
AU - Liang, Feng
AU - Quan, Jinqi
AU - Huang, Chuang
AU - Wang, Teng
AU - Huang, Runhuai
AU - Wu, Jie
AU - Hu, Xiping
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2024/11/11
Y1 - 2024/11/11
N2 - In recent years, a growing number of data management, preparation, and wrangling services have appeared in the cloud. Semantic type detection is important for these services, which rely on semantic types to interpret data and provide useful functions accordingly. Meanwhile, deep learning (DL) has been introduced for semantic type detection and is transforming the field. However, existing DL-based approaches, although they achieve high F1 scores, are not practical in real cloud environments because they suffer from issues such as low efficiency and high intrusiveness to user data sources. To address these issues, we present Taste, a novel semantic type detection framework with two phases, each associated with a DL-driven detection task. The intuition behind this framework is that metadata (e.g., column names, statistics) contain rich technical and business information, which can be leveraged to detect semantic types effectively while incurring only a lightweight impact on user data sources. Thus, we design the detection task in the first phase to use only native metadata from user data sources as input. In contrast, the second phase is optional and is activated only when the first task’s result is highly uncertain. It then retrieves both metadata and column content to derive semantic types more reliably. Furthermore, we adopt multi-task learning and develop a novel DL model, called Asymmetric Double-Tower Detection (ADTD), to support the two tasks simultaneously. This design enables caching and reuse of the latent representations from the first task to reduce inference time. In the implementation, we further introduce a pipelined execution mechanism to boost performance for massive user table processing. Evaluation results show that Taste achieves state-of-the-art performance in execution time and F1 score, and is more robust under different data privacy settings, demonstrating its potential for real applications in cloud environments.
UR - http://www.scopus.com/inward/record.url?scp=105007857397&partnerID=8YFLogxK
U2 - 10.48786/edbt.2025.26
DO - 10.48786/edbt.2025.26
M3 - Conference contribution
AN - SCOPUS:105007857397
T3 - Advances in Database Technology - EDBT
SP - 324
EP - 326
BT - Advances in Database Technology - EDBT
PB - OpenProceedings.org
Y2 - 25 March 2025 through 28 March 2025
ER -