On Aligning Tuples for Regression

  • Chenguang Fang
  • Shaoxu Song
  • Yinan Mei
  • Ye Yuan
  • Jianmin Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Regression models are learned over multiple variables, e.g., using engine torque and speed to predict fuel consumption. In practice, the values of these variables are often collected separately, e.g., by different sensors in a vehicle, and need to be aligned into a tuple before learning. Unfortunately, owing to various issues such as network delays, values generated at the same time may be recorded with different timestamps, making the alignment difficult. According to our study at a vehicle manufacturer, engine torque, speed and fuel consumption values are mostly not recorded with the same timestamps. Aligning tuples by simply concatenating values of variables with equal timestamps leaves limited data for learning a regression model. To deal with timestamp variations, existing time series matching techniques rely on the similarity of values and timestamps, which unfortunately is very likely to be absent among the variables in regression (e.g., there is no similarity between engine torque and speed values). To this end, we propose to bridge tuple alignment and regression. Rather than relying on similar values and timestamps, we align into a tuple the values of different variables that (i) are recorded within a short period, i.e., the time constraint, and more importantly (ii) coincide well with the regression model, known as the model constraint. Our theoretical and technical contributions include (1) formulating the problem of tuple alignment with time and model constraints, (2) proving NP-completeness of the problem, (3) devising an approximation algorithm with a performance guarantee, and (4) proposing efficient pruning strategies for the algorithm. Experiments over real-world datasets, including the aforesaid engine data collected by a vehicle manufacturer, demonstrate that our proposal outperforms existing methods on alignment accuracy and improves regression precision.
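The contrast between equal-timestamp joining and alignment under time and model constraints can be illustrated with a small sketch. This is not the authors' approximation algorithm: it is a naive per-record search, written under the assumptions that a regression model is already available, that every series is a list of (timestamp, value) pairs, and that a value may be reused across tuples. The names `exact_join`, `align`, `model` and `delta`, the toy readings, and the linear model are all hypothetical.

```python
from itertools import product

def exact_join(series):
    """Baseline: keep only timestamps that appear in every series."""
    common = set.intersection(*(set(t for t, _ in s) for s in series))
    lookups = [dict(s) for s in series]
    return [tuple(lk[t] for lk in lookups) for t in sorted(common)]

def align(predictors, target, model, delta):
    """Naive illustration (not the paper's algorithm): for each target
    record, pick one value per predictor so that all timestamps in the
    tuple span at most `delta` (time constraint) and the model residual
    is smallest (model constraint)."""
    tuples = []
    for ty, y in target:
        # Candidates per predictor that lie within delta of the target record.
        cands = [[(t, v) for t, v in s if abs(t - ty) <= delta]
                 for s in predictors]
        best = None
        for combo in product(*cands):
            times = [t for t, _ in combo] + [ty]
            if max(times) - min(times) > delta:
                continue  # violates the time constraint
            xs = [v for _, v in combo]
            err = abs(model(xs) - y)  # how well the tuple fits the model
            if best is None or err < best[0]:
                best = (err, tuple(xs) + (y,))
        if best is not None:
            tuples.append(best[1])
    return tuples

# Hypothetical readings: (timestamp, value). No timestamp is shared.
torque = [(0, 1.0), (3, 9.0), (10, 2.0), (20, 3.0)]
speed = [(1, 1.0), (11, 2.0), (21, 3.0)]
fuel = [(2, 5.0), (12, 10.0), (22, 15.0)]

# Assumed, already-known model: fuel = 2 * torque + 3 * speed.
model = lambda xs: 2 * xs[0] + 3 * xs[1]

joined = exact_join([torque, speed, fuel])
aligned = align([torque, speed], fuel, model, delta=3)
```

In this toy data the equal-timestamp join yields no tuples at all, while the constrained search recovers three. For the fuel reading at timestamp 2, both torque readings at timestamps 0 and 3 satisfy the time constraint, and only the model residual distinguishes them.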

Original language: English
Title of host publication: KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 336-346
Number of pages: 11
ISBN (Electronic): 9781450393850
DOIs
Publication status: Published - 14 Aug 2022
Externally published: Yes
Event: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022 - Washington, United States
Duration: 14 Aug 2022 to 18 Aug 2022

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Country/Territory: United States
City: Washington
Period: 14/08/22 to 18/08/22

Keywords

  • alignment
  • regression model
  • time series
