TY - GEN
T1 - On Aligning Tuples for Regression
AU - Fang, Chenguang
AU - Song, Shaoxu
AU - Mei, Yinan
AU - Yuan, Ye
AU - Wang, Jianmin
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/8/14
Y1 - 2022/8/14
N2 - Regression models are learned over multiple variables, e.g., using engine torque and speed to predict its fuel consumption. In practice, the values of these variables are often collected separately, e.g., by different sensors in a vehicle, and need to be aligned first in a tuple before learning. Unfortunately, owing to various issues like network delays, values generated at the same time could be recorded with different timestamps, making the alignment difficult. According to our study in a vehicle manufacturer, engine torque, speed and fuel consumption values are mostly not recorded with the same timestamps. Aligning tuples by simply concatenating values of variables with equal timestamps leads to limited data for learning regression model. To deal with timestamp variations, existing time series matching techniques rely on the similarity of values and timestamps, which unfortunately are very likely to be absent among the variables in regression (no similarity between engine torque and speed values). In this sense, we propose to bridge tuple alignment and regression. Rather than similar values and timestamps, we align the values of different variables in a tuple that (i) are recorded in a short period, i.e., time constraint, and more importantly (ii) coincide well with the regression model, known as model constraint. Our theoretical and technical contributions include (1) formulating the problem of tuple alignment with time and model constraints, (2) proving NP-completeness of the problem, (3) devising an approximation algorithm with performance guarantee, and (4) proposing efficient pruning strategies for the algorithm. Experiments over real world datasets, including the aforesaid engine data collected by a vehicle manufacturer, demonstrate that our proposal outperforms the existing methods on alignment accuracy and improves regression precision.
AB - Regression models are learned over multiple variables, e.g., using engine torque and speed to predict its fuel consumption. In practice, the values of these variables are often collected separately, e.g., by different sensors in a vehicle, and need to be aligned first in a tuple before learning. Unfortunately, owing to various issues like network delays, values generated at the same time could be recorded with different timestamps, making the alignment difficult. According to our study in a vehicle manufacturer, engine torque, speed and fuel consumption values are mostly not recorded with the same timestamps. Aligning tuples by simply concatenating values of variables with equal timestamps leads to limited data for learning regression model. To deal with timestamp variations, existing time series matching techniques rely on the similarity of values and timestamps, which unfortunately are very likely to be absent among the variables in regression (no similarity between engine torque and speed values). In this sense, we propose to bridge tuple alignment and regression. Rather than similar values and timestamps, we align the values of different variables in a tuple that (i) are recorded in a short period, i.e., time constraint, and more importantly (ii) coincide well with the regression model, known as model constraint. Our theoretical and technical contributions include (1) formulating the problem of tuple alignment with time and model constraints, (2) proving NP-completeness of the problem, (3) devising an approximation algorithm with performance guarantee, and (4) proposing efficient pruning strategies for the algorithm. Experiments over real world datasets, including the aforesaid engine data collected by a vehicle manufacturer, demonstrate that our proposal outperforms the existing methods on alignment accuracy and improves regression precision.
KW - alignment
KW - regression model
KW - time series
UR - https://www.scopus.com/pages/publications/85137148756
U2 - 10.1145/3534678.3539373
DO - 10.1145/3534678.3539373
M3 - Conference contribution
AN - SCOPUS:85137148756
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 336
EP - 346
BT - KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Y2 - 14 August 2022 through 18 August 2022
ER -