TY - GEN
T1 - Is Least-Squares Inaccurate in Fitting Power-Law Distributions? The Criticism is Complete Nonsense
AU - Zhong, Xiaoshi
AU - Wang, Muyin
AU - Zhang, Hongkun
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/4/25
Y1 - 2022/4/25
N2 - Ordinary least-squares estimation is proved to be the best linear unbiased estimator according to the Gauss-Markov theorem. In the last two decades, however, some researchers criticized that least-squares was substantially inaccurate in fitting power-law distributions; such criticism has caused a strong bias in research community. In this paper, we conduct extensive experiments to rebut that such criticism is complete nonsense. Specifically, we sample different sizes of discrete and continuous data from power-law models, showing that even though the long-tailed noises are sampled from power-law models, they cannot be treated as power-law data. We define the correct way to bin continuous power-law data into data points and propose an average strategy for least-squares to fit power-law distributions. Experiments on both simulated and real-world data show that our proposed method fits power-law data perfectly. We uncover a fundamental flaw in the popular method proposed by Clauset et al. [12]: it tends to discard the majority of power-law data and fit the long-tailed noises. Experiments also show that the reverse cumulative distribution function is a bad idea to plot power-law data in practice because it usually hides the true probability distribution of data. We hope that our research can clean up the bias about least-squares fitting power-law distributions. Source code can be found at https://github.com/xszhong/LSavg.
AB - Ordinary least-squares estimation is proved to be the best linear unbiased estimator according to the Gauss-Markov theorem. In the last two decades, however, some researchers criticized that least-squares was substantially inaccurate in fitting power-law distributions; such criticism has caused a strong bias in research community. In this paper, we conduct extensive experiments to rebut that such criticism is complete nonsense. Specifically, we sample different sizes of discrete and continuous data from power-law models, showing that even though the long-tailed noises are sampled from power-law models, they cannot be treated as power-law data. We define the correct way to bin continuous power-law data into data points and propose an average strategy for least-squares to fit power-law distributions. Experiments on both simulated and real-world data show that our proposed method fits power-law data perfectly. We uncover a fundamental flaw in the popular method proposed by Clauset et al. [12]: it tends to discard the majority of power-law data and fit the long-tailed noises. Experiments also show that the reverse cumulative distribution function is a bad idea to plot power-law data in practice because it usually hides the true probability distribution of data. We hope that our research can clean up the bias about least-squares fitting power-law distributions. Source code can be found at https://github.com/xszhong/LSavg.
KW - Power-law distributions
KW - average strategy
KW - least-squares estimation (LSE)
KW - long-tailed noises
UR - http://www.scopus.com/inward/record.url?scp=85129797410&partnerID=8YFLogxK
U2 - 10.1145/3485447.3511995
DO - 10.1145/3485447.3511995
M3 - Conference contribution
AN - SCOPUS:85129797410
T3 - WWW 2022 - Proceedings of the ACM Web Conference 2022
SP - 2748
EP - 2758
BT - WWW 2022 - Proceedings of the ACM Web Conference 2022
PB - Association for Computing Machinery, Inc
T2 - 31st ACM World Wide Web Conference, WWW 2022
Y2 - 25 April 2022 through 29 April 2022
ER -