Is Least-Squares Inaccurate in Fitting Power-Law Distributions? The Criticism is Complete Nonsense

Xiaoshi Zhong*, Muyin Wang, Hongkun Zhang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

Under the assumptions of the Gauss-Markov theorem, ordinary least-squares estimation is the best linear unbiased estimator. Over the last two decades, however, some researchers have claimed that least-squares is substantially inaccurate in fitting power-law distributions; this criticism has caused a strong bias in the research community. In this paper, we conduct extensive experiments to show that such criticism is complete nonsense. Specifically, we sample discrete and continuous data of different sizes from power-law models and show that the long-tailed noises, even though they are sampled from power-law models, cannot be treated as power-law data. We define the correct way to bin continuous power-law data into data points and propose an average strategy for least-squares to fit power-law distributions. Experiments on both simulated and real-world data show that our proposed method fits power-law data perfectly. We uncover a fundamental flaw in the popular method proposed by Clauset et al. [12]: it tends to discard the majority of power-law data and fit the long-tailed noises. Experiments also show that the reverse cumulative distribution function is a poor choice for plotting power-law data in practice because it usually hides the true probability distribution of the data. We hope that our research can clear up the bias against least-squares fitting of power-law distributions. Source code can be found at https://github.com/xszhong/LSavg.
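For orientation, the basic idea of fitting a power law by least-squares on a log-log scale can be sketched as follows. This is a minimal illustration and not the authors' average strategy (see the linked repository for that): the function name `fit_power_law_ls`, the exponent 2.5, the constant 3.0, and the count cutoff of 10 are all assumptions chosen for the example.

```python
import numpy as np


def fit_power_law_ls(x, y):
    """Ordinary least-squares fit of log(y) = log(c) - alpha * log(x).

    Hypothetical helper for illustration; returns (alpha, c).
    """
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    slope, intercept = np.polyfit(log_x, log_y, deg=1)
    return -slope, np.exp(intercept)


# Noise-free check: points lying exactly on y = 3 * x^(-2.5).
x = np.arange(1, 101, dtype=float)
y = 3.0 * x ** -2.5
alpha_hat, c_hat = fit_power_law_ls(x, y)

# Sampled discrete data: empirical frequencies of Zipf-distributed integers.
rng = np.random.default_rng(0)
samples = rng.zipf(2.5, size=100_000)
values, counts = np.unique(samples, return_counts=True)
freq = counts / counts.sum()

# Fit on every observed value, including rarely observed tail values.
alpha_all, _ = fit_power_law_ls(values, freq)

# Fit only on values seen at least 10 times (an arbitrary, assumed cutoff,
# used here only to show how the sparse tail can influence the estimate).
head = counts >= 10
alpha_head, _ = fit_power_law_ls(values[head], freq[head])

print(f"noise-free: alpha={alpha_hat:.4f}, c={c_hat:.4f}")
print(f"sampled, all points: alpha={alpha_all:.4f}")
print(f"sampled, head only:  alpha={alpha_head:.4f}")
```

Comparing the two sampled estimates gives a rough sense of how much the sparsely observed tail affects a naive fit, which is the kind of effect the abstract attributes to long-tailed noises.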

Original language: English
Title of host publication: WWW 2022 - Proceedings of the ACM Web Conference 2022
Publisher: Association for Computing Machinery, Inc
Pages: 2748-2758
Number of pages: 11
ISBN (Electronic): 9781450390965
Publication status: Published - 25 Apr 2022
Event: 31st ACM Web Conference, WWW 2022 - Virtual, Lyon, France
Duration: 25 Apr 2022 to 29 Apr 2022

Publication series

Name: WWW 2022 - Proceedings of the ACM Web Conference 2022

Conference

Conference: 31st ACM Web Conference, WWW 2022
Country/Territory: France
City: Virtual, Lyon
Period: 25/04/22 to 29/04/22

Keywords

  • Power-law distributions
  • average strategy
  • least-squares estimation (LSE)
  • long-tailed noises
