Abstract
Recent research on network sampling misunderstands some sampling concepts and does not take the variance of subnets into account. We propose correct definitions of the sample and the sampling rate in network sampling, together with a formula for calculating the variance of subnets. Three commonly used sampling strategies are then applied to data sets of the connecting nearest-neighbor (CNN) model, the random network and the small-world network to explore the variance in network sampling. The results show that snowball sampling yields the largest variance of subnets but captures the network structure well. The variance of networks sampled by the hub and random strategies is much smaller. The hub strategy performs well in reflecting the properties of the whole network, while random sampling gives more accurate estimates of the clustering coefficient.
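As an illustration of the kind of experiment the abstract describes, the following minimal Python sketch draws repeated subnets from a small-world network with random, hub and snowball sampling and reports how much one statistic (the average clustering coefficient) varies across the subnets. Everything in it is an assumption made for illustration rather than the paper's method: the sampling rate is taken to be the fraction of nodes kept, the hub strategy is taken to mean "keep the highest-degree nodes", and the plain population variance of the statistic over repeated samples stands in for the paper's variance-of-subnets formula, which the abstract does not state. All function names are hypothetical.

```python
import random
import statistics
from collections import deque


def small_world(n, k, p, rng):
    """Ring lattice (k nearest neighbours per node); each edge is rewired with probability p."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            u = (v + step) % n
            if rng.random() < p:
                # Rewire to a random node that is neither v nor already a neighbour of v.
                u = rng.choice([w for w in range(n) if w != v and w not in adj[v]])
            adj[v].add(u)
            adj[u].add(v)
    return adj


def induced(adj, nodes):
    """Subnet induced by the sampled nodes (assumed definition of a sample)."""
    keep = set(nodes)
    return {v: adj[v] & keep for v in keep}


def avg_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj) if adj else 0.0


def random_sample(adj, m, rng):
    """Uniform random node sampling."""
    return rng.sample(sorted(adj), m)


def hub_sample(adj, m, rng):
    """Assumed hub strategy: keep the m highest-degree nodes, ties broken at random."""
    nodes = list(adj)
    rng.shuffle(nodes)
    nodes.sort(key=lambda v: len(adj[v]), reverse=True)  # stable sort keeps shuffled tie order
    return nodes[:m]


def snowball_sample(adj, m, rng):
    """Breadth-first expansion from a random seed; reseeds if a component is exhausted."""
    order, seen = [], set()
    while len(order) < m:
        seed = rng.choice([v for v in adj if v not in seen])
        seen.add(seed)
        order.append(seed)
        queue = deque([seed])
        while queue and len(order) < m:
            v = queue.popleft()
            for u in sorted(adj[v]):
                if u not in seen and len(order) < m:
                    seen.add(u)
                    order.append(u)
                    queue.append(u)
    return order


def subnet_variance(adj, sampler, rate, trials, rng):
    """Mean and population variance of the clustering coefficient over repeated subnets."""
    m = max(1, round(rate * len(adj)))  # assumed: sampling rate = fraction of nodes kept
    values = [avg_clustering(induced(adj, sampler(adj, m, rng))) for _ in range(trials)]
    return statistics.mean(values), statistics.pvariance(values)


rng = random.Random(0)
graph = small_world(200, 6, 0.1, rng)
strategies = [("random", random_sample), ("hub", hub_sample), ("snowball", snowball_sample)]
results = {name: subnet_variance(graph, fn, 0.2, 30, rng) for name, fn in strategies}
print("whole network clustering:", round(avg_clustering(graph), 4))
for name, (mean, var) in results.items():
    print(f"{name:8s} mean={mean:.4f} variance={var:.6f}")
```

Under these assumptions the hub sampler is almost deterministic (only ties in degree are randomized), so its variance in this sketch is small by construction and should not be read as a reproduction of the paper's results.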
Original language | English |
---|---|
Article number | 7004654 |
Pages (from-to) | 1098-1106 |
Number of pages | 9 |
Journal | Journal of Systems Engineering and Electronics |
Volume | 25 |
Issue number | 6 |
Publication status | Published - 1 Dec 2014 |
Keywords
- complex network
- sample
- sampling
- variance of subnets