Measuring Degrees of Separation

Background Information: The degree of separation in a network is equal to the average length of the shortest path between pairs of nodes. Extracting the degrees of separation for a large network is computationally demanding. The computation involves averaging the degrees of separation of each individual node in the network. Thus, the time required to compute the degrees of separation depends number of nodes. In fact, the computational demands grow quadratically as the number of nodes increase. In order to circumvent the demanding calculation, it is possible to estimate the degree of separation by first selecting a small sample of nodes, and then averaging the degrees of separation for this limited set. Therefore, by sacrificing some accuracy, we a able to save a lot of computation time.

Experiment: I investigated exactly how much accuracy is lost by the estimation method described above.

Method: I created a virtual network of 1000 people using the simulation described in my first post. I ran the simulation until the network’s giant component fully developed (to see how this was done, see my second post). Then I calculated the degree of separation for various sample sizes. I also calculated the exact degree of separation by using the entire population as the sample. The result is summarized on the plot below:

DS_vs_prop

The actual value of the degrees of separation, as shown by the blue line,was 9.66. The blue dots are estimates made by various sample sizes. As expected, as I increased the sample size, the sample points converged to the real value. The space between the green lines represent an area of 95% accuracy. As seen on the graph, even when we limited our sample to as little as 0.02% of the population (20 people), we obtained a value that is 95% accurate. The sample size required for accuracy levels of 99% and 99.5% are also shown on the graph.

Conclusion: Using samples as small as 0.02% of the population, I was able to estimate the degrees of freedom with an accuracy of 95% . Therefore, I support the practice of taking a small sample of nodes into account when computing the degrees of freedom in a large network.

My simulation source code can be found here (Visual C++ 2005).

Posted in Topics: Education, Mathematics, Science, Social Studies, Technology

These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • del.icio.us
  • connotea
  • Technorati
  • YahooMyWeb
Jump down to leave a comment.

Leave a Comment

You must be logged in to post a comment.



* You can follow any responses to this entry through the RSS 2.0 feed.