By Roozbeh Nia, Christian Bird, Premkumar Devanbu, and Vladimir Filkov

Published in Proceedings of the Seventh Working Conference on Mining Software Repositories

Social network methods are frequently used to analyze networks derived
from open source software (OSS) project communication and collaboration data. Such
studies typically discover patterns in the information flow between
contributors or contributions in these projects. Social network metrics
have also been used to predict defect occurrence. However, such
studies often ignore or side-step the issue of whether (and in what
way) the metrics and networks of study are influenced by
inadequate or missing data.

In previous studies, email archives of OSS projects have provided a useful
trace of the communication and co-ordination activities of the
participants. These traces have been used to construct social networks
that are then subject to various types of analysis. However, during the
construction of these networks, some assumptions are made that may not
always hold; this leads to incomplete, and sometimes incorrect, networks.
The question then becomes: do these errors affect the validity of the
ensuing analysis? In this paper we specifically examine the stability of
network metrics in the presence of inadequate and missing data. The issues
that we study are: 1) the effect of paths with broken information flow
(i.e., consecutive edges which are out of temporal order) on measures of
centrality of nodes in the network, and 2) the effect of missing links on
such measures. We demonstrate on three different OSS projects that while
these issues do change network topology, the metrics used in the analysis
are stable with respect to such changes.
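
To make the first issue concrete, the following is a minimal sketch (not the authors' implementation) of what "broken information flow" means on a timestamped email network. The toy data, the names `EDGES`, `static_reachable`, and `temporal_reachable`, and the choice to require strictly increasing timestamps along a path are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (sender, receiver, timestamp) for each message.
EDGES = [
    ("alice", "bob", 3),
    ("bob", "carol", 1),   # sent before alice -> bob, so it cannot relay alice's message
    ("carol", "dave", 5),
]


def static_reachable(edges, source):
    """Nodes reachable from source when edge timestamps are ignored."""
    adjacency = defaultdict(set)
    for sender, receiver, _ in edges:
        adjacency[sender].add(receiver)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen - {source}


def temporal_reachable(edges, source):
    """Nodes reachable from source along paths with strictly increasing timestamps."""
    # earliest[node] is the earliest time at which information from source arrives.
    earliest = {source: float("-inf")}
    # Processing edges in time order lets a single pass settle arrival times.
    for sender, receiver, time in sorted(edges, key=lambda edge: edge[2]):
        if sender in earliest and earliest[sender] < time:
            if time < earliest.get(receiver, float("inf")):
                earliest[receiver] = time
    return set(earliest) - {source}


print(static_reachable(EDGES, "alice"))
print(temporal_reachable(EDGES, "alice"))
```

In this toy network the static view reports a path from alice to carol and dave, but the edge from bob to carol predates the edge from alice to bob, so the time-respecting view reaches only bob. Centrality measures computed on the static network implicitly count such broken paths.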


  author = {Roozbeh Nia and Christian Bird and Premkumar Devanbu and Vladimir Filkov},
  title = {Validity of Network Analyses in Open Source Projects},
  booktitle = {Proceedings of the Seventh Working Conference on Mining Software Repositories},
  year = {2010},
  publisher = {IEEE Computer Society},
  location = {Cape Town, South Africa}

Validity of Network Analyses in Open Source Projects (MSR 10)