# Christian Bird

Background

I am currently a Researcher at Microsoft Research. I received my bachelor's degree in CS from Brigham Young University in 2003. I've worked at Caldera, Lineo, Embedix, Motorola, and Freescale. I've also dabbled in the OSS world, but I've had much more experience observing and mining OSS projects than participating in them. In June 2010, I finished my Ph.D. in Computer Science, studying empirical software engineering under advisor Prem Devanbu at UC Davis. I'm currently working on ways to use data to guide the decisions of stakeholders in large software projects.

I've had the opportunity to work with many wonderful people and at some amazing places during the course of my research. I owe a debt of gratitude to those who have taught me and allowed me to work with them.

• Harald Gall, Abraham Bernstein, and Adrian Bachmann at the University of Zurich
• Nachiappan Nagappan, Brendan Murphy, Tom Zimmermann, and Andy Begel during two internships at Microsoft Research
• Clay Williams, Patrick Wagstrom, Peri Tarr, and Tim Klinger during an internship at IBM Research
• Peter Rigby and Daniel German at the University of Victoria
• Prem Devanbu, Zhendong Su, Vladimir Filkov, Raissa D'Souza, Earl Barr, Daryl Posnett, Eirik Aune, Patrick Duffy, Alex Gourley, Zach Saul, David Pattison, Foyzur Rahman, and Roozbeh Nia at the University of California, Davis
Curriculum vitae

Current Professional Activities

I am currently on the program committees for Please consider submitting!

Publications

Since my graduate career was largely funded by a grant from the NSF, the taxpayers of the United States are paying for the research I (and many others) did. As such, I think it is ridiculous that access to most of the papers produced by this research is restricted and requires payment (try downloading my paper from the ACM here as an example). I therefore list them below.

Dwelling in Software: aspects of the felt-life of engineers in large software projects
Richard Harper, Christian Bird, Thomas Zimmermann, and Brendan Murphy. In Proceedings of the European Conference on Computer-Supported Cooperative Work, 2013.

The organizational and social aspects of software engineering (SE) are now increasingly well investigated. This paper proposes that there are a number of approaches taken in research that can be distinguished not by their method or topic but by the different views they construct of the human agent acting in SE. These views have implications for the pragmatic outcome of the research, such as whether systems design suggestions are made, proposals for the development of practical reasoning tools or the effect of Social Network Systems on engineers’ sociability. This paper suggests that these studies tend to underemphasize the felt-life of engineers, a felt-life that is profoundly emotional though played in reference to ideas of moral propriety and ethics. This paper will present a study of this felt-life, suggesting it consists of a form of digital dwelling. The perspectives this view affords are contrasted with process and ‘scientific’ approaches to the human agent in SE, and with the more humanistic studies of SE reasoning common in CSCW.

@inproceedings{harper2013ecscw,
author={Richard Harper and Christian Bird and Thomas Zimmermann and Brendan Murphy},
Title={{Dwelling in Software: aspects of the felt-life of engineers in large software projects}},
Booktitle = {Proceedings of the European Conference on Computer-Supported Cooperative Work},
Year=2013
}


Leveraging the Crowd: How 48,000 Users Helped Improve Lync Performance
Robert Musson, Jacqueline Richards, Danyel Fisher, Christian Bird, Brian Bussone, and Sandipan Ganguly. In IEEE Software, 2013.

Performance is a critical component of customer satisfaction with network-based applications. Unfortunately, evaluating the performance of collaborative software that operates in extremely heterogeneous environments is difficult to do accurately using traditional techniques such as modeling workloads or testing in controlled environments. In an attempt to evaluate performance of an application “in the wild” during development, we deploy early versions of the software, collecting performance data from application users for key usage scenarios. Our analysis package produces a number of visualizations to help development teams to identify and prioritize performance issues. Our approach has helped teams focus on performance early in the development cycle and enabled them to evaluate their progress, identify defects, and estimate timelines. We present our approach, discuss its deployment and impact, and outline future improvements.

@article{musson2013performance,
author = {Robert Musson and Jacqueline Richards and Danyel Fisher and Christian Bird and Brian Bussone and Sandipan Ganguly},
title={{Leveraging the Crowd: How 48,000 Users Helped Improve Lync Performance}},
journal={{IEEE Software}},
publisher={IEEE Computer Society},
year={2013}
}


The Design of Bug Fixes
Emerson Murphy-Hill, Thomas Zimmermann, Christian Bird, and Nachiappan Nagappan. In Proceedings of the International Conference on Software Engineering, 2013.

When software engineers fix bugs, they may have several options as to how to fix those bugs. Which fix is chosen has many implications, both for practitioners and researchers: What is the risk of introducing other bugs during the fix? Is the bug fix in the same code that caused the bug? Is the change fixing the cause or just covering a symptom? In this paper, we investigate the issue of alternative fixes to bugs and present an empirical study of how engineers make design choices about how to fix bugs. Based on qualitative interviews with 40 engineers working on a variety of products, 6 bug triage meetings, and a survey filled out by 326 engineers, we found that there are a number of factors, many of them non-technical, that influence how bugs are fixed, such as how close to release the software is. We also discuss several implications for research and practice, including ways to make bug prediction and localization more accurate.

@inproceedings{murphyhill2013dbf,
Author={Emerson Murphy-Hill and Thomas Zimmermann and Christian Bird and Nachiappan Nagappan},
Title={The Design of Bug Fixes},
Booktitle={Proceedings of the 35th International Conference on Software Engineering},
Year={2013}
}


Expectations, Outcomes, and Challenges of Modern Code Review
Alberto Bacchelli and Christian Bird. In Proceedings of the International Conference on Software Engineering, 2013.

Code review is a common software engineering practice employed both in open source and industrial contexts. Review today is less formal and more "lightweight" than the code inspections performed and studied in the 70s and 80s. We empirically explore the motivations, challenges, and outcomes of tool-based code reviews. We observed, interviewed, and surveyed developers and managers and manually classified hundreds of review comments across diverse teams at Microsoft. Our study reveals that while finding defects remains the main motivation for review, reviews are less about defects than expected and instead provide additional benefits such as knowledge transfer, increased team awareness, and creation of alternative solutions to problems. Moreover, we find that code and change understanding is the key aspect of code reviewing and that developers employ a wide range of mechanisms to meet their understanding needs, most of which are not met by current tools. We provide recommendations for practitioners and researchers.

@inproceedings{bacchelli2013eoc,
Author={Alberto Bacchelli and Christian Bird},
Title={Expectations, Outcomes, and Challenges of Modern Code Review},
Booktitle={Proceedings of the 35th International Conference on Software Engineering},
Year={2013}
}

Improving Developer Participation Rates in Surveys
Edward Smith, Robert Loftin, Emerson Murphy-Hill, Christian Bird, and Thomas Zimmermann. In Proceedings of the International Workshop on Cooperative and Human Aspects of Software Engineering, 2013.

Doing high quality research about the human side of software engineering necessitates the participation of real software developers in studies, but getting high levels of participation is a challenge for software engineering researchers. In this paper, we discuss several factors that software engineering researchers can use when recruiting participants, drawn from a combination of general research on survey design, research on persuasion, and our experience in conducting surveys. We study these factors by performing post-hoc analysis on several previously conducted surveys. Our results provide insight into the factors associated with increased response rates, which are neither wholly composed of factors associated strictly with persuasion research, nor those of conventional wisdom in software engineering.

@inproceedings{smith2013chase,
Author={Edward Smith and Robert Loftin and Emerson Murphy-Hill and Christian Bird and Thomas Zimmermann},
Title={{Improving Developer Participation Rates in Surveys}},
Booktitle={Proceedings of the International Workshop on Cooperative and Human Aspects of Software Engineering},
Year={2013},
Publisher = {IEEE}
}

What Effect does Distributed Version Control have on OSS Project Organization
Peter Rigby, Earl Barr, Christian Bird, Premkumar Devanbu, and Daniel German. In Proceedings of the International Workshop on Release Engineering, 2013.

Many Open Source Software (OSS) projects are moving from Centralized Version Control (CVC) to Distributed Version Control (DVC). The effect of this shift on project organization and developer collaboration is not well understood. In this paper, we use a theoretical argument to evaluate the appropriateness of using DVC in the context of two very common organization forms in OSS: a dictatorship and a peer group. We find that DVC facilitates large hierarchical communities as well as smaller groups of developers, while CVC allows for consensus-building by a peer group. We also find that the flexibility of DVC systems allows for diverse styles of developer collaboration. With CVC, changes flow up and down (and publicly) via a central repository. In contrast, DVC facilitates collaboration in which work output can flow sideways (and privately) between collaborators, with no repository being inherently more important or central. These sideways flows are a relatively new concept. Developers on the Linux project, who tend to be experienced DVC users, cluster around “sandboxes”: repositories where developers can work together on a particular topic, isolating their changes from other developers. In this work, we focus on two large, mature OSS projects to illustrate these findings. However, we suggest that social media sites like GitHub may engender other original styles of collaboration that deserve further study.

@inproceedings{rigby2013releng,
Author={Peter C. Rigby and Earl T. Barr and Christian Bird and Premkumar Devanbu and Daniel M. German},
Title={{What Effect does Distributed Version Control have on OSS Project Organization}},
Booktitle={Proceedings of the International Workshop on Release Engineering},
Year={2013},
Publisher = {IEEE}
}


Gerrit Software Code Review Data from Android
Murtuza Mukadam, Christian Bird, and Peter Rigby. In Proceedings of the International Working Conference on Mining Software Repositories (Data Track), 2013.

Over the past decade, a number of tools and systems have been developed to manage various aspects of the software development lifecycle. Until now, tool supported code review, an important aspect of software development, has been largely ignored. With the advent of open source code review tools such as Gerrit along with projects that use them, code review data is now available for collection, analysis, and triangulation with other software development data. In this paper, we extract Android peer review data from Gerrit. We describe the Android peer review process, the reverse engineering of the Gerrit JSON API, our data mining and cleaning methodology, database schema, and provide an example of how the data can be used to answer an empirical software engineering question. The database is available for use by the research community.

@inproceedings{mukadam2013gerrit,
Author={Murtuza Mukadam and Christian Bird and Peter C. Rigby},
Title = {Gerrit Software Code Review Data from Android},
Booktitle = {Proceedings of the International Working Conference on Mining Software Repositories (Data Track)},
Publisher = {IEEE},
Year={2013}
}


Collecting a Heap of Shapes
Earl Barr, Christian Bird, and Mark Marron. In Proceedings of the International Symposium on Software Testing and Analysis, 2013.

The program heap is fundamentally a simple mathematical concept — a set of objects and a connectivity relation on them. However, a large gap exists between the set of heap structures that could be constructed and those that programmers actually build. To understand this gap, we empirically study heap structures and sharing relations in large object-oriented programs. To scale and make sense of real world heaps, any analysis must employ abstraction; our abstraction groups sets of objects by role and the aliasing present in pointer sets. We find that the heaps of real-world programs are, in practice, fundamentally simple structures that are largely constructed from a small number of simple structures (85% simple aggregation on average) and sharing idioms, such as the sharing of immutable or unique (e.g. singleton) objects. This result provides actionable information for rethinking the design of annotation systems, memory allocation/collection, and program analyses.

@inproceedings{barr2013shapes,
title = {{Collecting a Heap of Shapes}},
author = {Earl T. Barr and Christian Bird and Mark Marron},
booktitle = {Proceedings of the International Symposium on Software Testing and Analysis},
year = {2013},
publisher = {ACM}
}

Adoption and Use of Java Generics
Chris Parnin, Christian Bird, and Emerson Murphy-Hill. In Empirical Software Engineering, An International Journal, 2013.

Support for generic programming was added to the Java language in 2004, representing perhaps the most significant change to one of the most widely used programming languages today. Researchers and language designers anticipated this addition would relieve many long-standing problems plaguing developers, but surprisingly, no one has yet measured how generics have been adopted and used in practice. In this paper, we report on the first empirical investigation into how Java generics have been integrated into open source software by automatically mining the history of 40 popular open source Java programs, traversing more than 650 million lines of code in the process. We evaluate five hypotheses and research questions about how Java developers use generics. For example, our results suggest that generics sometimes reduce the number of type casts and that generics are usually adopted by a single champion in a project, rather than all committers. We also offer insights into why some features may be adopted sooner and other features may be held back.

@article{parnin2012auj,
Author = {Chris Parnin and Christian Bird and Emerson Murphy-Hill},
Title = {{Adoption and Use of Java Generics}},
Journal = {Empirical Software Engineering, An International Journal},
Publisher = {Springer-Verlag},
Year = {2013}
}


Assessing the Value of Branches with What-If Analysis
Christian Bird and Thomas Zimmermann. In Proceedings of the 20th International Symposium on Foundations of Software Engineering, 2012.

Branches within source code management systems (SCMs) allow a software project to divide work among its teams for concurrent development by isolating changes. However, this benefit comes with several costs: increased time required for changes to move through the system and pain and error potential when integrating changes across branches. In this paper, we present the results of a survey to characterize how developers use branches in a large industrial project and common problems that they face. One of the major problems mentioned was the long delay that it takes changes to move from one team to another, which is often caused by having too many branches (branchmania). To monitor branch health, we introduce a novel what-if analysis to assess alternative branch structures with respect to two properties, isolation and liveness. We demonstrate with several scenarios how our what-if analysis can support branch decisions. By removing high-cost-low-benefit branches in Windows based on our what-if analysis, changes would each have saved 8.9 days of delay and only introduced 0.04 additional conflicts on average.

@inproceedings{bird2012avb,
Author = {Christian Bird and Thomas Zimmermann},
Title = {Assessing the Value of Branches with What-if Analysis},
Booktitle = {Proceedings of the 20th International Symposium on Foundations of Software Engineering},
Year = {2012},
Publisher = {ACM}
}

Relating Requirements to Implementation via Topic Analysis: Do Topics Extracted from Requirements Make Sense to Managers and Developers?
Abram Hindle, Christian Bird, Thomas Zimmermann, and Nachiappan Nagappan. In Proceedings of the 28th IEEE International Conference on Software Maintenance, 2012.

Large organizations like Microsoft tend to rely on formal requirements documentation in order to specify and design the software products that they develop. These documents are meant to be tightly coupled with the actual implementation of the features they describe. In this paper we evaluate the value of high-level topic-based requirements traceability in the version control system, using Latent Dirichlet Allocation (LDA). We evaluate LDA topics on practitioners and check if the information extracted matches the perception that Program Managers and Developers have about the effort put into addressing certain topics. We found that effort extracted from version control that was relevant to a topic often matched the perception of the managers and developers of what occurred at the time. Furthermore we found evidence that many of the identified topics made sense to practitioners and matched their perception of what occurred. But for some topics, we found that practitioners had difficulty interpreting and labelling them. In summary, we investigate the high-level traceability of requirements topics to version control commits via topic analysis and validate with the actual stakeholders the relevance of these topics extracted from requirements.

@inproceedings{hindle2012rri,
Author = {Abram Hindle and Christian Bird and Thomas Zimmermann and Nachiappan Nagappan},
Title = {Relating Requirements to Implementation via Topic Analysis: Do Topics Extracted from Requirements Make Sense to Managers and Developers?},
Booktitle = {Proceedings of the 28th IEEE International Conference on Software Maintenance},
Year = {2012},
Publisher = {IEEE}
}

The Effect of Branching Strategies on Software Quality
Emad Shihab, Christian Bird, and Thomas Zimmermann. In Proceedings of the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 2012.

Branching plays a major role in the development process of large software. Branches provide isolation so that multiple pieces of the software system can be modified in parallel without affecting each other during times of instability. However, branching has its own issues. The need to move code across branches introduces additional overhead and branch use can lead to integration failures due to conflicts or unseen dependencies. Although branches are used extensively in commercial and open source development projects, the effects that different branch strategies have on software quality are not yet well understood. In this paper, we present the first empirical study that evaluates and quantifies the relationship between software quality and various aspects of the branch structure used in a software project. We examine Windows Vista and Windows 7 and compare components that have different branch characteristics to quantify differences in quality. We also examine the effectiveness of two branching strategies – branching according to the software architecture versus branching according to organizational structure. We find that, indeed, branching does have an effect on software quality and that misalignment of branching structure and organizational structure is associated with higher post-release failure rates.

@inproceedings{shihab2012ebs,
Author = {Emad Shihab and Christian Bird and Thomas Zimmermann},
Title = {The Effect of Branching Strategies on Software Quality},
Booktitle = {Proceedings of the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement},
Year = {2012},
Publisher = {ACM/IEEE}
}


Who? Where? What? Examining Distributed Development in Two Large Open Source Projects
Christian Bird and Nachiappan Nagappan. In International Working Conference on Mining Software Repositories, Zurich, Switzerland, 2012.

To date, a large body of knowledge has been built up around understanding open source software development. However, there is limited research on examining levels of geographic and organizational distribution within open source software projects, despite many studies examining these same aspects in commercial contexts. We set out to fill this gap in OSS knowledge by manually collecting data for two large, mature, successful projects in an effort to assess how distributed they are, both geographically and organizationally. Both Firefox and Eclipse have been the subject of many studies and are ubiquitous in the areas of software development and internet usage respectively. Further, both receive substantial development contributions from many companies. As such, both are worthy of study in order to understand the development processes that they use, how distributed the projects are, and what, if any, relationship distribution has with quality. To this end, we identified the top contributors that made 95% of the changes over multiple major releases of Firefox and Eclipse and determined their geographic locations and organizational affiliations. We found that Firefox is very geographically distributed with over a third of its components receiving major contributions from developers on different continents, and that components that are highly distributed have no more defects than those that are not. In contrast, Eclipse is directed and developed largely by one company, with IBM making 96% of the total commits (49% coming from one lab in Ottawa, Canada). We further examined the distribution in each project’s constituent subsystems and report the relationship of pre- and post-release defects with geographic and organizational factors.

@inproceedings{bird2012www,
Author = {Christian Bird and Nachiappan Nagappan},
Title = {{Who? Where? What? Examining Distributed Development in Two Large Open Source Projects}},
Booktitle = {Proceedings of the Working Conference on Mining Software Repositories},
Year = {2012},
location = {Zurich, Switzerland}
}


Cohesive and Isolated Development with Branches
Earl Barr, Christian Bird, Peter Rigby, Abram Hindle, Daniel German, and Premkumar Devanbu. In Proceedings of the International Conference on Fundamental Approaches to Software Engineering, 2012.

The adoption of distributed version control (DVC), such as Git and Mercurial, in open-source software (OSS) projects has been explosive. Why is this and how are projects using DVC? This new generation of version control supports two important new features: distributed repositories and histories that preserve branches and merges. Through interviews with lead developers in OSS projects and a quantitative analysis of mined data from the histories of sixty projects, we find that the vast majority of the projects now using DVC continue to use a centralized model of code sharing, while using branching much more extensively than before their transition to DVC. We then examine the Linux history in depth in an effort to understand and evaluate how branches are used and what benefits they provide. We find that they enable natural collaborative processes: DVC branching allows developers to collaborate on tasks in highly cohesive branches, while enjoying reduced interference from developers working on other tasks, even if those tasks are strongly coupled to theirs.

@inproceedings{barr2012cid,
Author = {Earl T. Barr and Christian Bird and
Peter C. Rigby and Abram Hindle and Daniel M. German
and Premkumar Devanbu},
Title = {{Cohesive and Isolated Development with Branches}},
Year = {2012},
Booktitle = {{Proceedings of the International Conference on
Fundamental Approaches to Software Engineering}},
Publisher = {Springer},
}

Collaborative Software Development in Ten Years: Diversity, Tools, and Remix Culture
Thomas Zimmermann and Christian Bird. In Proceedings of the Workshop on The Future of Collaborative Software Development, 2012.

@inproceedings{zimmermann2012ccs,
Author = {Thomas Zimmermann and Christian Bird},
Title = {{Collaborative Software Development in Ten Years: Diversity, Tools, and Remix Culture}},
Booktitle = {Proceedings of the Workshop on The Future of Collaborative Software Development},
Year = {2012},
}

Clones: What is that Smell?
Foyzur Rahman, Christian Bird, and Premkumar Devanbu. In Empirical Software Engineering, An International Journal, 2012.

Clones are generally considered bad programming practice in software engineering folklore. They are identified as a bad smell (Fowler et al. 1999) and a major contributor to project maintenance difficulties. Clones inherently cause code bloat, thus increasing project size and maintenance costs. In this work, we try to validate the conventional wisdom empirically to see whether cloning makes code more defect prone. This paper analyses the relationship between cloning and defect proneness. For the four medium to large open source projects that we studied, we find that, first, the great majority of bugs are not significantly associated with clones. Second, we find that clones may be less defect prone than non-cloned code. Third, we find little evidence that clones with more copies are actually more error prone. Fourth, we find little evidence to support the claim that clone groups that span more than one file or directory are more defect prone than collocated clones. Finally, we find that developers do not need to put a disproportionately higher effort to fix clone dense bugs. Our findings do not support the claim that clones are really a “bad smell” (Fowler et al. 1999). Perhaps we can clone, and breathe easily, at the same time.

@article{rahman2012cwt,
Author = {Foyzur Rahman and Christian Bird and Premkumar Devanbu},
Title = {{Clones: What \textbf{is} that Smell?}},
Journal = {Empirical Software Engineering, An International Journal},
Publisher = {Springer-Verlag},
issn = {1382-3256},
Year = {2012},
url = {http://dx.doi.org/10.1007/s10664-011-9195-3}
}


The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining
Tim Menzies, Christian Bird, Thomas Zimmermann, Wolfram Schulte, and Ekrem Kocaguneli. In Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering, 2011.

The practices of industrial and academic data mining are very different. These differences have significant implications for (a) how we manage industrial data mining projects; (b) the direction of academic studies in data mining; and (c) training programs for engineers who seek to use data miners in an industrial setting.

@inproceedings{menzies2011ise,
Author = {Tim Menzies and Christian Bird and
Tom Zimmermann and Wolfram Schulte and Ekrem Kocaguneli},
Title = {{The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining}},
booktitle = {{Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering}},
year = {2011},
publisher = {ACM}
}

Failure is a Four-Letter Word: A Satire in Empirical Research
Andreas Zeller, Thomas Zimmermann, and Christian Bird. In Proceedings of the 7th International Conference on Predictive Models in Software Engineering, 2011.

Background: The past years have seen a surge of techniques predicting failure-prone locations based on more or less complex metrics. Few of these metrics are actionable, though. Aims: This paper explores a simple, easy-to-implement method to predict and avoid failures in software systems. The IROP method links elementary source code features to known software failures in a lightweight, easy-to-implement fashion. Method: We sampled the Eclipse data set mapping defects to files in three Eclipse releases. We used logistic regression to associate programmer actions with defects, tested the predictive power of the resulting classifier in terms of precision and recall, and isolated the most defect-prone actions. We also collected initial feedback on possible remedies. Results: In our sample set, IROP correctly predicted up to 74% of the failure-prone modules, which is on par with the most elaborate predictors available. We isolated a set of four easy-to-remember recommendations, telling programmers precisely what to do to avoid errors. Initial feedback from developers suggests that these recommendations are straightforward to follow in practice. Conclusions: With the abundance of software development data, even the simplest methods can produce "actionable" results.

@inproceedings{zeller-promise-2011,
title = "Failure is a Four-Letter Word: A Satire in Empirical Research",
author = "Andreas Zeller and Thomas Zimmermann and Christian Bird",
year = "2011",
month = "September",
booktitle = "Proceedings of the 7th International Conference on Predictive Models in Software Engineering",
}

Sociotechnical Coordination and Collaboration in Open Source Software
Christian Bird. In Proceedings of the 27th IEEE International Conference on Software Maintenance, Williamsburg, VA, 2011.

Over the past decade, a new style of software development, termed open source software (OSS), has emerged and has originated large, mature, stable, and widely used software projects. As software continues to grow in size and complexity, so do development teams. Consequently, coordination and communication within these teams play larger roles in productivity and software quality. My dissertation focuses on the relationships between developers in large open source projects and how software affects and is affected by these relationships. Fortunately, source code repository histories, mailing list archives, and bug databases from OSS projects contain latent data from which we can reconstruct a rich view of a project over time and analyze these sociotechnical relationships. We present methods of obtaining and analyzing this data as well as the results of empirical studies whose goal is to answer questions that can help stakeholders understand and make decisions about their own teams. We answer questions such as “Do large OSS projects really have a disorganized bazaar-like structure?” “What is the relationship between social and development behavior in OSS?” “How does one progress from a project newcomer to a full-fledged, core developer?” and others in an attempt to understand how large, successful OSS projects work and also to contrast them with projects in commercial settings.

@inproceedings{bird2011scc,
Author = {Christian Bird},
title = {{Sociotechnical Coordination and Collaboration in Open Source Software}},
booktitle = {{Proceedings of the 27th IEEE International Conference on Software Maintenance}},
year = {2011},
location = {Williamsburg, VA, USA},
abbrv = {ICSM 11},
publisher = {IEEE},
}

Understanding a Developer Social Network and its Evolution
Qiaona Hong, Sunghun Kim, S. C. Cheung, and Christian Bird. In Proceedings of the 27th IEEE International Conference on Software Maintenance, 2011.

With the increase of large-scale software projects, software development and maintenance demand the participation of a group of developers instead of individuals. Therefore, having a thorough understanding of the group of developers is critical in terms of improving development and maintenance quality and reducing cost. In contrast to most commercial software endeavors, developers in open source software (OSS) projects enjoy more freedom to organize and contribute to a project in their own working style. Their interactions through various means in the project generate a latent developer social network (DSN). We have observed that developers and their relationships in these DSNs change continually under the influence of differences in the set of active developers and their changing activities. Revealing and understanding the structure and evolution of these social networks, as well as their similarities to and differences from other more general social networks (GSNs), is of value to our software engineering community, as it allows us to begin building an understanding of how well the findings from other fields based on GSNs apply to DSNs. In this paper, we compare DSNs with popular GSNs such as Facebook, Twitter, Cyworld (a large social network in South Korea), and the Amazon recommendation network. The comparison yields several notable results. For instance, while most social networks exhibit power law degree distributions, our DSNs do not. In addition, we also examine how DSNs evolve over time, highlighting how events within a project (such as a release or the departure of prominent developers) impact the makeup of the DSNs, and observe the evolution of topological properties such as modularity and the paths of communities within these networks.

@inproceedings{hong2011uds,
author = {Qiaona Hong and Sunghun Kim and S. C. Cheung and Christian Bird},
title = {Understanding a Developer Social Network and its Evolution},
booktitle = {Proceedings of the 27th IEEE International Conference on Software Maintenance},
year = {2011}
}

Don't Touch My Code! Examining the Effects of Ownership on Software Quality
Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu. In Proceedings of the Eighth joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, Szeged, Hungary, 2011.

Ownership is a key aspect of large-scale software development. We examine the relationship between different ownership measures and software failures in two large software projects: Windows Vista and Windows 7. We find that in all cases, measures of ownership, such as the number of low-expertise developers and the proportion of ownership for the top owner, have a relationship with both pre-release faults and post-release failures. We also empirically identify reasons that low-expertise developers make changes to components and show that the removal of low-expertise contributions dramatically decreases the performance of contribution-based defect prediction. Finally, we provide recommendations for source code change policies and utilization of resources such as code inspections based on our results.

@inproceedings{bird2011dtm,
Author = {Christian Bird and Nachiappan Nagappan and Brendan Murphy and Harald Gall and Premkumar Devanbu},
Booktitle = {Proceedings of the Eighth joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering},
Title = {{Don't Touch My Code! Examining the Effects of Ownership on Software Quality} },
Year = {2011},
publisher = {ACM}
}


Java Generics Adoption: How New Features are Introduced, Championed, or Ignored
Chris Parnin, Christian Bird, and Emerson Murphy-Hill. In Proceedings of the International Working Conference on Mining Software Repositories, 2011.

Support for generic programming was added to the Java language in 2004, representing perhaps the most significant change to one of the most widely used programming languages today. Researchers and language designers anticipated that this addition would relieve many long-standing problems plaguing developers, but surprisingly, no one has yet measured whether generics actually provide such relief. In this paper, we report on the first empirical investigation into how Java generics have been integrated into open source software by automatically mining the history of 20 popular open source Java programs, traversing more than 500 million lines of code in the process. We evaluate five hypotheses, each based on assertions made by prior researchers, about how Java developers use generics. For example, our results suggest that generics do not significantly reduce the number of type casts and that generics are usually adopted by a single champion in a project, rather than all committers.

@inproceedings{parnin2011jga,
Author = {Chris Parnin and Christian Bird and Emerson Murphy-Hill},
Title = {{Java Generics Adoption: How New Features are Introduced, Championed, or Ignored}},
BookTitle = {Proceedings of the International Working Conference on Mining Software Repositories},
Year = {2011},
}


A Theory of Branches as Goals and Virtual Teams
Christian Bird, Thomas Zimmermann, and Alex Teterev. In Proceedings of the International Workshop on Cooperative and Human Aspects of Software Engineering, 2011.

A common method of managing the complexity of both technical and organizational relationships in a large software project is to use branches within the source code management system to partition the work into teams and tasks. We claim that the files modified on a branch are changed together in a cohesive way to accomplish some task such as adding a feature, fixing a related set of bugs, or implementing a subsystem, which we collectively refer to as the goal of the branch. Further, the developers who work on a branch represent a virtual team. In this paper, we develop a theory of the relationship between goals and virtual teams on different branches. Due to expertise, ownership, and awareness concerns, we expect that if two branches have similar goals, they will also have similar virtual teams or be at risk for communication and coordination breakdowns with the accompanying negative effects. In contrast, we do not expect the converse to always be true. As a first step toward an actionable result, we have evaluated this theory empirically on two releases of the Windows operating system and found support in both.

@inproceedings{bird2011tbg,
Author = {Christian Bird and Thomas Zimmermann and Alex Teterev},
Title = {{A Theory of Branches as Goals and Virtual Teams}},
Booktitle = {Proceedings of the International Workshop on Cooperative and Human Aspects of Software Engineering},
Year = {2011}
}

An Empirical Study on the Influence of Pattern Roles on Change-Proneness
Daryl Posnett, Christian Bird, and Premkumar Devanbu. In Empirical Software Engineering, an International Journal, 2010.

Identifying change-prone sections of code can help managers plan and allocate maintenance effort. Design patterns have been used to study change-proneness and are widely believed to support certain kinds of changes, while inhibiting others. Recently, several studies have analyzed recorded changes to classes playing design pattern roles and found that the pattern “folklore” offers a reasonable explanation for the reality: certain pattern roles do seem to be less change-prone than others. We push this analysis on two fronts: first, we deploy W. Pree’s metapatterns, which group patterns purely by structure (rather than intent), and argue that metapatterns are a simpler model to explain recent findings by Di Penta et al. (2008). Second, we study the effect of the size of the classes playing the design pattern and metapattern roles. We find that size explains more of the variance in change-proneness than either design pattern or metapattern roles. We also find that both design pattern and metapattern roles were strong determinants of size. We conclude, therefore, that size appears to be a stronger determinant of change-proneness than either design pattern or metapattern roles, and observed differences in change-proneness between roles might be due to differences in the sizes of the classes playing those roles. The size of a class can be found much more quickly, easily, and accurately than its pattern roles. Thus, while identifying design pattern roles may be important for other reasons, as far as identifying change-prone classes, sheer size might be a better indicator.

@article{posnett2011esi,
Author = {Daryl Posnett and Christian Bird and Premkumar Devanbu},
Title = {{An Empirical Study on the Influence of Pattern Roles on Change-Proneness}},
Journal = {Empirical Software Engineering, An International Journal},
Publisher = {Springer-Verlag},
issn = {1382-3256},
Year = {2010},
pages = {1--28},
}

LINKSTER: Enabling Efficient Manual Inspection and Annotation of Mined Data
Christian Bird, Adrian Bachmann, Foyzur Rahman, and Abraham Bernstein. In Demonstration Track, Proceedings of the 18th ACM SIGSOFT Symposium on Foundations of Software Engineering, (formal demonstration) Santa Fe, New Mexico, USA, 2010.

While many uses of mined software engineering data are automatic in nature, some techniques and studies either require, or can be improved by, manual methods. Unfortunately, manually inspecting, analyzing, and annotating mined data can be difficult and tedious, especially when information from multiple sources must be integrated. Oddly, while there are numerous tools and frameworks for automatically mining and analyzing data, there is a dearth of tools that facilitate manual methods. To fill this void, we have developed LINKSTER, a tool that integrates data from bug databases, source code repositories, and mailing list archives to allow manual inspection and annotation. LINKSTER has already been used successfully by an OSS project lead to obtain data for one empirical study.

@inproceedings{bird2010lee,
Author = {Christian Bird and Adrian Bachmann and Foyzur Rahman and Abraham Bernstein},
Title = {{LINKSTER: Enabling Efficient Manual Inspection and Annotation of Mined Data}},
Booktitle = {Demonstration Track, Proceedings of the 18th ACM SIGSOFT Symposium on Foundations of Software Engineering},
Publisher = {ACM},
Year = {2010}
}

The Missing Links: Bugs and Bug-fix Commits
Adrian Bachmann, Christian Bird, Foyzur Rahman, Premkumar Devanbu, and Abraham Bernstein. In SIGSOFT '10/FSE-18: Proceedings of the 18th ACM SIGSOFT Symposium on Foundations of Software Engineering, Santa Fe, New Mexico, USA, 2010.

Empirical studies of software defects rely on links between bug databases and program code repositories. This linkage is typically based on bug-fixes identified in manually-entered commit logs. Unfortunately, developers do not always report which commits perform bug-fixes. Prior work suggests that such links can be a biased sample of the entire population of fixed bugs. The validity of statistical hypothesis testing based on linked data could well be affected by bias. Given the wide use of linked defect data, it is vital to gauge the nature and extent of the bias, and try to develop testable theories and models of the bias. To do this, we must establish ground truth: manually analyze a complete version history corpus, and nail down those commits that fix defects and those that do not. This is a difficult task, requiring an expert to compare versions, analyze changes, find related bugs in the bug database, reverse-engineer missing links, and finally record their work for use later. This effort must be repeated for hundreds of commits to obtain a useful sample of reported and unreported bug-fix commits. We make several contributions. First, we present Linkster, a tool to facilitate link reverse-engineering. Second, we evaluate this tool, engaging a core developer of the Apache Webserver project to exhaustively annotate the roughly five hundred commits that occurred during a six-week period. Finally, we analyze this comprehensive data set, showing that there are serious and consequential problems in the data.

@inproceedings{bachmann2010mlb,
Author = {Adrian Bachmann and Christian Bird and Foyzur Rahman and
Premkumar Devanbu and Abraham Bernstein},
Title = {{The Missing Links: Bugs and Bug-fix Commits}},
Booktitle = {SIGSOFT '10/FSE-18: Proceedings of the 18th ACM SIGSOFT Symposium on Foundations of Software Engineering},
Publisher = {ACM},
Year = {2010},
}

On the Shoulders of Giants
Earl Barr, Christian Bird, Eric Hyatt, Tim Menzies, and Gregorio Robles. In FSE/SDP Workshop on the Future of Software Engineering Research, Santa Fe, New Mexico, USA, 2010.

Science rests on peer review and the widespread dissemination of knowledge. Software engineering research would advance further and faster if the sharing of data and tools were easier and more widespread. Pragmatic concerns hinder the realization of this ideal: the time and effort required, and the risk of being scooped. We examine the costs and benefits of facilitating sharing in our field in an effort to help the community understand what problems exist and find a solution. We examine how other fields, such as medicine and physics, handle sharing, describe the value of sharing for replication and innovation, and address practical concerns such as standards and warehousing. To launch what we hope will become an ongoing discussion of solutions in our community, we present some ways forward that mitigate the risk of sharing: partial sharing, registry, escrow, and market.

@inproceedings{barr2010sg,
Author = {Earl Barr and Christian Bird and Eric Hyatt and Tim Menzies and Gregorio Robles},
Title = {{On the Shoulders of Giants}},
Booktitle = {{FSE/SDP Workshop on the Future of Software Engineering Research}},
Year = {2010}
}

THEX: Mining Metapatterns in Java
Daryl Posnett, Christian Bird, and Premkumar Devanbu. In Proceedings of the Seventh Working Conference on Mining Software Repositories, Cape Town, South Africa, 2010.

Design patterns are codified solutions to common object-oriented design (OOD) problems in software development. One of the proclaimed benefits of the use of design patterns is that they decouple functionality and enable different parts of a system to change frequently without undue disruption throughout the system. These OOD patterns have received a wealth of attention in the research community since their introduction; however, identifying them in source code is a difficult problem. In contrast, metapatterns have similar effects on software design, by enabling portions of the system to be extended or modified easily, but are purely structural in nature, and thus easier to detect. Our long-term goal is to evaluate the effects of different OOD patterns on coordination in software teams as well as outcomes such as developer productivity and software quality. In this paper we present Thex, a metapattern detector which scales to large codebases and works on any Java bytecode. We evaluate Thex by examining its performance on codebases with known design patterns (and therefore metapatterns) and find that it performs quite well, with recall of over 90%.

@inproceedings{posnett2010tmm,
Author = {Daryl Posnett and Christian Bird and Premkumar Devanbu},
Title = {{Thex: Mining Metapatterns in Java}},
Booktitle = {Proceedings of the Seventh Working Conference on Mining Software Repositories},
Publisher = {IEEE Computer Society},
Year = {2010},
Location = {Cape Town, South Africa},
}

Clones: What is that Smell?
Foyzur Rahman, Christian Bird, and Premkumar Devanbu. In Proceedings of the Seventh Working Conference on Mining Software Repositories, (Best Paper Award) Cape Town, South Africa, 2010.

Clones are generally considered bad programming practice in software engineering folklore. They are identified as a bad smell and a major contributor to project maintenance difficulties. Clones inherently cause code bloat, thus increasing project size and maintenance costs. In this work, we try to validate the conventional wisdom empirically to see whether cloning makes code more defect prone. This paper analyzes the relationship between cloning and defect proneness. We find, first, that the great majority of bugs are not significantly associated with clones. Second, we find that clones may be less defect prone than non-cloned code. Finally, we find little evidence that clones with more copies are actually more error prone. Our findings don't support the claim that clones are really a "bad smell". Perhaps we can clone, and breathe easy, at the same time.

@inproceedings{rahman2010cws,
Author = {Foyzur Rahman and Christian Bird and Premkumar Devanbu},
Title = {{Clones: What \emph{is} that Smell?}},
Booktitle = {Proceedings of the Seventh Working Conference on Mining Software Repositories},
Publisher = {IEEE Computer Society},
Year = {2010},
Location = {Cape Town, South Africa},
}

Validity of Network Analyses in Open Source Projects
Roozbeh Nia, Christian Bird, Premkumar Devanbu, and Vladimir Filkov. In Proceedings of the Seventh Working Conference on Mining Software Repositories, Cape Town, South Africa, 2010.

Social network methods are frequently used to analyze networks derived from open source project communication and collaboration data. Such studies typically discover patterns in the information flow between contributors or contributions in these projects. Social network metrics have also been used to predict defect occurrence. However, such studies often ignore or side-step the issue of whether (and in what way) the metrics and networks of study are influenced by inadequate or missing data. In previous studies, email archives of OSS projects have provided a useful trace of the communication and coordination activities of the participants. These traces have been used to construct social networks that are then subject to various types of analysis. However, during the construction of these networks, some assumptions are made that may not always hold; this leads to incomplete, and sometimes incorrect, networks. The question then becomes: do these errors affect the validity of the ensuing analysis? In this paper we specifically examine the stability of network metrics in the presence of inadequate and missing data. The issues that we study are: 1) the effect of paths with broken information flow (i.e., consecutive edges that are out of temporal order) on measures of centrality of nodes in the network, and 2) the effect of missing links on such measures. We demonstrate on three different OSS projects that while these issues do change network topology, the metrics used in the analysis are stable with respect to such changes.

@inproceedings{nia2010vna,
Author = {Roozbeh Nia and Christian Bird and Premkumar Devanbu and Vladimir Filkov},
Title = {Validity of Network Analyses in Open Source Projects},
Booktitle = {Proceedings of the Seventh Working Conference on Mining Software Repositories},
Publisher = {IEEE Computer Society},
Year = {2010},
Location = {Cape Town, South Africa},
}

Putting it All Together: Using Socio-Technical Networks to Predict Failures
Christian Bird, Nachiappan Nagappan, Premkumar Devanbu, Harald Gall, and Brendan Murphy. In Proceedings of the 20th International Symposium on Software Reliability Engineering, Mysore, India, 2009.

Studies have shown that social factors in development organizations have a dramatic effect on software quality. Separately, program dependency information has also been used successfully to predict which software components are more fault prone. Interestingly, the influence of these two phenomena has only been studied separately. Intuition and practical experience suggest, however, that task assignment (i.e., who worked on which components and how much) and dependency structure (which components have dependencies on others) together interact to influence the quality of the resulting software. We study the influence of combined socio-technical software networks on the fault-proneness of individual software components within a system. The network properties of a software component in this combined network are able to predict if an entity is failure prone with greater accuracy than prior methods which use dependency or contribution information in isolation. We evaluate our approach in different settings by using it on Windows Vista and across six releases of the Eclipse development environment, including using models built from one release to predict failure-prone components in the next release. We compare this to previous work. In every case, our method performs as well as or better than prior methods and is able to more accurately identify those software components that have more post-release failures, with precision and recall rates as high as 85%.

@inproceedings{bird2009pat,
Author = {Christian Bird and Nachiappan Nagappan and Premkumar Devanbu and Harald Gall and Brendan Murphy},
Booktitle = {Proceedings of the 20th International Symposium on Software Reliability Engineering},
Title = {{Putting it All Together: Using Socio-Technical Networks to Predict Failures} },
Year = {2009}
}

Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista (CACM)
Christian Bird, Nachiappan Nagappan, Premkumar Devanbu, Harald Gall, and Brendan Murphy. In Communications of the ACM, 2009.

Existing literature on distributed development, in software engineering and other fields, discusses various challenges, including cultural barriers, expertise transfer difficulties, and communication and coordination overhead. Conventional wisdom, in fact, holds that distributed software development is riskier and more challenging than collocated development. We revisit this belief, empirically studying the overall development of Windows Vista and comparing the post-release failures of components that were developed in a distributed fashion with those that were developed by collocated teams. We found a negligible difference in failures. This difference becomes even less significant when controlling for the number of developers working on a binary. Furthermore, we also found that component characteristics (such as code churn, complexity, dependency information, and test code coverage) differ very little between distributed and collocated components. Finally, we examine the software process used during the Vista development cycle and consider how it may have mitigated some of the difficulties of distributed development identified in prior work in this area.

@article{bird2009dddb,
Author = {Christian Bird and Nachiappan Nagappan and   Premkumar Devanbu and Harald Gall and Brendan Murphy},
journal = {Communications of the ACM},
Title = {{Does Distributed Development Affect Software Quality? An Empirical   Case Study of Windows Vista} },
publisher = {ACM},
address = {New York, NY, USA},
Month = {August},
volume = {52},
number = {8},
pages = {85--93},
pdf = {http://portal.acm.org/citation.cfm?id=1536639},
Year = {2009}
}

Fair and Balanced? Bias in Bug-Fix Datasets
Christian Bird, Adrian Bachmann, Eirik Aune, John Duffy, Abraham Bernstein, and Premkumar Devanbu. In Proceedings of the Seventh joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, Amsterdam, Netherlands, 2009.

Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurrence data has been key to this research. Bug tracking systems and code version histories record when, how, and by whom bugs were fixed; from these sources, datasets that relate file changes to bug fixes can be extracted. These historical datasets can be used to test hypotheses concerning processes of bug introduction, and also to build statistical bug prediction models. Unfortunately, processes and humans are imperfect, and only a fraction of bug fixes are actually labelled in source code version histories, and thus become available for study in the extracted datasets. The question naturally arises: are the bug fixes recorded in these historical datasets a fair representation of the full population of bug fixes? In this paper, we investigate historical data from several software projects, and find strong evidence of systematic bias. We then investigate the potential effects of "unfair, imbalanced" datasets on the performance of prediction techniques. We draw the lesson that bias is a critical problem that threatens both the effectiveness of processes that rely on biased datasets to build prediction models and the generalizability of hypotheses tested on biased data.

@inproceedings{bird2009fbb,
Author = {Christian Bird and Adrian Bachmann and Eirik Aune and John Duffy and Abraham Bernstein and Premkumar Devanbu},
Booktitle = {Proceedings of the Seventh joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering},
Location = {Amsterdam, Netherlands},
Title = {{Fair and Balanced? Bias in Bug-Fix Datasets}},
Year = {2009}
}

The Promises and Perils of Mining Git
Christian Bird, Peter Rigby, Earl Barr, David Hamilton, Daniel German, and Premkumar Devanbu. In Proceedings of the Sixth Working Conference on Mining Software Repositories, Vancouver, Canada, 2009.

We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between collaborators, rather than always up and down (and publicly) via a central repository. Decentralization comes with both the promise of new data and the peril of its misinterpretation. We focus on git, a very popular DSCM used in high-profile projects. Decentralization, and other features of git, such as automatically recorded contributor attribution, lead to richer content histories, giving rise to new questions such as “How do contributions flow between developers to the official project repository?” However, there are pitfalls. Commits may be reordered, deleted, or edited as they move between repositories. The semantics of terms common to SCMs and DSCMs sometimes differ markedly, potentially creating confusion. For example, a commit is immediately visible to all developers in centralized SCMs, but not in DSCMs. Our goal is to help researchers interested in DSCMs avoid these and other perils when mining and analyzing git data.

@inproceedings{bird2009ppm,
Author = {Christian Bird and Peter C. Rigby and Earl T. Barr and David J. Hamilton and Daniel M. German and Prem Devanbu},
Booktitle = {Proceedings of the Sixth Working Conference on Mining Software Repositories},
Publisher = {IEEE Computer Society},
Title = {{The Promises and Perils of Mining Git}},
Year = {2009}
}

Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista
Christian Bird, Nachiappan Nagappan, Premkumar Devanbu, Harald Gall, and Brendan Murphy. In Proceedings of the 31st International Conference on Software Engineering, (ACM SIGSOFT Distinguished Paper Award) Vancouver, Canada, 2009.

Existing literature on distributed development, in software engineering and other fields, discusses various challenges, including cultural barriers, expertise transfer difficulties, and communication and coordination overhead. Conventional wisdom, in fact, holds that distributed software development is riskier and more challenging than collocated development. We revisit this belief, empirically studying the overall development of Windows Vista and comparing the post-release failures of components that were developed in a distributed fashion with those that were developed by collocated teams. We found a negligible difference in failures. This difference becomes even less significant when controlling for the number of developers working on a binary. Furthermore, we also found that component characteristics (such as code churn, complexity, dependency information, and test code coverage) differ very little between distributed and collocated components. Finally, we examine the software process used during the Vista development cycle and consider how it may have mitigated some of the difficulties of distributed development identified in prior work in this area.

@inproceedings{bird2009ddd,
Author = {Christian Bird and Nachiappan Nagappan and   Premkumar Devanbu and Harald Gall and Brendan Murphy},
Booktitle = {Proceedings of the 31st International Conference on Software Engineering},
Title = {{Does Distributed Development Affect Software Quality? An Empirical   Case Study of Windows Vista} },
Year = {2009},
publisher = {IEEE Computer Society}
}

Structure and Dynamics of Research Collaboration in Computer Science
Christian Bird, Earl Barr, Andre Nash, Premkumar Devanbu, Vladimir Filkov, and Zhendong Su. In Proceedings of the Ninth SIAM International Conference on Data Mining, Sparks, Nevada, USA, 2009.

Complex systems exhibit emergent patterns of behavior at different levels of organization. Powerful network analysis methods, developed in physics and social sciences, have been successfully used to tease out patterns that relate to community structure and network dynamics. In this paper, we mine the complex network of collaboration relationships in computer science, and adapt these network analysis methods to study collaboration and interdisciplinary research at the individual, within-area and network-wide levels. We start with a collaboration graph extracted from the DBLP bibliographic database and use extrinsic data to define research areas within computer science. Using topological measures on the collaboration graph, we find significant differences in the behavior of individuals among areas based on their collaboration patterns. We use community structure analysis, betweenness centralization, and longitudinal assortativity as metrics within each area to determine how centralized, integrated, and cohesive they are. Of special interest is how research areas change with time. We longitudinally examine the area overlap and migration patterns of authors, and empirically confirm some computer science folklore. We also examine the degree to which the research areas and their key conferences are interdisciplinary. We find that data mining and software engineering are very interdisciplinary while theory and cryptography are not. Specifically, it appears that SDM and ICSE attract authors who publish in many areas while FOCS and STOC do not. We also examine isolation both within and between areas. One interesting discovery is that cryptography is highly isolated within the larger computer science community, but densely interconnected within itself.

@inproceedings{bird2009sdr,
Author = {Christian Bird and Earl Barr and   Andre Nash and Premkumar Devanbu and Vladimir Filkov   and Zhendong Su},
Booktitle = {Proceedings of the Ninth SIAM International Conference on Data Mining},
Publisher = {SIAM},
Title = {{Structure and Dynamics of Research Collaboration in Computer Science} },
Year = {2009}
}

Latent Social Structure in Open Source Projects
Christian Bird, David Pattison, Raissa D'Souza, Vladimir Filkov, and Premkumar Devanbu. In Proceedings of the 16th ACM SIGSOFT Symposium on Foundations of Software Engineering, Atlanta, Georgia, USA, 2008.

Commercial software project managers design project organizational structure carefully, mindful of available skills, division of labour, geographical boundaries, etc. These organizational "cathedrals" are to be contrasted with the "bazaar-like" nature of Open Source Software (OSS) Projects, which have no pre-designed organizational structure. Any structure that exists is dynamic, self-organizing, latent, and usually not explicitly stated. Still, in large, complex, successful OSS projects, we do expect that subcommunities will form spontaneously within the developer teams. Studying these subcommunities and their behavior can shed light on how successful OSS projects self-organize. This phenomenon could well hold important lessons for how commercial software teams might be organized. Building on well-established techniques for detecting community structure in complex networks, we extract and study latent subcommunities from the email social network of several projects: Apache HTTPD, Python, PostgreSQL, Perl, and Apache ANT. We then validate them with software development activity history. Our results show that subcommunities do indeed spontaneously arise within these projects as the projects evolve. These subcommunities manifest most strongly in technical discussions, and are significantly connected with collaboration behaviour.

@inproceedings{bird2008lss,
Author = {Christian Bird and David Pattison and Raissa D'Souza and Vladimir Filkov and Premkumar Devanbu},
Booktitle = {SIGSOFT '08/FSE-16: Proceedings of the 16th ACM SIGSOFT Symposium on Foundations of Software Engineering},
Location = {Atlanta, Georgia, USA},
Pages = {24--35},
Publisher = {ACM},
Title = {{Latent Social Structure in Open Source Projects} },
Year = {2008}
}

Talk and Work: a Preliminary Report
David Pattison, Christian Bird, and Premkumar Devanbu. In Proceedings of the Fifth International Working Conference on Mining Software Repositories, Leipzig, Germany, 2008.

@inproceedings{pattison2008twp,
Author = {David Pattison and Christian Bird and Premkumar Devanbu},
Booktitle = {Proceedings of the Fifth International Working Conference on Mining Software Repositories},
Pages = {113--116},
Publisher = {ACM},
Title = {{Talk and Work: a Preliminary Report} },
Year = {2008}
}

Recommending Random Walks
Zachary Saul, Vladimir Filkov, Premkumar Devanbu, and Christian Bird. In Proceedings of the Sixth Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Dubrovnik, Croatia, 2007. (Nominated for Distinguished Paper Award)

@inproceedings{saul2007rrw,
Address = {New York, NY, USA},
Author = {Zachary M. Saul and Vladimir Filkov and Premkumar Devanbu and Christian Bird},
Booktitle = {ESEC-FSE '07: Proceedings of the Sixth Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering},
Pages = {15--24},
Publisher = {ACM},
Title = {{Recommending Random Walks} },
Year = {2007}
}

Detecting Patch Submission and Acceptance in OSS Projects
Christian Bird, Alex Gourley, and Premkumar Devanbu. In Proceedings of the Fourth International Workshop on Mining Software Repositories, Minneapolis, Minnesota, USA, 2007.

@inproceedings{bird2007dps,
Author = {Christian Bird and Alex Gourley and Prem Devanbu},
Booktitle = {Proceedings of the Fourth International Workshop on Mining Software Repositories},
Publisher = {IEEE Computer Society},
Title = {{Detecting Patch Submission and Acceptance in OSS Projects} },
Year = {2007}
}

Open Borders? Immigration in Open Source Projects
Christian Bird, Alex Gourley, Premkumar Devanbu, Anand Swaminathan, and Greta Hsu. In Proceedings of the Fourth International Workshop on Mining Software Repositories, Minneapolis, Minnesota, USA, 2007.

@inproceedings{bird2007obi,
Author = {Christian Bird and Alex Gourley and Prem Devanbu and Anand Swaminathan and Greta Hsu},
Booktitle = {Proceedings of the Fourth International Workshop on Mining Software Repositories},
Publisher = {IEEE Computer Society},
Title = {{Open Borders? Immigration in Open Source Projects} },
Year = {2007}
}

Visualizing Social Interaction in Open Source Software Projects
Michael Ogawa, Kwan-Liu Ma, Christian Bird, Premkumar Devanbu, and Alex Gourley. In Sixth International Asia-Pacific Symposium on Visualization, 2007.

@inproceedings{ogawa2007vsi,
Author = {Michael Ogawa and Kwan-Liu Ma and Christian Bird and Premkumar T. Devanbu and Alex Gourley},
Booktitle = {Sixth International Asia-Pacific Symposium on Visualization},
Pages = {25--32},
Title = {{Visualizing Social Interaction in Open Source Software Projects} },
Year = {2007}
}

Mining Email Social Networks
Christian Bird, Alex Gourley, Premkumar Devanbu, Michael Gertz, and Anand Swaminathan. In Proceedings of the Third International Workshop on Mining Software Repositories, Shanghai, China, 2006.

@inproceedings{bird2006mes,
Author = {Christian Bird and Alex Gourley and Prem Devanbu and Michael Gertz and Anand Swaminathan},
Booktitle = {Proceedings of the Third International Workshop on Mining Software Repositories},
Pages = {137--143},
Publisher = {ACM},
Title = {{Mining Email Social Networks} },
Year = {2006}
}

Mining Email Social Networks in Postgres
Christian Bird, Alex Gourley, Premkumar Devanbu, Michael Gertz, and Anand Swaminathan. In Proceedings of the Third International Workshop on Mining Software Repositories (Challenge Track), Shanghai, China, 2006.

@inproceedings{bird2006mesp,
Author = {Christian Bird and Alex Gourley and Prem Devanbu and Michael Gertz and Anand Swaminathan},
Booktitle = {Proceedings of the Third International Workshop on Mining Software Repositories (Challenge Track)},
Publisher = {ACM},
Title = {{Mining Email Social Networks in Postgres} },
Year = {2006}
}

Others