Submit to the International Conference on Automated Software Engineering

I’m serving on the program committee for ASE this year. Please consider submitting to it.

The IEEE/ACM Automated Software Engineering (ASE) Conference series is the premier research forum for automated software engineering. Each year, it brings together researchers and practitioners from academia and industry to discuss foundations, techniques and tools for automating the analysis, design, implementation, testing, and maintenance of large software systems. In 2015, ASE will be celebrating its 30th year as a premier venue for novel work in software automation.

Using Developer-Interaction Trails to Triage Change Requests (MSR 2015)

By Motahareh Bahrami Zanjani, Huzefa Kagdi, and Christian Bird

Published in Proceedings of the International Conference on Mining Software Repositories

The paper presents an approach, iHDev, that recommends the developers who are most likely to implement incoming change requests. The basic premise of iHDev is that the developers who previously interacted with the source code relevant to a given change request are the best positioned to assist with its resolution. A machine-learning technique first locates the source-code entities relevant to the textual description of a change request. iHDev then mines the interaction trails (i.e., Mylyn sessions) associated with these entities to recommend a ranked list of developers. iHDev integrates interaction trails in a way that had not been investigated previously.
An empirical study on the open source systems Mylyn and Eclipse Project was conducted to assess the effectiveness of iHDev on a benchmark of change requests. Recall for the top one to five recommended developers and Mean Reciprocal Rank (MRR) values are reported, along with a comparative study against two previous approaches that use commit histories and/or source-code authorship information for developer recommendation. Results show that iHDev provides a recall gain of up to 127.27%, with MRR values that are equivalent or improved by up to 112.5%.
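To make the two-step pipeline concrete, here is a minimal sketch in Python. Everything in it is a stand-in: the inputs (file_texts mapping source files to their text, sessions as developer/files-touched interaction trails) are hypothetical, and TF-IDF cosine similarity substitutes for the paper's machine-learning concept location.

```python
# Sketch of an iHDev-style recommender: locate relevant files by text
# similarity, then rank developers by their interactions with those files.
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend_developers(request_text, file_texts, sessions, k_files=10, k_devs=5):
    files = list(file_texts)
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([file_texts[f] for f in files])
    query = vectorizer.transform([request_text])

    # Step 1: find the source files most relevant to the request text.
    scores = cosine_similarity(query, matrix).ravel()
    relevant = {files[i] for i in scores.argsort()[::-1][:k_files]}

    # Step 2: rank developers by how often their interaction trails
    # (e.g., Mylyn sessions) touched those relevant files.
    votes = Counter()
    for developer, touched_files in sessions:
        votes[developer] += len(relevant & set(touched_files))
    return [dev for dev, _ in votes.most_common(k_devs)]
```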


Characteristics of Useful Code Reviews: An Empirical Study at Microsoft (MSR 2015)

By Amiangshu Bosu, Michaela Greiler, and Christian Bird

Published in Proceedings of the International Conference on Mining Software Repositories

Over the past decade, both open source and commercial software projects have adopted contemporary peer code review practices as a quality control mechanism. Prior research has shown that developers spend a large amount of time and effort performing code reviews, so identifying the factors that lead to useful code reviews can benefit projects by increasing code review effectiveness and quality. In a three-stage, mixed-methods study, we qualitatively investigated what aspects of code reviews make them useful to developers, used our findings to build and verify a classification model that can distinguish between useful and not-useful code review feedback, and finally used this classifier to classify review comments, enabling us to empirically investigate the factors that lead to more effective code review feedback.
In total, we analyzed 1.5 million review comments from five Microsoft projects and uncovered many factors that affect the usefulness of review feedback. For example, we found that the proportion of useful comments made by a reviewer increases dramatically during his or her first year at Microsoft but tends to plateau afterwards. In contrast, the more files a change touches, the lower the proportion of review comments that will be of value to the author of the change. Based on our findings, we provide recommendations for practitioners to improve the effectiveness of code reviews.
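As a rough illustration of the second stage, here is a minimal sketch of a text-based usefulness classifier, assuming a hypothetical labeled corpus of (comment text, is_useful) pairs; the study's actual model draws on a richer feature set than comment text alone.

```python
# Sketch: train a classifier that separates useful from not-useful
# review comments, using TF-IDF features and logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def train_usefulness_classifier(labeled_comments):
    texts, labels = zip(*labeled_comments)  # hypothetical labeled corpus
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=0)
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000))
    model.fit(x_train, y_train)
    print("held-out accuracy:", model.score(x_test, y_test))
    return model
```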


Lessons Learned from Building and Deploying a Code Review Analytics Platform (MSR 2015)

By Christian Bird, Trevor Carnahan, and Michaela Greiler

Published in Proceedings of the International Conference on Mining Software Repositories

Tool-based code review is growing in popularity and has become a standard part of the development process at Microsoft. Adoption of these tools makes it possible to mine data from code reviews and provide access to it. In this paper, we present an experience report for CodeFlow Analytics, a system that collects code review data, generates metrics from this data, and provides a number of ways for development teams to access the metrics and data. We discuss the design of CodeFlow Analytics, the decisions behind it, and the challenges we encountered while building it. We contacted teams that used CodeFlow Analytics over the past two years to learn what prompted them to adopt it, how they have used it, and what the impact has been. Further, we survey research that has been enabled by the CodeFlow Analytics platform. We distill a series of lessons learned from this experience to help others embarking on the task of building an analytics platform in an enterprise setting.
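For a flavor of the kind of metric such a platform derives, here is a minimal sketch, assuming hypothetical review records with reviewer, comment-count, and sign-off-latency fields; CodeFlow Analytics itself computes many more metrics than this.

```python
# Sketch: aggregate per-reviewer metrics from raw code review records.
from collections import defaultdict
from statistics import mean


def reviewer_metrics(reviews):
    comment_counts = defaultdict(list)
    signoff_hours = defaultdict(list)
    for review in reviews:  # hypothetical record shape
        comment_counts[review["reviewer"]].append(review["comment_count"])
        signoff_hours[review["reviewer"]].append(review["hours_to_signoff"])
    return {
        reviewer: {
            "reviews": len(counts),
            "avg_comments": mean(counts),
            "avg_hours_to_signoff": mean(signoff_hours[reviewer]),
        }
        for reviewer, counts in comment_counts.items()
    }
```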


The Uniqueness of Changes: Characteristics and Applications (MSR 2015)

By Baishakhi Ray, Meiyappan Nagappan, Christian Bird, Nachiappan Nagappan, and Thomas Zimmermann

Published in Proceedings of the International Conference on Mining Software Repositories

Changes in software development come in many forms. Some changes are frequent, idiomatic, or repetitive (e.g., adding checks for nulls or logging important values), while others are unique. We hypothesize that unique changes differ from the more common, non-unique changes in important ways; they may require more expertise or represent code that is more complex or prone to mistakes. As such, these unique changes are worthy of study. In this paper, we present a definition of unique changes and provide a method for identifying them in software project history. Based on the results of applying our technique to the Linux kernel and two large projects at Microsoft, we present an empirical study of unique changes. We explore how prevalent unique changes are and investigate where they occur across the architecture of the project. We further investigate developers’ contributions toward the uniqueness of changes. We also describe potential applications of leveraging change uniqueness and implement two of them: evaluating the risk of changes based on uniqueness and providing change recommendations for non-unique changes.
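One way to make the notion of uniqueness concrete: flag a change as unique when no other change in the history is sufficiently similar to it. The sketch below assumes each change is a hypothetical string of its added lines and uses difflib similarity as a stand-in for the paper's language-token-level comparison.

```python
# Sketch: mark changes as "unique" if no peer change is similar enough.
import difflib


def unique_changes(changes, threshold=0.8):
    """Return indices of changes with no sufficiently similar peer."""
    def normalize(text):
        return " ".join(text.split())  # collapse whitespace before comparing

    normed = [normalize(change) for change in changes]
    unique = []
    for i, a in enumerate(normed):
        has_peer = any(
            difflib.SequenceMatcher(None, a, b).ratio() >= threshold
            for j, b in enumerate(normed) if j != i)
        if not has_peer:
            unique.append(i)
    return unique
```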


Helping Developers Help Themselves: Automatic Decomposition of Code Review Changesets (ICSE 2015)

By Michael Barnett, Christian Bird, Joao Brunet, and Shuvendu K. Lahiri

Published in Proceedings of the 37th International Conference on Software Engineering

Code reviews, an important and popular mechanism for quality assurance, are often performed on a changeset: a set of modified files that are meant to be committed to a source repository as an atomic action. Understanding a code review is more difficult when the changeset consists of multiple independent code differences. We introduce ClusterChanges, an automatic technique for decomposing changesets, and evaluate its effectiveness through both a quantitative analysis and a qualitative user study.
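The core decomposition idea can be sketched as a graph partitioning problem, under assumptions of ours rather than the paper's exact analysis: treat each diff region as a node, link regions when one uses a symbol another defines, and emit each connected component as an independent partial change. The input shape here is hypothetical.

```python
# Sketch: group diff regions into independent clusters via def-use links,
# using a small union-find over the regions.
from collections import defaultdict


def cluster_changes(regions):
    """regions: list of dicts like {"defines": {...}, "uses": {...}}."""
    parent = list(range(len(regions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Index which regions define each symbol, then link users to definers.
    definers = defaultdict(list)
    for i, region in enumerate(regions):
        for symbol in region["defines"]:
            definers[symbol].append(i)
    for i, region in enumerate(regions):
        for symbol in region["uses"]:
            for j in definers[symbol]:
                union(i, j)

    # Each connected component is one independent partial change.
    clusters = defaultdict(list)
    for i in range(len(regions)):
        clusters[find(i)].append(i)
    return list(clusters.values())
```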


Build it yourself! Homegrown Tools in a Large Software Company (ICSE 2015)

By Edward K. Smith, Christian Bird, and Thomas Zimmermann

Published in Proceedings of the 37th International Conference on Software Engineering

Developers sometimes take the initiative to build tools to solve problems they face. What motivates developers to build these tools? What is the value for a company? Are the tools useful to anyone besides their creators? We conducted a qualitative study of tool building, adoption, and impact within Microsoft. This paper presents our findings on the extrinsic and intrinsic factors linked to tool building, the value of building tools, and the factors associated with tool spread. We find that the majority of developers build tools. While most tools never spread beyond their creator’s team, most have more than one user, and many have more than one collaborator. Organizational cultures that are receptive to tool building produce more tools and more collaboration on them. When nurtured and spread, homegrown tools have the potential to create significant impact on organizations.


The Design Space of Bug Fixes and How Developers Navigate It (TSE)

By Emerson Murphy-Hill, Thomas Zimmermann, Christian Bird, and Nachiappan Nagappan

When software engineers fix bugs, they may have several options for how to fix them. Which fix they choose has many implications for both practitioners and researchers: What is the risk of introducing other bugs during the fix? Is the bug fix in the same code that caused the bug? Is the change fixing the cause or just covering a symptom? In this paper, we investigate alternative fixes to bugs and present an empirical study of how engineers make design choices among them. We start with a motivating case study of the Pex4Fun environment. Then, based on qualitative interviews with 40 engineers working on a variety of products, data from 6 bug triage meetings, and a survey filled out by 326 Microsoft engineers and 37 developers from other companies, we identify a number of factors, many of them non-technical, that influence how bugs are fixed, such as how close the software is to release. We also discuss implications for research and practice, including how to make bug prediction and localization more accurate.


Learning Natural Coding Conventions (FSE 2014)

By Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles Sutton

Published in Proceedings of the 22nd International Symposium on Foundations of Software Engineering

ACM SIGSOFT Distinguished Paper

Every programmer has a characteristic style, ranging from preferences about identifier naming to preferences about object relationships and design patterns. Coding conventions define a consistent syntactic style, fostering readability and hence maintainability. When collaborating, programmers strive to obey a project’s coding conventions. However, one third of reviews of changes contain feedback about coding conventions, indicating that programmers do not always follow them and that project members care deeply about adherence. Unfortunately, programmers are often unaware of coding conventions because inferring them requires a global view, one that aggregates the many local decisions programmers make and identifies emergent consensus on style. We present NATURALIZE, a framework that learns the style of a codebase, and suggests revisions to improve stylistic consistency. NATURALIZE builds on recent work in applying statistical natural language processing to source code. We apply NATURALIZE to suggest natural identifier names and formatting conventions. We present four tools focused on ensuring natural code during development and release management, including code review. NATURALIZE achieves 94% accuracy in its top suggestions for identifier names. We used NATURALIZE to generate 18 patches for 5 open source projects: 14 were accepted.
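The statistical heart of the approach can be illustrated with a toy language model. The sketch below trains a Laplace-smoothed bigram model over a hypothetical pre-tokenized corpus and scores candidate identifier names by how natural the surrounding token context becomes; NATURALIZE itself uses a far richer smoothed n-gram model over a whole codebase.

```python
# Sketch: bigram language model over code tokens, used to rank
# candidate identifier names by the naturalness of their context.
import math
from collections import Counter


def train_bigram(token_streams):
    unigrams, bigrams = Counter(), Counter()
    for tokens in token_streams:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def log_prob(tokens):
        # Laplace-smoothed bigram log-probability of a token sequence.
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            for a, b in zip(tokens, tokens[1:]))

    return log_prob


def suggest_name(log_prob, context_before, candidates, context_after):
    # The candidate that makes the surrounding tokens most probable wins.
    return max(candidates,
               key=lambda name: log_prob(context_before + [name] + context_after))
```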
