An Empirical Analysis of the Co-evolution of Schema and Code in Database Applications

This is a supplementary web page to accompany our paper which was accepted in the Proceedings of ESEC/FSE 2013.

  

Overview

Modern database applications are among the most widely used and complex software systems. They constantly evolve,responding to changes to data, database schemas, and code. It is challenging to manage these changes and ensure that everything co-evolves consistently. For example, when a database schema is modified, all the code that interacts with the database must be changed accordingly. Although database evolution and software evolution have been extensively studied in isolation, the co-evolution of schema and code has largely been unexplored.

This paper presents the first comprehensive empirical analysis of the co-evolution of database schemas and code in ten popular large open-source database applications, totaling over 160K revisions. Our major findings include: 1) Database schemas evolve frequently during the application lifecycle, exhibiting a variety of change types with similar distributions across the studied applications; 2) Overall,schema changes induce significant code-level modifications, while certain change types have more impact on code than others; and 3) Co-change analyses can be viable to automate or assist with database application evolution. We have also observed that: 1) 80% of the schema changes happened in 20-30% of the tables, while nearly 40% of the tables did not change; and 2) Referential integrity constraints and stored procedures are rarely used in our studied subjects. We believe that our study reveals new insights into how database applications evolve and useful guidelines for designing assistive tools to aid their evolution.

More details are shown in the full version of this study.

  

Data

We studied ten poplular database applications, the data collected from these projects are listed as follows:

  

Results

  • RQ1: How frequently and extensively do schemas evolve?

    (1) The evolution trend of tables/columns in the studied projects.
    dataset   gnuplot script   graph
    (2) Growth and change rates of schema size.
    tables

  • RQ2: How do database schemas evolve?

    (1) Distribution of atomic changes w.r.t. the high-level schema change categories.
    dataset   gnuplot script   graph
    (2) Distribution of atomic changes w.r.t. the low-level schema change categories.
    tables
    (3) Distribution of addition/deletion/change on schema.
    dataset   gnuplot script   graph

  • RQ3: How much application code has co-changed with a schema change?

    (1) Manual study on the valididy of co-change analysis.
    (i) How many valid DB revisions contain the co-change information of schema and code?
    dataset   gnuplot script   graph
    (ii) How much code-level change is truly caused by schema changes?
    dataset   gnuplot script   graph
    (2) Estimated co-changed code size w.r.t. high-level schema changes.
    dataset   R script

  • Discussions

    (1) Coverage of schema changes by tables.
    dataset   gnuplot script   graph
    (2) Distribution on the number of atomic schema changes across different tables.
    dataset   gnuplot script   graph

  

Code

  • DAChangeAnalyzer - an analyzer that can extract changed lines, functions, classes and files from the selected revisions in SVN repository.

  

Tools

  • SVNKit - A pure Java toolkit to work with subversion repositories.

  • Java-diff-utils - an opensource library for performing the comparison operations between texts.

  • Mysqldiff - is suite of Perl modules and accompanying CLI script for comparing the schema (table structures) of two MySQL databases.

  • R - is a software environment for statistical computing and graphics.

  • Gnuplot - is portable command-line driven graphing utility.

  

Authors

  • Dong Qiu - Southest University, China
    E-mail: dongqiu at seu dot edu dot cn

  • Bixin Li - Southest University, China
    E-mail: bx.li at seu dot edu dot cn

  • Zhendong Su - University of California, Davis, USA
    E-mail: su at cs dot ucdavis dot edu