Summary: Delta helps you minimize "interesting" files, where interestingness is decided by a test that you supply. A common use is isolating a small failure-inducing substring of a large input that causes your program to exhibit a bug. Our implementation is based on the Delta Debugging algorithm described here: http://www.st.cs.uni-sb.de/dd/

Maintainer: Daniel S. Wilkerson
Developers: Scott McPeak

This work was supported by professors Alex Aiken and George Necula and was done at UC Berkeley.

Here are the current releases. Feel free to just get the current Subversion repository version as a guest user.

Introduction

The best way to understand how to use delta is with an example of its usage. Below is one helpfully written up for me by Simon Goldsmith. For those wanting more, I also wrote a more detailed document describing each tool: Using Delta.

Note that what follows is an example of using delta to minimize an input file to a program that reads programs, much as a compiler does. Note two features of file minimization that are present in the example.

Do a controlled experiment. Below we don't just minimize a file that causes oink to produce an error message; we minimize a file that gcc accepts AND oink rejects in a specific way. That is, the test delta runs is a controlled experiment in which gcc is the control. Ignoring this aspect of the problem seems to be a frequent mistake of first-time users.

Exploit nested structure. One may minimize files of simpler syntax than C++, but really all files are interesting in the first place because they are in some language or another. Some simple configuration files are literally just a list of lines, but most languages have some nested structure. Multidelta filters the input through the topformflat utility (included) to suppress any newlines past a particular nesting depth; this "explains" the nesting structure to the otherwise line-oriented delta utility (a brilliantly simple idea of Scott McPeak's). If your input file language has no nesting structure, you can hack on multidelta to remove the filtration through topformflat, or just use the raw delta program. If your language has a different nesting structure than C/C++, you can write your own replacement for topformflat and substitute it. A simple flex program should suffice; it need not be terribly accurate for delta to do well.
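To make the flattening idea concrete, here is a minimal sketch of a topformflat-like filter. It is not the real topformflat: the function name flatten is made up, it is written with awk instead of flex, and it ignores strings, comments, and parentheses. It only shows the principle of keeping line breaks at or above a chosen nesting level.

```shell
#!/bin/bash
# flatten LEVEL: hypothetical stand-in for topformflat, reads stdin.
# Existing newlines become spaces; a newline is emitted after "{", ";",
# "}" or "};" only when the brace nesting depth is <= LEVEL.
flatten() {
    awk -v level="$1" '
    BEGIN { depth = 0; level += 0; atbol = 1 }
    {
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "{") depth++
            if (c == "}") {
                depth--
                # keep "};" together, as at the end of a class
                if (substr($0, i + 1, 1) == ";") { c = "};"; i++ }
            }
            printf "%s", c
            atbol = 0
            if ((c == "{" || c == ";" || c == "}" || c == "};") && depth <= level) {
                printf "\n"
                atbol = 1
            }
        }
        # the original line break becomes a space unless we just broke the line
        if (!atbol) printf " "
    }
    END { if (!atbol) printf "\n" }'
}
```

With this sketch, "flatten 0 < foo.ii" puts each top-level form on a single line, so a line-oriented reducer removes whole classes or functions at a time; higher levels expose the statements inside them.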

Note also that this example is edited for simplicity from the raw output; we sincerely hope we did not introduce any bugs.

Using multidelta to turn an interesting file into a smaller interesting file

Simon Goldsmith 8 April / 12 Sept, 2005.

(1) Make a new directory and copy the file there.

% mkdir deltaexample
% cd deltaexample/
% cp ../nsCSSDataBlock/orig/nsCSSDataBlock-23801-1112390043.cpp.g.ii ./foo.ii
% chmod +w foo.ii

(2) (optional) Put a read-only backup copy of the file in, say, orig/ .

% mkdir orig
% cp foo.ii orig/
% chmod -R a-w orig

(3) Write a script (do not call it 'test' as that is a system utility program) to test the interestingness of the file, as we do below.

Note that for this example, "interesting" means the file passes gcc but fails oink with a particular error message. That is, if 1) gcc accepts, and 2) oink rejects with the desired error message, then we return zero (meaning "interesting"). If anything else happens then we return a nonzero exit code (meaning "not interesting").

Some reminders about shell: a zero exit code means "true", so for the purposes of && a zero exit code means "keep going". grep returns 0 if it finds a match and nonzero if not, and the exit code of a pipeline is that of its last command, here grep. We redirect output to /dev/null because otherwise the output of delta is noisy. Be careful of quoting hell: notice that we've used '.' to match characters like the single quote.
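These reminders can be tried in isolation before writing the real script. The following is a small sketch; the sample message in msg is made up to resemble the pattern used in this example (the real oink output may differ), and is_interesting_line is a hypothetical helper name.

```shell
#!/bin/bash
# A made-up sample line; the real oink message may differ.
msg="error: cannot convert argument type 'class A const &' to receiver parameter type"

# '.' matches any single character, including a single quote, so the
# pattern needs no quote characters inside the single-quoted string.
pattern='cannot convert argument type .class .* const &. to receiver'

# Hypothetical helper: the exit code of the pipeline is grep's exit code.
is_interesting_line() {
    echo "$1" | grep "$pattern" > /dev/null 2>&1
}

if is_interesting_line "$msg"; then
    echo "match: exit code 0"
fi
is_interesting_line "all fine" || echo "no match: exit code $?"

# && runs its right-hand side only when the left-hand side exits with 0.
true  && echo "ran after true"
false && echo "never printed" || echo "right-hand side of && was skipped"
```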

% cat > test1.sh
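#!/bin/bash
# Exit 0 ("interesting") only if gcc accepts the file AND oink rejects it
# with the specific error message; exit nonzero otherwise.

FILE=foo.ii
OINK=/home/simon/oink_all/oink/oink
GCC=/usr/bin/gcc

# The exit code of the pipeline is grep's: 0 if the message is found.
"$GCC" -c "$FILE" -o /dev/null &> /dev/null && "$OINK" "$FILE" | grep 'error: cannot convert argument type .class .* const &. to receiver parameter type' &> /dev/null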
^D

(4) Make the script executable and run it on the file -- make sure it returns 0. Optionally turn off the redirection to /dev/null temporarily to check the error message that is being found by the grep.

% chmod +x test1.sh
% ./test1.sh foo.ii ; echo $?
0
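Beyond checking that the script returns 0 on the original file, it can be worth checking that it returns nonzero on something that is obviously not interesting, such as an empty file; otherwise delta may happily reduce the input to nothing. This check is our suggestion rather than a step from the original write-up, and sanity_check is a hypothetical helper, not part of delta.

```shell
#!/bin/bash
# sanity_check SCRIPT FILE: hypothetical helper, not part of delta.
# Example: sanity_check ./test1.sh foo.ii
sanity_check() {
    local script=$1 file=$2 saved
    # 1) The unmodified file must be interesting (exit code 0).
    if ! "$script" "$file"; then
        echo "original file is not interesting" >&2
        return 1
    fi
    # 2) An empty file must NOT be interesting, or the test is too weak.
    saved=$(mktemp) || return 1
    cp "$file" "$saved"
    : > "$file"                      # truncate the file in place
    if "$script" "$file"; then
        cp "$saved" "$file"; rm -f "$saved"
        echo "empty file is interesting: test is too permissive" >&2
        return 1
    fi
    cp "$saved" "$file"; rm -f "$saved"   # restore the original contents
    echo "test script looks sane"
}
```

Because the file is truncated and restored in place, this also works with a script like test1.sh that hard-codes the file name and ignores its argument.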

(5) Run multidelta with the script on the file several times at, say, levels 0 0 1 1 2 2 10 10.

% multidelta -level=0 ./test1.sh foo.ii
(check email)
% multidelta -level=0 ./test1.sh foo.ii
(read slashdot)
% multidelta -level=1 ./test1.sh foo.ii
% multidelta -level=1 ./test1.sh foo.ii
% multidelta -level=2 ./test1.sh foo.ii
% multidelta -level=2 ./test1.sh foo.ii
% multidelta -level=10 ./test1.sh foo.ii
% multidelta -level=10 ./test1.sh foo.ii
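If you would rather not type each run by hand, the sequence can be wrapped in a loop. This is only a sketch: run_levels and the MULTIDELTA variable are made-up names, and the only multidelta usage assumed is the "-level=N script file" form shown above.

```shell
#!/bin/bash
# MULTIDELTA may be overridden if multidelta is not on your PATH.
MULTIDELTA=${MULTIDELTA:-multidelta}

# run_levels SCRIPT FILE LEVEL...: hypothetical wrapper, not part of delta.
# Runs multidelta once per listed level and reports the file size as it goes.
run_levels() {
    local script=$1 file=$2 level
    shift 2
    for level in "$@"; do
        echo "== level $level: $(( $(wc -c < "$file") )) bytes before"
        "$MULTIDELTA" -level="$level" "$script" "$file"
    done
    echo "== done: $(( $(wc -c < "$file") )) bytes"
}

# Example, matching the runs above:
#   run_levels ./test1.sh foo.ii 0 0 1 1 2 2 10 10
```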

(6) The input file will be modified in place and you should be left with something smaller.

[simon@otter][deltaexample]$ ls -l
total 116
-rw-r--r--  1 simon simon  8451 Sep 12 17:10 foo.ii
-rw-r--r--  1 simon simon  8948 Sep 12 17:10 foo.ii.bak
-rw-r--r--  1 simon simon  8451 Sep 12 17:10 foo.ii.ok
-rw-r--r--  1 simon simon 57739 Sep 12 17:10 log
-rw-r--r--  1 simon simon  2752 Sep 12 17:10 multidelta.log
dr-xr-xr-x  2 simon simon  4096 Sep 12 17:16 orig/
-rwxr-xr-x  1 simon simon   385 Sep 12 16:36 test1.sh*
-rw-r--r--  1 simon simon    11 Sep 12 16:02 test1.sh~

[simon@otter][deltaexample]$ ls -l orig/
total 552
-r--r--r--  1 simon simon 558970 Sep 12 16:00 foo.ii
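For intuition about how a line-oriented reducer can shrink a file in place, here is a deliberately simplified sketch. It is neither the Delta Debugging algorithm itself nor delta's implementation: it greedily tries to delete chunks of lines, halving the chunk size each pass, and keeps a deletion whenever the test still exits with 0. The name reduce_lines is made up.

```shell
#!/bin/bash
# reduce_lines SCRIPT FILE: hypothetical, simplified line-based reducer.
# SCRIPT must exit 0 when FILE is still "interesting".
reduce_lines() {
    local script=$1 file=$2 n chunk start tmp
    tmp=$(mktemp) || return 1
    n=$(( $(wc -l < "$file") ))
    chunk=$(( (n + 1) / 2 ))
    while [ "$chunk" -ge 1 ]; do
        start=1
        while [ "$start" -le "$n" ]; do
            cp "$file" "$tmp"                       # remember current state
            # candidate: the file without lines start .. start+chunk-1
            sed "${start},$(( start + chunk - 1 ))d" "$tmp" > "$file"
            if "$script" "$file"; then
                n=$(( $(wc -l < "$file") ))         # keep the deletion
            else
                cp "$tmp" "$file"                   # undo, move to next chunk
                start=$(( start + chunk ))
            fi
        done
        if [ "$chunk" -eq 1 ]; then break; fi
        chunk=$(( chunk / 2 ))
    done
    rm -f "$tmp"
}
```

Run on the output of a flattening step, each deleted "line" is a whole top-level form or statement, which is why the nesting levels matter so much in practice.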

(7) Hack on foo.ii by hand, re-running test1.sh each time to check that it is still "interesting". Sometimes it helps to hack on foo.ii a little to get delta unstuck and then rerun delta. You might also want to run indent whenever you stop to look at foo.ii, as topformflat makes a mess.
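When hacking by hand it is easy to lose the last version that was still interesting. One way to guard against that is a small checkpoint helper, sketched below. The name checkpoint and the .good suffix are made up (the suffix is chosen so as not to clash with the .bak and .ok files in the listing above); nothing here is part of delta.

```shell
#!/bin/bash
# checkpoint SCRIPT FILE: hypothetical helper, not part of delta.
# After a hand edit: keep the edit if FILE is still interesting,
# otherwise restore the last known-good copy.
checkpoint() {
    local script=$1 file=$2 good=$2.good
    if "$script" "$file"; then
        cp "$file" "$good"
        echo "still interesting: saved $good"
    elif [ -f "$good" ]; then
        cp "$good" "$file"
        echo "not interesting: restored from $good"
        return 1
    else
        echo "not interesting, and no $good to restore from" >&2
        return 1
    fi
}

# Example: edit foo.ii, then run
#   checkpoint ./test1.sh foo.ii
```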

Final file:

class A {};
int main() {
  const A *val;
  val->~A ();
}

Note that the original file was about 560 KB!