Sunday 25 October 2009

Brainstorming techniques for Enhancing Findbugs - Implementing a Planned Feature

While reading through the Findbugs manual, a section jumped out at me: Chapter 8, Filter Files. Since I had previously mentioned I would have to find out what support Findbugs had for excluding error reports, I skipped the first seven chapters to have a quick glance.

Even before the introduction began, another comment caught my eye:
Planned Features
Filters are currently only supported by the Command Line interface. Eventually, filter support will be added to the GUI.
Naively, I had assumed that support for filter files would already be included in the GUI version of Findbugs. This represents a serious candidate for how to enhance Findbugs: taking on the work of implementing filtering options through the GUI.

The filtering is done through a matching scheme, which offers a flexible and powerful way to define which bugs are excluded. Some of the possible ways to match bugs are:
  1. Match all bug reports for a class.
  2. Match certain tests from a class by specifying their abbreviations.
  3. Match certain tests from all classes by specifying their abbreviations.
  4. Match certain tests from all classes by specifying their category.
  5. Match bug types from specified methods of a class by their abbreviations.
  6. Match a particular bug pattern in a particular method.
  7. Match a particular bug pattern with a given priority in a particular method.
  8. Match minor bugs introduced by AspectJ compiler (you are probably not interested in these unless you are an AspectJ developer).
  9. Match bugs in specific parts of the code base.
  10. Match bugs on fields or methods with specific signatures.
With the exclusion matchers available, it would be possible to exclude specific bugs at a granularity as fine as a single method in a class. It would also allow wider matching, like 'ignore all of this bug type in this list of packages'. Using information attached to each bug report, such as location or bug type, the GUI would be able to offer a choice of potential matchers, greatly reducing the effort involved in suppressing false positives.
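To give a flavour of the existing command-line mechanism, here is a rough sketch of what an exclusion filter file looks like, based on the format described in the manual (the class, method and package names are invented for illustration):

<FindBugsFilter>
  <!-- Ignore all bug reports for one class -->
  <Match>
    <Class name="com.example.generated.FooParser"/>
  </Match>
  <!-- Ignore null-dereference reports (abbreviation "NP") in a single method of a class -->
  <Match>
    <Class name="com.example.FooCreator"/>
    <Method name="createFoo"/>
    <Bug code="NP"/>
  </Match>
  <!-- Ignore a whole category of reports across a package -->
  <Match>
    <Package name="~com\.example\.legacy.*"/>
    <Bug category="PERFORMANCE"/>
  </Match>
</FindBugsFilter>

A GUI could build these Match clauses automatically from whatever bug report the developer has selected.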

Providing a GUI for defining these exclusion filters would have the following advantages:
  • The bugs to stop reporting are controlled by the developers rather than an algorithm. Developers are a cynical bunch, and it would probably take a lot to convince them to place their faith in machine learning techniques. NB: it may be interesting to survey Findbugs users on this.
  • Filters are already an established model for removing bug reports. Though currently accessible only through the command line, they do have documentation, and will no doubt have users who have become familiar with the technique. This would reduce the barrier to entry for using the developed system.
  • Including filtering in the GUI is a planned feature. This would greatly improve the chances of the developed system being somehow involved with the Findbugs project.
  • Completely compatible with the crowdsourcing technique described in an earlier post, but would still provide a benefit for single or small development teams.
  • The GUI would be able to recognise cases where, rather than specifying an exclusion filter, an annotation could be used. For instance, when it's known a parameter is not null, rather than ignoring the error, place Findbugs' @Nonnull annotation on it (see the sketch just after this list).
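As a rough illustration of that last point, this is what the annotation route might look like. The sketch assumes the JSR-305 javax.annotation.Nonnull annotation, which Findbugs understands (Findbugs also provides its own edu.umd.cs.findbugs.annotations.NonNull); the Account class here is invented for the example:

import javax.annotation.Nonnull;

class Account {
    private final String owner;

    // Declaring the parameter @Nonnull records the contract, so Findbugs can warn
    // at call sites that might pass null, rather than us having to suppress the
    // potential-null-dereference warning inside the constructor.
    Account(@Nonnull String owner) {
        this.owner = owner.trim();
    }
}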
Some disadvantages (some personal rather than technical) are:
  • Implementing a GUI for Findbugs is unlikely to require machine learning techniques, removing the necessity for me to work with them (this may be a hidden benefit :-P)
  • False positive removal would still be a manual process, which is a bit of a burden. Though I think there is always going to be a trade-off between developer burden and trustability.
I'm not committing to this avenue for enhancing Findbugs just yet, but it's probably my favoured technique so far.

Saturday 24 October 2009

Brainstorming techniques for Enhancing Findbugs - Crowdsourcing

One possible technique for reducing the number of false positives (FPs) reported by Findbugs is crowdsourcing. This technique is completely different from the supervised learning[todo:link] described earlier. Crowdsourcing involves combining the efforts of a large group of users (supervised learning does not necessarily have to be single-user, that was just the scenario previously described). The actions required from users tend to be small, quick, unobtrusive and low in number. This is a key difference from a supervised learning approach, where a single user's input would be collected over time, requiring a greater investment.

Using a crowdsourcing technique to reduce FPs would probably involve distributing an ignore-list for Findbugs. If memory serves, this is already a feature of Findbugs, though I don't know the file format or its flexibility. When a user comes across a false positive, they would simply mark it as such. The rest of the developers on the project would then share the ignore-list, and the FP would no longer be reported by their instance of Findbugs. The mechanism for achieving this which springs to mind is simple - commit the ignore-list to version control. Is a team really likely to be using Findbugs if they are not employing one of the first requirements for team development?

However, if a team is not using version control, there are alternatives. The Findbugs system could be modified to submit the ignore action to a centralised server when performed, and to retrieve the ignore-list on each run. In this case the centralised server does not need to be anything complicated - a web server would suffice, and the persistence layer would only need to be the ignore-list file itself.
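To make that a bit more concrete, here is a very rough sketch of what the client side could look like. Everything in it - the server URL, the file name, the idea of simply GETting and PUTting the whole ignore-list - is hypothetical, not an existing Findbugs feature:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper that keeps a team's shared ignore-list in sync over plain HTTP.
class IgnoreListSync {
    private static final String SERVER = "http://buildserver.example.com/findbugs/ignore-list.xml";
    private static final Path LOCAL = Paths.get("findbugs-ignore-list.xml");

    // Download the team's ignore-list before each Findbugs run.
    static void pull() throws IOException {
        try (InputStream in = new URL(SERVER).openStream()) {
            Files.copy(in, LOCAL, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Upload the local ignore-list after a developer marks a new false positive.
    static void push() throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(SERVER).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("PUT");
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(LOCAL, out);
        }
        conn.getResponseCode(); // force the request to complete
    }
}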

There are some benefits to this system:
  • Simplicity - the concept is incredibly simple. The mapping between user action and the FPs ignored is 1:1. The system does no learning to use one action to ignore many FPs. Not only does this simplify development, it would lower the cognitive load on developers.
  • Safety - the system will not neglect any true errors. Users may set the system to ignore a true error, but as far as I know, that's a risk with Findbugs as it is anyway. But at least in that case the user is consciously making the decision to ignore the error, rather than the system doing it 'behind their back'.

There are also some obvious and insurmountable disadvantages:
  • Inherently unscaledownable (not a word I know, but I like it) - a single or low-user development team will not benefit from a crowdsourcing system. The only real advantage a system could offer for a single developer would be to provide a better interface for ignoring FPs. There may not be any problem with the current interface for doing this; I've yet to find out.
  • Higher maintenance - each FP must be explicitly ignored. If crowdsourcing is the only mechanism used, and there's no intelligence in the system, reducing the number of FPs will take more developer effort.

Despite these disadvantages, crowdsourcing is still something to consider. However, whether or not crowdsourcing is the chosen technique, it's likely that whatever system is implemented will have to be team-aware.

Friday 23 October 2009

Brainstorming possible techniques for Enhancing Findbugs - Supervised Learning

One possible technique for reducing false positives (FPs) would be to have developers train a machine learning system to recognise them. Under this scheme, the machine would recognise a user action which marked a reported bug as an FP. At this point, there are several pieces of information which would be candidates for use as training data:

- location of the FP
- the type(s) involved
- the type of error reported by Findbugs


At this point I'm only considering a scenario where a single developer works on the codebase. This list is also not comprehensive, just what I can think of on a Friday morning journey into uni.
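Just to pin down what a single training example might contain, here is a minimal sketch. The class and its field names are entirely hypothetical - this is not anything Findbugs currently exposes:

import java.util.List;

// Hypothetical record of the information captured when a developer marks a report as an FP.
class MarkedReport {
    final String bugType;             // the kind of error Findbugs reported, e.g. a null dereference
    final String location;            // project/package/class/method (or constructor, static initialiser)
    final List<String> typesInvolved; // the Java types appearing in the reported code
    final boolean falsePositive;      // the label supplied by the developer's action

    MarkedReport(String bugType, String location, List<String> typesInvolved, boolean falsePositive) {
        this.bugType = bugType;
        this.location = location;
        this.typesInvolved = typesInvolved;
        this.falsePositive = falsePositive;
    }
}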

Location of the FP
The location of the FP includes the complete range of scope, e.g. from project to package to class to method (though there are other locations to be considered, such as constructors or static initialisation blocks). At this point, the system would look for FPs marked in a similar location. Consider a hypothetical FP which reports a potential null dereference of a parameter in a package-private constructor. The calls to the constructor already check the parameter in question for nullness, and construction is limited to the package. At this point the system could learn that this constructor, within this package, is checked for nullness. If another package-private constructor is added, it could potentially use the learned information to remove errors relating to null dereferencing.


class Foo {
    ...
    Foo(Object potentiallyNull) {
        this.bar = potentiallyNull.getBar(); // Error: potential null dereference
    }
}

class FooCreator {
    ...
    void createFoo() {
        ...
        if(potentiallyNull == null) throw new NullPointerException();
        this.foo = new Foo(potentiallyNull); // Null dereference inside Foo constructor is not possible*
    }
}


* ignoring thread safety for this example.

This example is not the greatest, as there is the possibility of overlooking true errors (who says a new call to the constructor is doing the null checks previous callers were?), but it does demonstrate how location can be taken into account to learn about the codebase.

It may be that Findbugs can't always trace back to the calling code to check, as doing so may be too expensive. But if this type of learning could trigger an expensive check only in cases where it was likely to remove an FP, it would ensure true errors were not neglected. The original trigger would be the developer performing an action which said "It's safe to trigger this check in this situation". The problem with this approach is that the code to run such an expensive check would depend on the bug pattern detector, and would have to be explicitly written for each case - which is perhaps anathema to machine learning.

The (Java) Types Involved
Re-using the example from above, but reversed (a method call has an error reported because it could be passing a potentially null value to a constructor), the system could learn about the type involved. The code below shows the scenario:


class Foo {
    ...
    Foo(Object potentiallyNull) {
        if(potentiallyNull != null)
            this.bar = potentiallyNull.getBar(); // Null dereference is not possible
    }
}

class FooCreator {
    ...
    void createFoo() {
        ...
        this.foo = new Foo(potentiallyNull); // Error: passing potentially null value to method
    }
}

In this case, the error is an FP - the type Foo checks its parameters for nullness. This is again similar to the previous point: it might be too expensive for Findbugs to follow method calls to check whether nullness is checked for. But if the system can learn that the type Foo always checks for nullness, it can stop reporting these errors. Or, when the Foo constructor is called, the system can learn to trigger a specific bug detector that will follow the control flow, knowing that checking a call to Foo is not overly expensive.


Type of Error Reported by Findbugs
The last possibility for supervised learning is slightly simpler. An example of how a machine could learn to handle certain errors concerns a class of errors reported by Findbugs: 'bad style' errors. A developer may have a justifiable reason for ignoring these errors in certain situations. Although a blanket ignore-list for error types is available in Findbugs (I think, note to self: double check), the developer may not want to ignore a type of error everywhere. The system could be used to recognise where a certain error won't be accepted by the developer and begin to remove those reports.

Learning to reduce FPs by only one of these factors is likely to be quite dangerous. In order to learn effectively, without removing true errors, it's likely the location, the code and the type of error would all need to be combined.


There will almost certainly be more information which is both available and useful at the point where a system is learning, and this will hopefully become apparent soon. Also, the hypothetical examples here are not necessarily representative of the errors Findbugs actually reports, as I just made them up. Further usage and inspection of Findbugs will hopefully shed more light on this.

It is my uninformed opinion that supervised learning of the kind described here is likely to be fraught with peril, but maybe, unlike the Hoff, I'm just hiding in the darkness, afraid to come into the light.

Wednesday 21 October 2009

LaTeX vs. Open Office 3

One decision that could have a major impact on my productivity during the final year project is the choice of authoring software for the report.

For the last major report I wrote I used LaTeX, with Kile to edit and quick 'n' dirty bash scripts to build. The result looked quite nice (if I do say so myself), and while I'm glad I used LaTeX, it was quite a painful learning curve at times.

For the final year project I'm at a stage where I'm considering whether to use LaTeX again, or go with Open Office 3. Running Kubuntu 8.04 as my primary environment rules out Microsoft Word. I thought listing the pros and cons, as well as MoSCoW'ing the features, would be a fruitful endeavour.

LaTeX
Pros
  • since the entire document, including structure and markup, is written directly into the LaTeX source, the files are quite version control friendly
  • the output is very nice
  • the writing environment can be very lean; Kile requires fewer resources to run than Open Office (though this is an assumption based on no facts at all). Not that I ever did, but I could edit the document through a command terminal if I so desired
  • organising references, with BibTeX, is very easy
  • because there's a "build phase" with the document, changes to diagrams and suchlike are picked up on the next build, without having to refresh them throughout the document somehow
  • comments could be left in the source, which was useful for leaving TODO notes which could be easily grep'd and counted/located
  • some features which would be incredibly tedious to do manually, such as creating a table of contents, list of figures, alphabetised glossary and references, and an index, are quite straightforward
Cons
  • there is a build phase while the LaTeX source is built and converted to PDF. Upon nearing completion of the report mentioned earlier, when it was over 15,000 words and 50 pages, the build phase took about 20 seconds. Doesn't seem like much, but when you're feeling your way around new commands it's a definite pain. I guess this could be split up into smaller sections but I'm not sure how well that works
  • it's not WYSIWYG, meaning I had to trigger a build to see how something looked
  • not very portable - Open Office is installed on most of the machines I would use, whereas I needed certain LaTeX packages which were not part of the standard distribution. This would mean I couldn't build a PDF on university lab machines, for example.
MoSCoW Requirements
Some of the pros and cons listed are not really things I'm too concerned about. Prioritising my requirements will help me decide what I need to find out about Open Office in order to choose between the two for this project.
Must
  • allow automatic generation of the following: table of contents; bibliography.
  • allow automatic section, page and figure referencing
  • have an easy way to manage and cite references
  • allow exporting to PDF
  • be able to handle large documents; I think the report is going to end up around 30,000 words or so
Should
  • be able to 'link' to files for things like graphics, or data sources, so that changes to them will be reflected in the report
  • allow leaving comments, or todo's in the document, which are ignored when printed/PDF is generated
  • allow splitting up the documents to be able to work on smaller parts at a time, to improve performance
Could
  • allow automatic generation of a list of figures
  • allow generating a PDF from the command line, so I don't need to run Open Office if all I've done is modify a graphic (would depend on 'linking')
Won't need (but Would be nice)
  • a diff view which would highlight the difference between versions
  • a plugin which offered SVN integration (one click commit, decent editor for commit message)
From what I know of Open Office, I think I could tick off most of the requirements as marked, and with the added benefit of portability (in my circumstances), it does have something which it can hold over LaTeX.

With just a little more research I should be able to make my mind up...

Project Overview

A direct excerpt from my project specification (short-titled "Enhancing Findbugs").

Overview

Findbugs is a static analyser which can detect errors in Java programs. Findbugs can find a large number of potential errors, however, they often include false positives. The goal of this project is to enhance Findbugs by reducing the number of false positives it reports, but not at the expense of any true errors.

The aim of this project is to provide a system which will reduce the number of false positives reported by Findbugs. This will include an investigation of a number of potential strategies, for example, machine learning techniques, and implementing at least one of the possible strategies identified.


This is a project combining experimentation with significant software development, and it aims to have a working system when the time comes to submit. The experimentation comes in discovering the best way to reduce the false positives; machine learning, the one example mentioned, is just one of many possibilities. I'll use a following entry to discuss some brainstormed ideas for approaching the problem.