Graham Allan's Final Year Project (B)Log Book: Is it possible to classify all false positives? Or "One Project's False Positive is Another Project's Showstopper"

As part of my specification and plan for this project, I included the objective:

Identifying, classifying and quantifying a representative (rather than comprehensive) set of false positives reported by Findbugs

I'm now starting to see that this is a fruitless exercise.

There could be several techniques for achieving this: grokking an open-source project, running Findbugs over the codebase and finding the false positives; trying to write code that will expose false positives; or analysing the Findbugs bug-tracker (which has a 'False Positive' category). So in theory, it certainly would be possible to identify, classify and quantify false positives, but the problem is that the results would either be project-specific, or bugs in the Findbugs code itself.

So how does this relate to the project? Well, the specification lists an important factor in the success of a system that enhances Findbugs:

... additions and modifications to Findbugs should not require changes to the system in order for it to function. While being forced to track major changes to Findbugs may be excusable, the system should not require modification every time a new bug pattern detector is added to Findbugs.

Identifying false positives at a certain point in time is surely chasing the wrong prey. Instead, it would be more fruitful to consider that any bug report can be a false positive, depending on the context. For example, consider the following code:

  public String uhOh(Object whoopsie) {
    ...
    if (whoopsie == null) {
      return whoopsie.toString();  // guaranteed NullPointerException
    }
    ...
  }

This is an example of a guaranteed dereference of a null pointer. Findbugs will kindly report this for us. However, throwing an NPE may be part of the contract of the method, albeit implemented in a strange way. This would make the report a false positive in this context. Given this scenario, should the Findbugs null dereference detector be considered as generating a false positive? Definitely not - this would be a valid report in the vast majority of cases. But it illustrates that what constitutes a false positive depends on the context. This is an example of a single detector, but any bug report could be a false positive. If the code can never be reached, for instance, then by definition any bug reported in it will be a false positive.
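To make the "contract" scenario concrete, here is a minimal sketch of a method that documents the NPE as intended behaviour, with a caller that relies on it. The class name, method name and message strings are all hypothetical, made up purely for illustration:

```java
public class NullContractExample {

    /**
     * Hypothetical method, for illustration only.
     * Documented contract: throws NullPointerException if whoopsie is null.
     */
    static String describe(Object whoopsie) {
        if (whoopsie == null) {
            // Deliberate null dereference: this is the (strange) way the
            // method honours its contract, and the kind of line a null
            // dereference detector would be expected to flag.
            return whoopsie.toString();
        }
        return "value: " + whoopsie;
    }

    public static void main(String[] args) {
        System.out.println(describe("x"));
        try {
            describe(null);
        } catch (NullPointerException expected) {
            // The caller depends on the exception, so in this project the
            // report would be noise rather than a bug.
            System.out.println("NPE thrown, as the contract promises");
        }
    }
}
```

The same report against a method with no such contract would be a genuine bug, which is the point: the code is identical, only the context differs.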

So what does this mean for the project? Well, it means that trying to reduce false positives solely based on the bug detectors available in Findbugs is going to be a broken design. And when you consider that Findbugs has a plugin architecture for its detectors, allowing projects to add their own, this becomes even clearer.

If false positives are highly dependent on context, then building that context is what the system should be concerned with. In a previous post I talked about using the filter file to build a context, or conducting supervised learning as part of a triage session. At the time of writing, following these avenues is clearly more desirable than trying to identify, classify and quantify the false positives reported by Findbugs.
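As a rough sketch of what "building a context" could look like, the following keeps a per-project record of which reports have been judged false positives, so that the same report gets a different answer in different projects. Everything here is an assumption for illustration: Report, ProjectContext and their methods are hypothetical and are not part of Findbugs, and the pattern name and class name are used only as labels.

```java
import java.util.HashSet;
import java.util.Set;

public class ProjectContextSketch {

    /** Hypothetical, simplified stand-in for a single bug report. */
    static final class Report {
        final String pattern;   // label for the kind of bug, e.g. "NP_ALWAYS_NULL"
        final String className; // class the report was raised against

        Report(String pattern, String className) {
            this.pattern = pattern;
            this.className = className;
        }
    }

    /** Hypothetical per-project context, built up from triage decisions. */
    static final class ProjectContext {
        private final Set<String> ignored = new HashSet<>();

        /** Record that this project treats the pattern as a false positive in the class. */
        void markFalsePositive(String pattern, String className) {
            ignored.add(pattern + "@" + className);
        }

        /** True only if this project has previously dismissed an equivalent report. */
        boolean isFalsePositive(Report report) {
            return ignored.contains(report.pattern + "@" + report.className);
        }
    }

    public static void main(String[] args) {
        Report report = new Report("NP_ALWAYS_NULL", "com.example.Parser");

        // Project A has decided this report is noise; project B has not.
        ProjectContext projectA = new ProjectContext();
        projectA.markFalsePositive("NP_ALWAYS_NULL", "com.example.Parser");
        ProjectContext projectB = new ProjectContext();

        System.out.println(projectA.isFalsePositive(report)); // true
        System.out.println(projectB.isFalsePositive(report)); // false
    }
}
```

Nothing in this sketch depends on which detectors exist, so adding a new detector would not require changing it; only the recorded decisions grow.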

To put it more simply, one project's false positive is another project's showstopper. The job of this system should be to get to know the project.

Friday, 6 November 2009