Saturday 24 October 2009

Brainstorming techniques for Enhancing Findbugs - Crowdsourcing

One possible technique for reducing the number of false positives (FPs) reported by Findbugs is crowdsourcing. This technique is completely different from the supervised learning[todo:link] described earlier. Crowdsourcing involves combining the efforts of a large group of users (supervised learning does not necessarily have to be monogamous; that was just the scenario previously described). The actions required from users tend to be small, quick, unobtrusive and few in number. This is a key difference from a supervised learning approach, where a single user's input would be collected over time, requiring a greater investment.

Using a crowdsourcing technique to reduce FPs would probably involve distributing an ignore-list for Findbugs. If memory serves, this is already a feature of Findbugs, though I don't know the file format or how flexible it is. When a user comes across a false positive, they would simply mark it as such. The rest of the developers on the project would then share the ignore-list, and the FP would no longer be reported by their instance of Findbugs. The mechanism which springs to mind for achieving this is simple - commit the ignore-list to version control. After all, is a team likely to be using Findbugs if it is not already using version control, one of the first requirements for team development?
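To make the shared ignore-list idea concrete, here is a minimal sketch in Python. It is not Findbugs' actual file format or API (which I haven't looked at yet); the one-key-per-line text file, the key layout and the function names are all assumptions made purely for illustration.

```python
from pathlib import Path

# Assumption: each warning is identified by a single-line string key, e.g.
# "BUG_TYPE|com.example.Foo|bar". This is a hypothetical format, not Findbugs' own.


def load_ignore_list(path):
    """Return the set of warning keys that have been marked as false positives."""
    p = Path(path)
    if not p.exists():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def mark_false_positive(path, key):
    """Add one key to the list: one user action ignores exactly one warning."""
    keys = load_ignore_list(path)
    keys.add(key)
    # Sorted, one key per line, so version-control diffs and merges stay small.
    Path(path).write_text("\n".join(sorted(keys)) + "\n", encoding="utf-8")


def filter_warnings(warnings, ignored):
    """Drop warnings whose key is in the ignore-list, preserving report order."""
    return [w for w in warnings if w not in ignored]
```

Keeping the file sorted with one entry per line is a deliberate choice: two developers marking different FPs will usually produce changes that version control can merge without conflicts.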

However, if a team is not using version control, there are alternatives. The Findbugs system could be modified to submit the ignore action to a centralised server when performed, and retrieve the ignore-list on each run. In this case the centralised server does not need to be anything complicated - a web server would suffice, and the persistence layer would only need to be the ignore-list file.
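As a rough sketch of how little such a server needs to do, the following uses only Python's standard library. The endpoint behaviour (GET returns the whole list, POST appends one key), the function name `make_server` and the one-key-per-line file are my own assumptions for illustration, not anything Findbugs provides.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


def make_server(ignore_file, host="127.0.0.1", port=0):
    """Build a tiny server whose only persistence is the ignore-list file.

    Hypothetical protocol: GET returns the list, POST adds one key (the body).
    """
    path = Path(ignore_file)
    lock = threading.Lock()  # guards reads and writes of the file

    class Handler(BaseHTTPRequestHandler):
        def _reply(self, status, body=b""):
            self.send_response(status)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            # Clients would fetch the full list at the start of each run.
            with lock:
                body = path.read_bytes() if path.exists() else b""
            self._reply(200, body)

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            key = self.rfile.read(length).decode("utf-8").strip()
            if not key or "\n" in key:
                self._reply(400)  # reject empty or multi-line keys
                return
            with lock:
                known = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
                if key not in known:
                    with path.open("a", encoding="utf-8") as f:
                        f.write(key + "\n")
            self._reply(204)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return HTTPServer((host, port), Handler)
```

Calling `make_server("ignore-list.txt", port=8000).serve_forever()` would run it. There is no authentication or validation of who is marking what, which a real deployment would have to think about.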

There are some benefits to this system:
  • Simplicity - the concept is incredibly simple. The mapping between user action and the FPs ignored is 1:1. The system does no learning, so a single action never ignores many FPs. Not only does this simplify development, it would lower the cognitive load on developers.
  • Safety - the system will not neglect any true errors. Users may set the system to ignore a true error, but as far as I know, that's a risk with Findbugs as it is anyway. But at least in that case the user is consciously making the decision to ignore the error, rather than the system doing it 'behind their back'.

There are also some obvious and insurmountable disadvantages:
  • Inherently unscaledownable (not a word, I know, but I like it) - a single developer or a very small team will not benefit from a crowdsourcing system. The only real advantage such a system could offer a single developer would be a better interface for ignoring FPs. There may not be any problem with the current interface for doing this; I've yet to find out.
  • Higher maintenance - each FP must be explicitly ignored. If crowdsourcing is the only mechanism used and there's no intelligence in the system, reducing the number of FPs would take more developer effort.

Despite these disadvantages, crowdsourcing is still something to consider. Whether or not crowdsourcing is the chosen technique, it's likely that whatever system is implemented will have to be team-aware.
