Multiclass Classification and Prioritisation of Static Analysis Warnings Using Developer-Labelled Industrial Data

Abstract

Automatic static code analysis tools are used to identify code quality issues such as vulnerabilities or performance problems. In practice, the high number of irrelevant warnings produced by such tools is problematic, but it can be addressed by pre-filtering and ranking the warnings before they are shown to the developer. Since ground-truth labelled data is rarely available, existing research tends to construct training data heuristically from unlabelled open-source data by assigning clearly separated binary categories, such as fixed and irrelevant, to the warnings. However, this labelling approach cannot capture subtler distinctions, such as that between warnings that are relevant but not yet fixed and warnings that remain unfixed because they are irrelevant. The Teamscale software developed by CQSE provides a unique opportunity to investigate this concern, since its developers have adopted the practice of meticulously labelling every static analysis warning as either accepted, tolerated, or false-positive. Using this dataset, we adapt previously proposed models to a new multiclass classification task and evaluate both their ability to classify warnings in this setting and their ability to prioritise important warnings. Our experiments show that, in particular, the aforementioned subtle distinction between categories of unresolved warnings is more challenging for the models than binary prediction. Nevertheless, training the models on the multiclass task rather than the binary one yields a statistically significant improvement in the prioritisation of the warnings.
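To illustrate the setup the abstract describes, the following is a minimal sketch, not the paper's actual models: a toy nearest-centroid multiclass classifier over hypothetical two-dimensional warning features, with the three developer labels (accepted, tolerated, false-positive), whose pseudo-probability of the "accepted" class is then used to rank warnings. All feature names and values are invented for illustration.

```python
import math
from collections import defaultdict

LABELS = ["accepted", "tolerated", "false-positive"]

def centroids(samples):
    """samples: list of (feature_vector, label) -> {label: centroid}."""
    sums, counts = defaultdict(list), defaultdict(int)
    for x, y in samples:
        if not sums[y]:
            sums[y] = list(x)
        else:
            sums[y] = [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def class_scores(cents, x):
    """Softmax over negative centroid distances -> pseudo-probability per label."""
    neg_dist = {y: -math.dist(c, x) for y, c in cents.items()}
    m = max(neg_dist.values())
    exps = {y: math.exp(v - m) for y, v in neg_dist.items()}
    z = sum(exps.values())
    return {y: v / z for y, v in exps.items()}

# Invented toy training data: (severity, normalised warning age) per warning.
train = [
    ((0.9, 0.1), "accepted"),
    ((0.8, 0.2), "accepted"),
    ((0.5, 0.6), "tolerated"),
    ((0.4, 0.7), "tolerated"),
    ((0.1, 0.9), "false-positive"),
    ((0.2, 0.8), "false-positive"),
]
cents = centroids(train)

# Prioritise unseen warnings: highest P(accepted) first.
queue = [(0.85, 0.15), (0.15, 0.85), (0.45, 0.65)]
ranked = sorted(queue, key=lambda x: class_scores(cents, x)["accepted"],
                reverse=True)
```

The prioritisation step is the point of the multiclass formulation: even when the classifier confuses the two unresolved categories (tolerated vs. false-positive), the per-class scores still induce a ranking that can place likely-accepted warnings at the top of the developer's queue.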