An Improved Transformer-Based Model for Software Defect Prediction

Abstract

With the increasing scale and complexity of software, software quality has become a focal point of attention. Software defects, which directly undermine software quality, pose significant threats to reliability. Identifying defect-prone modules in the early stages of software development has therefore emerged as a critical challenge. This study presents DP-TFusion, a cross-version software defect prediction method based on the Transformer architecture that integrates Abstract Syntax Tree (AST) sequence features with traditional software metrics. To mitigate the feature distribution shift that arises between software versions, the model employs a feature fusion strategy. Experiments on multiple historical versions of three representative Apache Software Foundation projects show that DP-TFusion achieves significant improvements over traditional CNN and LSTM models on key metrics including Precision, Recall, F1-score, and AUC. Notably, the method remains comparatively robust and adaptable when version differences are large. These results validate the effectiveness and practical value of the approach for cross-version software defect prediction.
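To illustrate the kind of architecture the abstract describes, the following is a minimal sketch (not the paper's actual implementation): a Transformer encoder over AST token sequences whose pooled representation is concatenated with a vector of traditional software metrics before a binary defect classifier. The vocabulary size, embedding width, metric count, and all identifier names are illustrative assumptions.

# Hedged sketch of AST-plus-metrics fusion for defect prediction.
# Hyperparameters and names are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class DefectFusionModel(nn.Module):
    def __init__(self, vocab_size=5000, d_model=128, n_heads=4,
                 n_layers=2, n_metrics=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.classifier = nn.Sequential(
            nn.Linear(d_model + n_metrics, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, ast_tokens, metrics):
        # ast_tokens: (batch, seq_len) integer AST token ids, 0 = padding
        # metrics:    (batch, n_metrics) traditional software metrics
        mask = ast_tokens.eq(0)                       # ignore padded positions
        x = self.encoder(self.embed(ast_tokens), src_key_padding_mask=mask)
        pooled = x.mean(dim=1)                        # average-pool the sequence
        fused = torch.cat([pooled, metrics], dim=-1)  # late feature fusion
        return self.classifier(fused).squeeze(-1)     # defect logit per module

# Example forward pass on random data
model = DefectFusionModel()
logits = model(torch.randint(1, 5000, (8, 200)), torch.randn(8, 20))

The sketch uses simple late fusion (concatenation of the pooled AST encoding with the metrics vector); the paper's actual fusion strategy may differ.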

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright (c) 2025 Ziyang Liu