Context:
Security patch identification is an important task in continuous integration and deployment, helping software developers detect security issues and code vulnerabilities. Recent studies have confirmed that using both the commit message and the code diff improves identification performance. However, existing approaches still suffer from weak representation ability and low robustness, both of which lower the quality of the commit representation and degrade identification performance.
Objective:
We propose a gated transformer network with a mixture-of-experts design for multivariate security patch identification.
Method:
To improve the representation capability of the model and the quality of the commit representations, we provide a bi-encoder that uses prior knowledge to enhance the distinctive features of the commit message and the code diff, respectively. To improve the robustness of the model and further improve the quality of the commit representations, we design a gated layer that learns the weight of each expert and dynamically assigns weights to the different features.
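The gating idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the embedding dimension, and the randomly drawn embeddings and gate parameters are all hypothetical, and a plain linear gate stands in for the transformer components. It only shows how a gate can turn two expert embeddings (commit message and code diff) into softmax weights and a weighted commit representation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(msg_emb, diff_emb, gate_weights, gate_bias):
    """Fuse two expert embeddings with an input-dependent gate.

    The gate is a single linear layer over the concatenated embeddings
    (an assumption made for illustration), followed by a softmax that
    yields one weight per expert.
    """
    features = msg_emb + diff_emb  # list concatenation, length 2 * dim
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    weights = softmax(scores)  # weights[0]: message expert, weights[1]: diff expert
    fused = [weights[0] * m + weights[1] * d for m, d in zip(msg_emb, diff_emb)]
    return fused, weights


# Hypothetical inputs: random vectors stand in for the two expert outputs.
rng = random.Random(0)
dim = 4
msg_emb = [rng.uniform(-1, 1) for _ in range(dim)]
diff_emb = [rng.uniform(-1, 1) for _ in range(dim)]
gate_weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(2)]
gate_bias = [0.0, 0.0]

fused, weights = gated_fusion(msg_emb, diff_emb, gate_weights, gate_bias)
print("expert weights:", weights)
print("fused commit representation:", fused)
```

Because the weights are computed from the inputs themselves, each commit can lean more on its message or on its diff, which is the sense in which the weighting is dynamic.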
Results:
Extensive experiments show that our framework improves both the representation ability and the robustness of the model, yields high-quality commit representations, and achieves state-of-the-art performance.
Conclusion:
Our approach provides a bi-encoder in which two experts produce the embedding of each feature, and then explores the differences between them by setting different weights through the gated layer. It improves both the representation ability and the robustness of the model, making it well suited to real-world scenarios. The code and data are available at https://github.com/AppleMax1992/ensemble_commit.
