Vulnerability Dataset Denoising
1.0.0
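Our study reveals that label errors persist in the existing datasets used for source code vulnerability detection. We highlight the need to construct high-quality datasets collected with reliable techniques. Here we provide implementations of the models described in the paper, including DeepWuKong, SySeVR, and VulDeePecker, together with the two corresponding denoising methods, confident learning (CL) and differential training (DT). The datasets we used are also listed below.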
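As a rough illustration of the idea behind confident learning (CL), the sketch below flags samples whose given label disagrees with a class the model predicts confidently. It is not the code in this repository: the function name `find_suspect_labels`, its arguments, and the per-class mean threshold rule are assumptions made for this example, and it expects every class to appear at least once in `labels`.

```python
import numpy as np


def find_suspect_labels(labels, pred_probs):
    """Return indices of samples whose given label looks unreliable.

    labels:     (n,) integer array of given, possibly noisy, labels.
    pred_probs: (n, k) array of predicted class probabilities, ideally
                obtained out-of-sample (e.g. via cross-validation).
    Hypothetical helper for illustration only.
    """
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    n_classes = pred_probs.shape[1]

    # Per-class threshold: mean predicted probability of class j over
    # the samples whose given label is j.
    thresholds = np.array(
        [pred_probs[labels == j, j].mean() for j in range(n_classes)]
    )

    # A class is a confident candidate when its probability reaches
    # that class's threshold.
    confident = pred_probs >= thresholds

    # Among the confident candidates pick the most probable class;
    # use -1 when no class is confident for a sample.
    masked = np.where(confident, pred_probs, -np.inf)
    confident_class = np.where(
        confident.any(axis=1), masked.argmax(axis=1), -1
    )

    # Suspect: the model is confident about a class other than the label.
    suspect = (confident_class != -1) & (confident_class != labels)
    return np.flatnonzero(suspect)
```

Samples returned by such a function would be candidates for removal or relabeling before retraining a detector; how the flagged samples are actually handled in this project is defined by the repository's own scripts.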
config:
config files for the deep learning models. In this work, we use only deepwukong.yaml, silver.yaml, and vuldeepecker.yaml.
models:
code files for deep learning models.
prepare_data:
utility files that prepare data for the FFmpeg+Qemu dataset.
tools:
program slicing utility files.
utils:
commonly used functions.
confident_learning.py :
entrance of confident learning.
differential_training.py :
entrance of differential training.
dwk_train.py :
entrance of training deepwukong.
sys_train.py :
entrance of training sysevr.
vdp_train.py :
entrance of training vuldeepecker.
sard_crawl.py :
code for crawling the SARD dataset.
Vulnerability data can be crawled from the official SARD website with the script:
python sard_crawl.py
You can download it via this link.
Xu Nie, Ningke Li, Kailong Wang, Shangguang Wang, Xiapu Luo, and Haoyu Wang. 2023. Understanding and Tackling Label Errors in Deep Learning-based Vulnerability Detection. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’23), July 17–21, 2023, Seattle, WA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3597926.3598037