Vulnerability Dataset Denoising
1.0.0
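Our research reveals a persistent label-error problem in existing datasets used for source code vulnerability detection. Here we provide implementations of the models described in the paper, including DeepWuKong, SySeVR, and VulDeePecker, along with two corresponding denoising methods: confident learning (CL) and differential training (DT). The datasets we used are also listed below.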
configs:
Config files for the deep learning models. In this work, we use only deepwukong.yaml, silver.yaml, and vuldeepecker.yaml.
models:
Code files for the deep learning models.
prepare_data:
Utility files that prepare data for FFmpeg+qemu.
tools:
Program slicing utility files.
utils:
Commonly used functions.
confident_learning.py :
Entry point for confident learning.
differential_training.py :
Entry point for differential training.
dwk_train.py :
Entry point for training DeepWuKong.
sys_train.py :
Entry point for training SySeVR.
vdp_train.py :
Entry point for training VulDeePecker.
sard_crawl.py :
Code for crawling the SARD dataset.
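For readers unfamiliar with the confident learning (CL) denoising step listed above, the following is a minimal sketch of the general idea, not the code in this repository. It assumes you already have out-of-sample predicted probabilities from one of the detectors (for example, via cross-validation). The function names `class_thresholds` and `find_suspect_labels` are hypothetical, and the thresholding rule follows the commonly described confident learning formulation: a sample is flagged when the model is confident about a class that differs from its given label.

```python
from typing import List, Sequence


def class_thresholds(labels: Sequence[int],
                     probs: Sequence[Sequence[float]],
                     num_classes: int) -> List[float]:
    """Per-class threshold: mean predicted probability of class j
    over the samples whose given label is j."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for y, p in zip(labels, probs):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no samples gets an infinite threshold,
    # so it can never be predicted confidently.
    return [sums[j] / counts[j] if counts[j] else float("inf")
            for j in range(num_classes)]


def find_suspect_labels(labels: Sequence[int],
                        probs: Sequence[Sequence[float]],
                        num_classes: int = 2) -> List[int]:
    """Return indices of samples whose confident class disagrees
    with the given (possibly noisy) label."""
    thresholds = class_thresholds(labels, probs, num_classes)
    suspects = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        # Classes whose probability reaches that class's threshold.
        confident = [j for j in range(num_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # the model is not confident about any class
        guess = max(confident, key=lambda j: p[j])
        if guess != y:
            suspects.append(i)
    return suspects


if __name__ == "__main__":
    # Hypothetical data: 1 = vulnerable, 0 = non-vulnerable.
    labels = [1, 1, 1, 0, 0, 0]
    probs = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1],
             [0.8, 0.2], [0.9, 0.1], [0.7, 0.3]]
    print(find_suspect_labels(labels, probs))
```

In this toy input, the third sample (index 2) is labeled vulnerable although the model confidently predicts non-vulnerable, so it is the only index flagged. Flagged samples can then be removed or relabeled before retraining.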
Vulnerability data can be crawled from the official SARD website with the script:
python sard_crawl.py
You can download it via this link.
Xu Nie, Ningke Li, Kailong Wang, Shangguang Wang, Xiapu Luo, and Haoyu Wang. 2023. Understanding and Tackling Label Errors in Deep Learning-based Vulnerability Detection. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’23), July 17–21, 2023, Seattle, WA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3597926.3598037