JavaVFC: Java Vulnerability Fixing Commits from Open-source Software
Tan Bui, Yan Naing Tun, Yiran Cheng, Ivana Clairine Irsan, Ting Zhang, Hong Jin Kang
arXiv - CS - Software Engineering, 2024-09-09 (arXiv:2409.05576)
We present a comprehensive dataset of Java vulnerability-fixing commits
(VFCs) to advance research in Java vulnerability analysis. Our dataset, derived
from thousands of open-source Java projects on GitHub, comprises two variants:
JavaVFC and JavaVFC-extended. The dataset was constructed through a rigorous
process involving heuristic rules and multiple rounds of manual labeling. We
initially used keywords to filter candidate VFCs based on commit messages, then
refined this keyword set through iterative manual labeling. The final labeling
round, carried out by three annotators, achieved a precision score of 0.7. We applied the
refined keyword set to 34,321 open-source Java repositories with over 50 GitHub
stars, resulting in JavaVFC with 784 manually verified VFCs and
JavaVFC-extended with 16,837 automatically identified VFCs. Both variants are
presented in a standardized JSONL format for easy access and analysis. This
dataset supports various research endeavors, including VFC identification,
fine-grained vulnerability detection, and automated vulnerability repair.
JavaVFC and JavaVFC-extended are publicly available at
https://zenodo.org/records/13731781.
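The first construction step, filtering candidate VFCs by keywords in commit messages, can be sketched as below. This is a minimal illustration, not the authors' pipeline: the keyword list is a hypothetical placeholder (the refined JavaVFC keyword set is not given in this abstract), and the `sha`/`message` fields of each commit record are assumed names.

```python
import re

# HYPOTHETICAL keyword set for illustration only; the refined JavaVFC
# keyword list is not listed in the abstract.
KEYWORDS = ["vulnerability", "cve", "security", "xss", "sql injection", "overflow"]


def build_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole word."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def is_candidate_vfc(message, pattern):
    """Return True if the commit message mentions any keyword."""
    return pattern.search(message) is not None


def filter_candidates(commits, keywords=KEYWORDS):
    """Keep only commits whose (assumed) 'message' field matches a keyword."""
    pattern = build_pattern(keywords)
    return [c for c in commits if is_candidate_vfc(c["message"], pattern)]


if __name__ == "__main__":
    commits = [
        {"sha": "a1", "message": "Fix XSS in login form"},
        {"sha": "b2", "message": "Bump version to 1.2"},
        {"sha": "c3", "message": "Patch for CVE-2021-44228"},
    ]
    print([c["sha"] for c in filter_candidates(commits)])  # ['a1', 'c3']
```

The word-boundary anchors (`\b`) keep a keyword such as "overflow" from matching inside unrelated tokens like "stackoverflow"; a keyword filter of this kind still yields false positives, which is why the abstract pairs it with manual labeling.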
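Because both variants are distributed as JSONL (one JSON object per line), they can be read with the standard library alone. The sketch below also shows one way a precision figure like the 0.7 reported above could be computed over keyword-flagged candidates: precision = true positives / all flagged candidates. The `annotations` field and the majority-vote aggregation across three annotators are hypothetical assumptions for illustration; the abstract does not state the dataset schema or how annotator judgments were combined.

```python
import io
import json


def read_jsonl(stream):
    """Parse a JSONL stream: one JSON object per non-empty line."""
    records = []
    for line in stream:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def majority_label(votes):
    # HYPOTHETICAL aggregation: a candidate counts as a true VFC when more
    # than half of the annotators marked it as one (ties count as "not a VFC").
    return sum(bool(v) for v in votes) * 2 > len(votes)


def precision(records):
    """Fraction of keyword-flagged candidates judged to be real VFCs."""
    if not records:
        return 0.0
    true_positives = sum(majority_label(r["annotations"]) for r in records)
    return true_positives / len(records)


if __name__ == "__main__":
    # Two flagged candidates with assumed per-annotator votes.
    sample = (
        '{"commit_id": "a1", "annotations": [true, true, false]}\n'
        '{"commit_id": "b2", "annotations": [false, false, true]}\n'
    )
    print(precision(read_jsonl(io.StringIO(sample))))  # 0.5
```

Reading line by line keeps memory use proportional to one record at a time while parsing, which matters more for a file the size of JavaVFC-extended than for the 784-commit JavaVFC.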