Matheus Paixão, J. Krinke, Donggyun Han, Mark Harman
Code review has been widely adopted by both industrial and open source software development communities. Research in code review is highly dependent on real-world data, and although researchers have attempted to provide code review datasets, there is still no dataset that links code reviews with complete versions of the system's code base, mainly because reviewed versions are not kept in the system's version control repository. Thus, we present CROP, the Code Review Open Platform, the first curated code review repository that links review data with isolated complete versions (snapshots) of the source code at the time of review. CROP currently provides data for 8 software systems, 48,975 reviews and 112,617 patches, including versions of the systems that are inaccessible in the systems' original repositories. Moreover, CROP is extensible, and it will be continuously curated and extended.
{"title":"CROP","authors":"Matheus Paixão, J. Krinke, Donggyun Han, Mark Harman","doi":"10.1145/3196398.3196466","DOIUrl":"https://doi.org/10.1145/3196398.3196466","journal":{"name":"Proceedings of the 15th International Conference on Mining Software Repositories"},"publicationDate":"2018-05-28","publicationTypes":"Journal Article"}
Race detection is increasingly popular in both academic research and industrial practice. However, there is no specialized and comprehensive dataset of data races, which makes it difficult to evaluate race detectors effectively or to develop efficient race detection algorithms. In this paper, we present JBench, a dataset with a total of 985 data races from real-world applications and academic artifacts. We point out the locations of the data races, provide the source code and running commands, and standardize the storage structure. We also analyze all the data races and classify them along four aspects: variable type, code structure, method span and cause. Furthermore, we discuss the use of the dataset in two scenarios: optimizing race detection techniques and extracting concurrency patterns.
{"title":"Jbench","authors":"Jian Gao, Xin Yang, Yu Jiang, Han Liu, Weiliang Ying, Xian Zhang","doi":"10.1145/3196398.3196451","DOIUrl":"https://doi.org/10.1145/3196398.3196451","journal":{"name":"Proceedings of the 15th International Conference on Mining Software Repositories"},"publicationDate":"2018-05-28","publicationTypes":"Journal Article"}
Susan clasped her hands calmly in front of her and listened to the meditative elevator sounds. Five minutes to go. She could see the Earth through the viewport and felt the increasing tug of gravity. Her thoughts focused on how the nanites would turn years of low-paid research into gold for her. White gold. A milliliter of nanites exchanged for fifty bars of platinum in a numbered Swiss bank vault.
{"title":"CLEVER","authors":"Mathieu Nayrolles, A. Hamou-Lhadj","doi":"10.1145/3196398.3196438","DOIUrl":"https://doi.org/10.1145/3196398.3196438","journal":{"name":"Proceedings of the 15th International Conference on Mining Software Repositories"},"publicationDate":"2018-05-28","publicationTypes":"Journal Article"}
B. A. Sánchez, Konstantinos Barmpis, Patrick Neubauer, R. Paige, D. Kolovos
Mining data from remote repositories, such as GitHub and StackExchange, involves executing requests that can easily reach the limits imposed by the respective APIs to shield their services from overload and abuse. Data mining clients are therefore left to deal with such protective service policies on their own, which usually requires extensive manual implementation effort. In this work we present RestMule, a framework for handling various service policies, such as a limited number of requests within a period of time and multi-page responses, by generating resilient clients that are able to handle request rate limits, network failures, response caching, and paging in a graceful and transparent manner. As a result, RestMule clients generated from OpenAPI specifications (i.e., standardized REST API descriptors) are suitable for intensive data-fetching scenarios. We evaluate our framework by reproducing an existing repository mining use case and comparing the results produced by a popular hand-written client and a RestMule client.
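The policies named in this abstract (rate limits, network failures, response caching, and paging) can be illustrated with a minimal sketch. This is not RestMule's generated code or its API: `ResilientClient`, `RateLimited`, and the in-memory `FakeApi` are hypothetical names introduced only for illustration, and the fake service stands in for a real remote API so that the example runs without network access.

```python
import time
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A page is (items, next page number or None when there are no more pages).
Page = Tuple[List[str], Optional[int]]


class RateLimited(Exception):
    """Hypothetical error raised when the request quota is exhausted."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


class ResilientClient:
    """Hides retries, caching, and paging behind a single iterator."""

    def __init__(self, transport: Callable[[int], Page], max_retries: int = 3,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.transport = transport
        self.max_retries = max_retries
        self.sleep = sleep  # injectable so the sketch can run without waiting
        self.cache: Dict[int, Page] = {}

    def _fetch(self, page: int) -> Page:
        if page in self.cache:  # response caching: no repeated request
            return self.cache[page]
        for attempt in range(self.max_retries + 1):
            try:
                result = self.transport(page)
                self.cache[page] = result
                return result
            except RateLimited as exc:
                if attempt == self.max_retries:
                    raise
                self.sleep(exc.retry_after)  # wait for the quota to reset
            except ConnectionError:
                if attempt == self.max_retries:
                    raise
                self.sleep(2 ** attempt)  # exponential backoff on network failure
        raise AssertionError("unreachable")

    def items(self) -> Iterator[str]:
        page: Optional[int] = 1
        while page is not None:  # paging: follow next-page pointers to the end
            batch, page = self._fetch(page)
            yield from batch


class FakeApi:
    """In-memory stand-in for a remote service with a small request quota."""

    def __init__(self, data: List[str], page_size: int = 2, quota: int = 2) -> None:
        self.data = list(data)
        self.page_size = page_size
        self.quota = quota
        self.remaining = quota
        self.calls = 0

    def reset_quota(self, _seconds: float) -> None:
        # Used in place of time.sleep: pretend the rate-limit window elapsed.
        self.remaining = self.quota

    def __call__(self, page: int) -> Page:
        self.calls += 1
        if self.remaining == 0:
            raise RateLimited(retry_after=1.0)
        self.remaining -= 1
        start = (page - 1) * self.page_size
        chunk = self.data[start:start + self.page_size]
        has_more = start + self.page_size < len(self.data)
        return chunk, (page + 1 if has_more else None)


if __name__ == "__main__":
    api = FakeApi(["a", "b", "c", "d", "e"])
    client = ResilientClient(api, sleep=api.reset_quota)
    print(list(client.items()))  # all five items, despite the quota of two
    print(api.calls)             # number of requests the fake service received
```

The caller only iterates over `items()`; the retry on `RateLimited`, the backoff on `ConnectionError`, and the page-following loop stay inside the client, which is the kind of transparency the abstract describes. Passing `sleep` as a parameter is a design choice of this sketch so that the waiting behaviour can be replaced in tests.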
{"title":"Restmule","authors":"B. A. Sánchez, Konstantinos Barmpis, Patrick Neubauer, R. Paige, D. Kolovos","doi":"10.1145/3196398.3196405","DOIUrl":"https://doi.org/10.1145/3196398.3196405","journal":{"name":"Proceedings of the 15th International Conference on Mining Software Repositories"},"publicationDate":"2018-05-28","publicationTypes":"Journal Article"}
{"title":"Proceedings of the 15th International Conference on Mining Software Repositories","authors":"Jesus M. Gonzalez-Barahona, Abram Hindle, Lin Tan","doi":"10.1145/3196398","DOIUrl":"https://doi.org/10.1145/3196398","journal":{"name":"Proceedings of the 15th International Conference on Mining Software Repositories"},"publicationDate":"2017-05-20","publicationTypes":"Journal Article"}