Processing Large Datasets of Fine-Grained Source Code Changes

2019 IEEE International Conference on Software Maintenance and Evolution (ICSME) Pub Date: 2019-09-01 DOI: 10.1109/ICSME.2019.00064

S. Levin, A. Yehudai

Citations: 0

Abstract

In the era of Big Code, researchers study an increasingly large number of repositories to support their findings, and the data processing stage may require manipulating millions of records or more. In this work we focus on studies involving fine-grained, AST-level source code changes. We present how we extended the CodeDistillery source code mining framework with data manipulation capabilities aimed at easing the processing of large datasets of fine-grained source code changes. The capabilities we have introduced allow researchers to automate much of their repository mining process and to streamline the data acquisition and processing phases. These capabilities have been used successfully to conduct a number of studies, in the course of which tens of millions of fine-grained source code changes were processed.
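To make the kind of data manipulation described above concrete, here is a minimal sketch of aggregating fine-grained change records per commit in a single streaming pass. It does not use or reproduce the CodeDistillery API, which the abstract does not describe. The CSV schema (`commit_id`, `change_type`), the function names, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def read_changes(stream: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (commit_id, change_type) pairs one row at a time.

    The column names are a hypothetical schema, not CodeDistillery's format.
    """
    for row in csv.DictReader(stream):
        yield row["commit_id"], row["change_type"]


def count_changes_per_commit(
    changes: Iterable[Tuple[str, str]],
) -> Dict[str, Counter]:
    """Count how often each change type occurs in each commit.

    Rows are consumed lazily, so only the per-commit counters are held
    in memory, not the raw records.
    """
    per_commit: Dict[str, Counter] = defaultdict(Counter)
    for commit_id, change_type in changes:
        per_commit[commit_id][change_type] += 1
    return dict(per_commit)


# Hypothetical sample data; the change-type labels are illustrative only.
SAMPLE = """commit_id,change_type
a1,STATEMENT_INSERT
a1,STATEMENT_INSERT
a1,METHOD_RENAMING
b2,STATEMENT_DELETE
"""

if __name__ == "__main__":
    totals = count_changes_per_commit(read_changes(io.StringIO(SAMPLE)))
    for commit_id, counts in sorted(totals.items()):
        print(commit_id, dict(counts))
```

Passing an open file handle instead of the in-memory `io.StringIO` sample would apply the same aggregation to a file on disk. For datasets whose per-commit counters no longer fit in memory, a database or a distributed processing engine would be the more likely choice.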
