Two Approaches to Survival Analysis of Open Source Python Projects
Derek Robinson, Keanelek Enns, Neha Koulecar, Manish Sihag. In 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC). DOI: https://doi.org/10.1145/3524610.3527871

A recent study applied frequentist survival analysis methods to a subset of the Software Heritage Graph and determined which attributes of an open-source software project contribute to its health. This paper serves as an exact replication of that study. In addition, Bayesian survival analysis methods were applied to the same dataset, and an additional project attribute was studied to serve as a conceptual replication. Both analyses focus on the effects of certain attributes on the survival of open-source software projects as measured by their revision activity. Methods such as the Kaplan-Meier estimator, the Cox proportional-hazards model, and the visualization of posterior survival functions were used for each of the project attributes. The results show that projects which publish major releases, have repositories on multiple hosting services, possess a large team of developers, and make frequent revisions have a higher likelihood of long-term survival. The findings were similar to those of the original study; however, a deeper look revealed quantitative inconsistencies.
Clone-based code method usage pattern mining
Zhipeng Xue, Yuanliang Zhang, Rulin Xu. In 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC). DOI: https://doi.org/10.1145/3524610.3527880

When programmers retrieve a code method and want to reuse it, they need to understand the method's usage patterns. However, it is difficult to obtain usage information for the retrieved method, since it may have only a brief comment and few available usage examples. In this paper, we propose an approach called LUPIN (cLone-based Usage Pattern mIniNg) to mine the usage patterns of such methods, which do not appear widely in the code repository. The key idea of LUPIN is that cloned code of the target method is likely to have a similar usage pattern, so more usage information for the target method can be collected from the usage examples of its clones. From the amplified usage examples, we mine the usage pattern of the target method by frequent subsequence mining, after program slicing and code normalization. Our evaluation shows that LUPIN can mine four categories of usage patterns with an average precision of 0.65.
GitQ- Towards Using Badges as Visual Cues for GitHub Projects
Akhila Sri Manasa Venigalla, Kowndinya Boyalakunta, S. Chimalakonda. In 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC). DOI: https://doi.org/10.1145/3524610.3527876

GitHub hosts millions of software repositories, enabling developers to contribute to many projects in multiple ways. Most of the information about the repositories is text-based, in the form of stars, forks, commits, and so on. However, developers willing to contribute to projects on GitHub often find it challenging to select appropriate projects to contribute to or reuse, due to the large number of repositories present on GitHub. Obtaining the required information often becomes a tedious process, as one has to carefully mine information hidden inside the repository. To alleviate these effort-intensive mining procedures, researchers have proposed npm badges to outline information relating to the build status of a project. However, these badges are static, and their usage is limited to package-dependency and build details. Adding visual cues such as badges to repositories could reduce the search space for developers. Hence, we present GitQ, which automatically augments GitHub repositories with badges representing information about source code and project maintenance. Presenting GitQ as a browser plugin for GitHub makes it easily accessible to developers using GitHub. GitQ was evaluated with 15 developers based on the UTAUT model to understand developer perception of its usefulness. We observed that 11 out of 15 developers perceived GitQ to be useful in identifying the right set of repositories using visual cues such as those generated by GitQ. The source code and tool are available for download on GitHub at https://github.com/gitq-for-github/plugin, and the demo can be found at https://youtu.be/c0yohmIat3A.
Learning to Represent Programs with Heterogeneous Graphs
Wenhan Wang, Kechi Zhang, Ge Li, Zhi Jin. In 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC). DOI: https://doi.org/10.1145/3524610.3527905

Code representation, which transforms programs into vectors carrying semantics, is essential for source code processing. Recent years have witnessed the effectiveness of incorporating structural information (i.e., graphs) into code representations. Specifically, the abstract syntax tree (AST) and the AST-augmented graph of a program contain much structural and semantic information, and most existing studies apply them for code representation. However, the graphs adopted by existing approaches are homogeneous: they discard the type information of the edges and nodes within the AST, which may hinder the representation model. In this paper, we propose to leverage the type information in the graph for code representation. Specifically, we propose the heterogeneous program graph (HPG), which makes the types of the nodes and the edges explicit. Furthermore, we employ the heterogeneous graph transformer (HGT) architecture to generate representations based on HPG, taking the type information into account during processing. With the additional types in HPG, our approach can capture complex structural information, produce accurate and fine-grained representations, and ultimately perform well on downstream tasks. Our in-depth evaluations on four classic datasets for two typical tasks (i.e., method name prediction and code classification) demonstrate that the heterogeneous types in HPG benefit the representation models. Our proposed HPG+HGT also outperforms the SOTA baselines on the subject tasks and datasets.
Semantic Similarity Metrics for Evaluating Source Code Summarization
Sakib Haque, Zachary Eberhart, Aakash Bansal, Collin McMillan. In 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC).

Source code summarization involves creating brief natural-language descriptions of source code. These descriptions are a key component of software documentation such as JavaDocs. Automatic code summarization is a prized target of software engineering research, because summaries are highly valuable to programmers while writing and maintaining documentation by hand is costly. Almost all current work is based on machine learning models trained on big data: large datasets of code paired with summaries of that code are used to train, e.g., an encoder-decoder neural model. The model's output predictions are then evaluated against a set of reference summaries; the input is code the model has not seen, and each prediction is compared to a reference. The means by which a prediction is compared to a reference is essentially word overlap, calculated via a metric such as BLEU or ROUGE. The problem with using word overlap is that not all words in a sentence have the same importance, and many words have synonyms. As a result, the calculated similarity may not match the similarity perceived by human readers. In this paper, we conduct an experiment to measure the degree to which various word overlap metrics correlate with human-rated similarity of predicted and reference summaries. We evaluate alternatives based on current work in semantic similarity metrics and propose recommendations for the evaluation of source code summarization.