程序注释的分类与传播

2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE) Pub Date : 2020-09-01 DOI:10.1145/3324884.3418913

Xiangzhe Xu

{"title":"程序注释的分类与传播","authors":"Xiangzhe Xu","doi":"10.1145/3324884.3418913","DOIUrl":null,"url":null,"abstract":"Natural language comments are like bridges between human logic and software semantics. Developers use comments to describe the function, implementation, and property of code snippets. This kind of connections contains rich information, like the potential types of a variable and the pre-condition of a method, among other things. In this paper, we categorize comments and use natural language processing techniques to extract information from them. Based on the semantics of programming languages, different rules are built for each comment category to systematically propagate comments among code entities. Then we use the propagated comments to check the code usage and comments consistency. Our demo system finds 37 bugs in real-world projects, 30 of which have been confirmed by the developers. Except for bugs in the code, we also find 304 pieces of defected comments. The 12 of them are misleading and 292 of them are not correct. Moreover, among the 41573 pieces of comments we propagate, 87 comments are for private native methods which had neither code nor comments. We also conduct a user study where we find that propagated comments are as good as human-written comments in three dimensions of consistency, naturalness, and meaningfulness.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Classification and Propagation of Program Comments\",\"authors\":\"Xiangzhe Xu\",\"doi\":\"10.1145/3324884.3418913\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Natural language comments are like bridges between human logic and software semantics. Developers use comments to describe the function, implementation, and property of code snippets. This kind of connections contains rich information, like the potential types of a variable and the pre-condition of a method, among other things. In this paper, we categorize comments and use natural language processing techniques to extract information from them. Based on the semantics of programming languages, different rules are built for each comment category to systematically propagate comments among code entities. Then we use the propagated comments to check the code usage and comments consistency. Our demo system finds 37 bugs in real-world projects, 30 of which have been confirmed by the developers. Except for bugs in the code, we also find 304 pieces of defected comments. The 12 of them are misleading and 292 of them are not correct. Moreover, among the 41573 pieces of comments we propagate, 87 comments are for private native methods which had neither code nor comments. We also conduct a user study where we find that propagated comments are as good as human-written comments in three dimensions of consistency, naturalness, and meaningfulness.\",\"PeriodicalId\":106337,\"journal\":{\"name\":\"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3324884.3418913\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3324884.3418913","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

自然语言注释就像是人类逻辑和软件语义之间的桥梁。开发人员使用注释来描述代码片段的功能、实现和属性。这种连接包含丰富的信息，例如变量的潜在类型和方法的先决条件等。在本文中，我们对评论进行分类，并使用自然语言处理技术从中提取信息。基于编程语言的语义，为每个注释类别构建不同的规则，以便在代码实体之间系统地传播注释。然后我们使用传播的注释来检查代码的使用情况和注释的一致性。我们的演示系统在实际项目中发现了37个bug，其中30个已经被开发人员确认。除了代码中的bug，我们还发现了304条有缺陷的注释。其中12条具有误导性，292条不正确。此外，在我们传播的41573条注释中，有87条注释是针对既没有代码也没有注释的私有本机方法的。我们还进行了一项用户研究，发现传播评论在一致性、自然性和意义性三个维度上与人工撰写的评论一样好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

The Classification and Propagation of Program Comments

Natural language comments are like bridges between human logic and software semantics. Developers use comments to describe the function, implementation, and property of code snippets. This kind of connections contains rich information, like the potential types of a variable and the pre-condition of a method, among other things. In this paper, we categorize comments and use natural language processing techniques to extract information from them. Based on the semantics of programming languages, different rules are built for each comment category to systematically propagate comments among code entities. Then we use the propagated comments to check the code usage and comments consistency. Our demo system finds 37 bugs in real-world projects, 30 of which have been confirmed by the developers. Except for bugs in the code, we also find 304 pieces of defected comments. The 12 of them are misleading and 292 of them are not correct. Moreover, among the 41573 pieces of comments we propagate, 87 comments are for private native methods which had neither code nor comments. We also conduct a user study where we find that propagated comments are as good as human-written comments in three dimensions of consistency, naturalness, and meaningfulness.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)

自引率

0.00%

发文量

期刊最新文献

Towards Generating Thread-Safe Classes Automatically Anti-patterns for Java Automated Program Repair Tools Automating Just-In-Time Comment Updating Synthesizing Smart Solving Strategy for Symbolic Execution Identifying and Describing Information Seeking Tasks