{"title":"程序注释的分类与传播","authors":"Xiangzhe Xu","doi":"10.1145/3324884.3418913","DOIUrl":null,"url":null,"abstract":"Natural language comments are like bridges between human logic and software semantics. Developers use comments to describe the function, implementation, and property of code snippets. This kind of connections contains rich information, like the potential types of a variable and the pre-condition of a method, among other things. In this paper, we categorize comments and use natural language processing techniques to extract information from them. Based on the semantics of programming languages, different rules are built for each comment category to systematically propagate comments among code entities. Then we use the propagated comments to check the code usage and comments consistency. Our demo system finds 37 bugs in real-world projects, 30 of which have been confirmed by the developers. Except for bugs in the code, we also find 304 pieces of defected comments. The 12 of them are misleading and 292 of them are not correct. Moreover, among the 41573 pieces of comments we propagate, 87 comments are for private native methods which had neither code nor comments. We also conduct a user study where we find that propagated comments are as good as human-written comments in three dimensions of consistency, naturalness, and meaningfulness.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Classification and Propagation of Program Comments\",\"authors\":\"Xiangzhe Xu\",\"doi\":\"10.1145/3324884.3418913\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Natural language comments are like bridges between human logic and software semantics. Developers use comments to describe the function, implementation, and property of code snippets. This kind of connections contains rich information, like the potential types of a variable and the pre-condition of a method, among other things. In this paper, we categorize comments and use natural language processing techniques to extract information from them. Based on the semantics of programming languages, different rules are built for each comment category to systematically propagate comments among code entities. Then we use the propagated comments to check the code usage and comments consistency. Our demo system finds 37 bugs in real-world projects, 30 of which have been confirmed by the developers. Except for bugs in the code, we also find 304 pieces of defected comments. The 12 of them are misleading and 292 of them are not correct. Moreover, among the 41573 pieces of comments we propagate, 87 comments are for private native methods which had neither code nor comments. We also conduct a user study where we find that propagated comments are as good as human-written comments in three dimensions of consistency, naturalness, and meaningfulness.\",\"PeriodicalId\":106337,\"journal\":{\"name\":\"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3324884.3418913\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3324884.3418913","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Classification and Propagation of Program Comments
Natural language comments are like bridges between human logic and software semantics. Developers use comments to describe the function, implementation, and property of code snippets. This kind of connections contains rich information, like the potential types of a variable and the pre-condition of a method, among other things. In this paper, we categorize comments and use natural language processing techniques to extract information from them. Based on the semantics of programming languages, different rules are built for each comment category to systematically propagate comments among code entities. Then we use the propagated comments to check the code usage and comments consistency. Our demo system finds 37 bugs in real-world projects, 30 of which have been confirmed by the developers. Except for bugs in the code, we also find 304 pieces of defected comments. The 12 of them are misleading and 292 of them are not correct. Moreover, among the 41573 pieces of comments we propagate, 87 comments are for private native methods which had neither code nor comments. We also conduct a user study where we find that propagated comments are as good as human-written comments in three dimensions of consistency, naturalness, and meaningfulness.