{"title":"将现代JVM语言代码映射到分析友好型图形:Kotlin研究","authors":"Lu Li, Yan Liu","doi":"10.1142/s0218194022500735","DOIUrl":null,"url":null,"abstract":"Kotlin is a modern JVM language, gaining adoption rapidly and becoming Android official programming language. With its wide usage, the need for code analysis of Kotlin is increasing. Exposing code semantics explicitly with a properly structured format is the first step in code analysis and the construction of such representation is the foundation for downstream tasks. Recently, graph-based approaches became a promising way of encoding source code semantics. However, this work mainly focuses on representation learning with limited interpretability and shallow domain knowledge. The known evolvements of code semantics in new-generation programming languages have been overlooked. How to establish an effective mapping between naturally concise Kotlin source code with graph-based representation needs to be studied by analyzing known language features. Moreover, the feasibility of enhancing the mapping with code semantics automatically learned from the program needs to be explored. In this paper, we first propose a first-sight, rule-based mapping method, using composite representation with AST, CFG, DFG and language features. To examine the possibility of exposing code semantics in the mapped graph, we use Latent Semantic Indexing-based source code summarization to learn more features of each method, and then enrich the attributes of the corresponding node in the graph. We evaluate these mapping strategies with comparative experiments by simulating a code search solution as a downstream task. The experiment result shows that the graph-based method with built-in language features outperforms the text-based way without introducing greater complexity. Comparative experiments also prove that adding code semantics to the graph benefits the capacity of downstream tasks. When exploring the whole mapping process, our study explicitly revealed the practical barriers to extracting and exposing the hidden semantics from Kotlin source code, which may help enlighten source code representations for other modern languages.","PeriodicalId":50288,"journal":{"name":"International Journal of Software Engineering and Knowledge Engineering","volume":"1 1","pages":"1667-1688"},"PeriodicalIF":0.6000,"publicationDate":"2022-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Mapping Modern JVM Language Code to Analysis-Friendly Graphs: A Study with Kotlin\",\"authors\":\"Lu Li, Yan Liu\",\"doi\":\"10.1142/s0218194022500735\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Kotlin is a modern JVM language, gaining adoption rapidly and becoming Android official programming language. With its wide usage, the need for code analysis of Kotlin is increasing. Exposing code semantics explicitly with a properly structured format is the first step in code analysis and the construction of such representation is the foundation for downstream tasks. Recently, graph-based approaches became a promising way of encoding source code semantics. However, this work mainly focuses on representation learning with limited interpretability and shallow domain knowledge. The known evolvements of code semantics in new-generation programming languages have been overlooked. How to establish an effective mapping between naturally concise Kotlin source code with graph-based representation needs to be studied by analyzing known language features. Moreover, the feasibility of enhancing the mapping with code semantics automatically learned from the program needs to be explored. In this paper, we first propose a first-sight, rule-based mapping method, using composite representation with AST, CFG, DFG and language features. To examine the possibility of exposing code semantics in the mapped graph, we use Latent Semantic Indexing-based source code summarization to learn more features of each method, and then enrich the attributes of the corresponding node in the graph. We evaluate these mapping strategies with comparative experiments by simulating a code search solution as a downstream task. The experiment result shows that the graph-based method with built-in language features outperforms the text-based way without introducing greater complexity. Comparative experiments also prove that adding code semantics to the graph benefits the capacity of downstream tasks. When exploring the whole mapping process, our study explicitly revealed the practical barriers to extracting and exposing the hidden semantics from Kotlin source code, which may help enlighten source code representations for other modern languages.\",\"PeriodicalId\":50288,\"journal\":{\"name\":\"International Journal of Software Engineering and Knowledge Engineering\",\"volume\":\"1 1\",\"pages\":\"1667-1688\"},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2022-12-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Software Engineering and Knowledge Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1142/s0218194022500735\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Software Engineering and Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1142/s0218194022500735","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Mapping Modern JVM Language Code to Analysis-Friendly Graphs: A Study with Kotlin
Kotlin is a modern JVM language, gaining adoption rapidly and becoming Android official programming language. With its wide usage, the need for code analysis of Kotlin is increasing. Exposing code semantics explicitly with a properly structured format is the first step in code analysis and the construction of such representation is the foundation for downstream tasks. Recently, graph-based approaches became a promising way of encoding source code semantics. However, this work mainly focuses on representation learning with limited interpretability and shallow domain knowledge. The known evolvements of code semantics in new-generation programming languages have been overlooked. How to establish an effective mapping between naturally concise Kotlin source code with graph-based representation needs to be studied by analyzing known language features. Moreover, the feasibility of enhancing the mapping with code semantics automatically learned from the program needs to be explored. In this paper, we first propose a first-sight, rule-based mapping method, using composite representation with AST, CFG, DFG and language features. To examine the possibility of exposing code semantics in the mapped graph, we use Latent Semantic Indexing-based source code summarization to learn more features of each method, and then enrich the attributes of the corresponding node in the graph. We evaluate these mapping strategies with comparative experiments by simulating a code search solution as a downstream task. The experiment result shows that the graph-based method with built-in language features outperforms the text-based way without introducing greater complexity. Comparative experiments also prove that adding code semantics to the graph benefits the capacity of downstream tasks. When exploring the whole mapping process, our study explicitly revealed the practical barriers to extracting and exposing the hidden semantics from Kotlin source code, which may help enlighten source code representations for other modern languages.
期刊介绍:
The International Journal of Software Engineering and Knowledge Engineering is intended to serve as a forum for researchers, practitioners, and developers to exchange ideas and results for the advancement of software engineering and knowledge engineering. Three types of papers will be published:
Research papers reporting original research results
Technology trend surveys reviewing an area of research in software engineering and knowledge engineering
Survey articles surveying a broad area in software engineering and knowledge engineering
In addition, tool reviews (no more than three manuscript pages) and book reviews (no more than two manuscript pages) are also welcome.
A central theme of this journal is the interplay between software engineering and knowledge engineering: how knowledge engineering methods can be applied to software engineering, and vice versa. The journal publishes papers in the areas of software engineering methods and practices, object-oriented systems, rapid prototyping, software reuse, cleanroom software engineering, stepwise refinement/enhancement, formal methods of specification, ambiguity in software development, impact of CASE on software development life cycle, knowledge engineering methods and practices, logic programming, expert systems, knowledge-based systems, distributed knowledge-based systems, deductive database systems, knowledge representations, knowledge-based systems in language translation & processing, software and knowledge-ware maintenance, reverse engineering in software design, and applications in various domains of interest.