Mapping Modern JVM Language Code to Analysis-Friendly Graphs: A Study with Kotlin

International Journal of Software Engineering and Knowledge Engineering, pp. 1667-1688. Pub Date: 2022-12-17. DOI: 10.1142/s0218194022500735. IF 0.6; Zone 4 (Computer Science); JCR Q4 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE).

Lu Li, Yan Liu


Citations: 0


Mapping Modern JVM Language Code to Analysis-Friendly Graphs: A Study with Kotlin

Kotlin is a modern JVM language that is rapidly gaining adoption and has become an official programming language for Android. With its wide usage, the need for code analysis of Kotlin is increasing. Exposing code semantics explicitly in a properly structured format is the first step in code analysis, and constructing such a representation is the foundation for downstream tasks. Recently, graph-based approaches have become a promising way of encoding source code semantics. However, existing work mainly focuses on representation learning, with limited interpretability and shallow domain knowledge, and the known evolution of code semantics in new-generation programming languages has been overlooked. How to establish an effective mapping between naturally concise Kotlin source code and a graph-based representation needs to be studied by analyzing known language features. Moreover, the feasibility of enhancing the mapping with code semantics automatically learned from the program needs to be explored. In this paper, we first propose a first-sight, rule-based mapping method that uses a composite representation combining the abstract syntax tree (AST), control flow graph (CFG), data flow graph (DFG) and language features. To examine the possibility of exposing code semantics in the mapped graph, we use Latent Semantic Indexing-based source code summarization to learn more features of each method, and then enrich the attributes of the corresponding node in the graph. We evaluate these mapping strategies in comparative experiments that simulate a code search solution as the downstream task. The experimental results show that the graph-based method with built-in language features outperforms the text-based approach without introducing greater complexity. The comparative experiments also show that adding code semantics to the graph benefits downstream tasks. In exploring the whole mapping process, our study explicitly reveals the practical barriers to extracting and exposing the hidden semantics of Kotlin source code, which may help inform source code representations for other modern languages.
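The composite representation described in the abstract can be pictured with a small sketch. It is not the authors' implementation: the node kinds, the `nullable_param` attribute, the hand-built graph for the Kotlin one-liner `fun greet(name: String?): String = name ?: "guest"`, and every function name below are hypothetical. Plain term frequency stands in for Latent Semantic Indexing, which would need a matrix decomposition, and the search is a simple term-overlap ranking.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    kind: str                      # hypothetical labels such as "Method" or "Param"
    text: str
    attrs: dict = field(default_factory=dict)


@dataclass
class CodeGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (source_id, target_id, edge_type)

    def add_node(self, kind, text, **attrs):
        node = Node(len(self.nodes), kind, text, dict(attrs))
        self.nodes[node.node_id] = node
        return node.node_id

    def add_edge(self, source, target, edge_type):
        # A single graph carries all views; edge_type is "AST", "CFG" or "DFG".
        self.edges.append((source, target, edge_type))

    def neighbors(self, source, edge_type):
        return [t for s, t, e in self.edges if s == source and e == edge_type]


def split_terms(text):
    """Split text on camelCase boundaries and non-letters, lower-cased."""
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z]*", text)]


def enrich_method(graph, method_id, top_k=3):
    """Attach the most frequent terms of a method and its AST children
    as a summary attribute (a stand-in for LSI-based summarization)."""
    texts = [graph.nodes[method_id].text]
    texts += [graph.nodes[c].text for c in graph.neighbors(method_id, "AST")]
    counts = Counter(t for text in texts for t in split_terms(text))
    graph.nodes[method_id].attrs["summary_terms"] = [
        term for term, _ in counts.most_common(top_k)
    ]


def search(graph, query):
    """Rank method nodes by overlap between query terms and node attributes."""
    query_terms = set(split_terms(query))
    scored = []
    for node in graph.nodes.values():
        if node.kind != "Method":
            continue
        terms = set(node.attrs.get("summary_terms", []))
        if node.attrs.get("nullable_param"):
            terms.add("nullable")   # language feature exposed as a searchable term
        score = len(query_terms & terms)
        if score:
            scored.append((score, node.text))
    return [text for _, text in sorted(scored, reverse=True)]


# Hand-built graph for: fun greet(name: String?): String = name ?: "guest"
graph = CodeGraph()
method = graph.add_node("Method", "greet", nullable_param=True)  # `String?` recorded as a feature
param = graph.add_node("Param", "name")
elvis = graph.add_node("Elvis", "name ?: guest")
ret = graph.add_node("Return", "return")
for child in (param, elvis, ret):
    graph.add_edge(method, child, "AST")
graph.add_edge(elvis, ret, "CFG")     # control flows from the expression to the return
graph.add_edge(param, elvis, "DFG")   # the parameter value is read by the expression

enrich_method(graph, method)
print(search(graph, "greet nullable"))
```

Keeping all three views in one edge list with a type tag means a downstream task can filter by edge type or walk the combined graph, while the per-node attributes hold both rule-derived language features and learned summary terms.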


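Source journal

International Journal of Software Engineering and Knowledge Engineering (Engineering Technology - Engineering: Electrical & Electronic)

CiteScore: 1.90
Self-citation rate: 11.10%
Publication volume: (not listed)
Review time: 16 months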

About the journal:

The International Journal of Software Engineering and Knowledge Engineering is intended to serve as a forum for researchers, practitioners, and developers to exchange ideas and results for the advancement of software engineering and knowledge engineering.

Three types of papers will be published:

- Research papers reporting original research results
- Technology trend surveys reviewing an area of research in software engineering and knowledge engineering
- Survey articles surveying a broad area in software engineering and knowledge engineering

In addition, tool reviews (no more than three manuscript pages) and book reviews (no more than two manuscript pages) are also welcome.

A central theme of this journal is the interplay between software engineering and knowledge engineering: how knowledge engineering methods can be applied to software engineering, and vice versa.

The journal publishes papers in the areas of software engineering methods and practices, object-oriented systems, rapid prototyping, software reuse, cleanroom software engineering, stepwise refinement/enhancement, formal methods of specification, ambiguity in software development, impact of CASE on software development life cycle, knowledge engineering methods and practices, logic programming, expert systems, knowledge-based systems, distributed knowledge-based systems, deductive database systems, knowledge representations, knowledge-based systems in language translation & processing, software and knowledge-ware maintenance, reverse engineering in software design, and applications in various domains of interest.