Journal of Computer Languages: Latest Articles (Page 4)

Python’s evolution on Stack Overflow: An empirical analysis of topic trends

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Computer Languages

Pub Date : 2025-06-12 DOI: 10.1016/j.cola.2025.101340

Fengqi Hu, Weihao Xue, Siyuan Zhou, Ye Wang, Bo Jiang, Qiao Huang, Hua Zhang

With the rapid development of information technology and changing programming practices, the demand for programming discussions on online Q&A platforms is growing. This study analyzes over two million Python-related posts on Stack Overflow to identify core topics and challenges over fifteen years. Using a Gradient Boosting Decision Tree (GBDT) model to quantify post popularity, we objectively show which Python-related topics were the hottest, and which were the most troublesome for users, at different times. We find that the domains most closely associated with Python are data processing and machine learning, while development environments as well as automation and testing are gradually increasing in popularity. Machine learning is the area that troubles users the most. Moreover, we find that some questions that confuse users can increase the popularity of related topics. These findings can help developers grasp the direction of the Python language so that they can better plan their personal learning and project development. Enterprises and organizations can also optimize resource allocation for training, tool development, and technical support based on trends in hot topics.
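
To make the GBDT idea concrete, here is a minimal pure-Python sketch of gradient boosting with one-split regression trees (stumps) under squared-error loss. The abstract does not state which features or popularity target the authors used, so the feature columns (`score`, `answer_count`), the target values, and the function names `fit_stump`, `fit_gbdt`, and `predict` are all illustrative assumptions, not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    feature: int      # index of the feature to split on
    threshold: float  # rows with x[feature] <= threshold go left
    left: float       # value predicted on the left side
    right: float      # value predicted on the right side

    def predict(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X, residuals):
    """Return the single split that minimizes squared error on the residuals."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        # Drop the largest value so the right side is never empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best, best_err = Stump(f, t, lm, rm), err
    return best


def fit_gbdt(X, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values
            break
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(row) for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, lr = model
    return base + lr * sum(s.predict(x) for s in stumps)


def mse(preds, y):
    return sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)


# Hypothetical toy posts: [score, answer_count] -> popularity proxy.
X = [[1, 0], [2, 1], [5, 2], [8, 3], [20, 5], [35, 8]]
y = [0.3, 0.5, 1.2, 2.0, 6.5, 9.0]

model = fit_gbdt(X, y)
baseline_mse = mse([sum(y) / len(y)] * len(y), y)
model_mse = mse([predict(model, row) for row in X], y)
print(f"baseline MSE={baseline_mse:.3f}, boosted MSE={model_mse:.3f}")
```

Each round fits a stump to what the current ensemble still gets wrong and adds a shrunken copy of it, which is why the training error falls below the mean-only baseline. A production analysis would more likely rely on an established GBDT library than on hand-written stumps.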

Citations: 0

The role of data transformation in modern analytics: A comprehensive survey

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Computer Languages

Pub Date : 2025-05-20 DOI: 10.1016/j.cola.2025.101329

Sanae Borrohou, Rachida Fissoune, Hassan Badir

Data transformation is a fundamental step in modern data analytics, enabling the conversion of raw data into structured, high-quality formats suitable for analysis. This process plays a crucial role in data cleaning, integration, and preprocessing, ensuring consistency across diverse data sources while addressing challenges such as missing values, inconsistencies, and redundancy. By applying techniques such as scaling, normalization, encoding, feature extraction, and aggregation, data transformation enhances the accuracy and efficiency of analytical and machine learning models. This study provides a comprehensive survey of data transformation techniques, categorizing them into key types: data cleaning and preprocessing, normalization and standardization, feature engineering, encoding categorical data, data augmentation, discretization and data aggregation. We analyze their impact on data quality and explore their interdependencies, presenting a structured framework that connects these transformations within the broader data preprocessing workflow. Additionally, we highlight the challenges of implementing transformation methods in large-scale, heterogeneous datasets, including data integration complexities, security concerns, and resource constraints. By synthesizing recent advancements in the field, this research offers a structured reference for data scientists and researchers, guiding them in selecting appropriate transformation strategies based on their specific analytical needs. Future work will focus on developing a complete data cleaning workflow that integrates transformation techniques for large-scale applications, emphasizing automation and scalability in modern analytics.
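
A few of the transformation families named above (scaling, standardization, categorical encoding, discretization) can be sketched in plain Python. This is only an illustration of the general techniques under simple assumptions (small in-memory lists, population standard deviation, equal-width bins); the function names are hypothetical and are not taken from the survey.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score standardization using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 vectors, one column per category."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


def equal_width_bins(values, n_bins):
    """Discretize values into n_bins equal-width intervals (bin indices from 0)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(min_max_scale([10, 20, 30]))
print(one_hot(["red", "green", "red"]))
print(equal_width_bins([0, 5, 10], 2))
```

Scaling and standardization change only the numeric range, whereas encoding and discretization change the representation itself, which is one reason the order of steps matters within a preprocessing workflow.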

Citations: 0

Towards democratisation of veterinary clinical protocols: Transferring their development from technical-coding experts to veterinary professionals for the case of Chronic Kidney Disease for Cats (CKD4Cats Domain-Specific Language)

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Computer Languages

Pub Date : 2025-05-18 DOI: 10.1016/j.cola.2025.101328

Sofia Meacham, Hessa Alfraihi

This paper presents CKD4Cats, a domain-specific language (DSL) for computerised Chronic Kidney Disease (CKD) clinical protocols in cats, a very common disease in veterinary practice. Building on DSLs used in human health, CKD4Cats addresses veterinary-specific needs while also addressing the shortcomings of those DSLs. Developed with JetBrains’ Meta-Programming System (MPS) and veterinary input, the DSL ensures ease of use and adoption. It employs advanced evaluation methods, creating a projectional editor that streamlines protocol creation, displays relevant options, and guarantees “correct-by-construction” clinical protocols. This innovative approach democratises software development, making advanced tools accessible to non-technical users and significantly improving veterinary practice management.

Citations: 0

A novel framework for evaluating developers’ code comprehension proficiency through technical and non-technical skills

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Computer Languages

Pub Date : 2025-04-28 DOI: 10.1016/j.cola.2025.101327

Divjot Singh, Ashutosh Mishra, Ashutosh Aggarwal

Context:

Code comprehension is an essential software maintenance skill, where technical skills are often considered the primary benchmark for evaluating developers’ proficiency, overlooking the significant role of non-technical skills.

Objective:

Our work aims to propose a generalized framework for measuring developers’ code comprehension proficiency by integrating technical and non-technical skills, inspired by cognitive attraction networks, and conducting an empirical study to evaluate code comprehension proficiency based on selective skills.

Methods:

The generalized framework evaluates developers’ technical and non-technical skills separately using collected data and computes their respective indices to derive an overall measure of code comprehension ability, represented as the comprehension measure index (CMI). Additionally, an empirical study with 158 participants assessed technical skills, including code understanding, debugging, and completion, alongside non-technical skills such as problem-solving, emotions, long-term memory, belief, desire, intention, and commitment to compute their overall code comprehension proficiency.

Results:

Based on the index values obtained for the technical and non-technical parameters, the study identifies multiple factors affecting participants’ performance, including lack of technical knowledge, reliance on guesswork, stress intolerance, lack of commitment and desire, difficulty understanding logic, inability to recall concepts, and other contributing factors. To enhance our results, K-means clustering is used to group the participants into three clusters according to their performance.

Conclusion:

Integrating technical and non-technical skills enables a more accurate assessment by addressing factors beyond technical expertise. The framework can help managers and tutors identify strengths and weaknesses, allowing task assignments that align with strengths of developers while addressing areas for improvement.
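
The clustering step mentioned in the Results can be illustrated with a small K-means sketch. The abstract gives neither the CMI formula nor the participants' data, so the score pairs below (a technical index and a non-technical index per participant) are invented for illustration, and the deterministic initialization is a simplification chosen to keep the example reproducible.

```python
import math


def kmeans(points, k, n_iter=100):
    """Plain K-means with a deterministic start: evenly spaced points in sorted order."""
    ordered = sorted(points)
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [ordered[round(i * step)] for i in range(k)]
    assignments = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical (technical_index, non_technical_index) pairs for nine participants.
scores = [
    (0.20, 0.30), (0.25, 0.20), (0.30, 0.35),
    (0.50, 0.55), (0.55, 0.50), (0.60, 0.60),
    (0.85, 0.90), (0.90, 0.80), (0.95, 0.85),
]

centroids, clusters = kmeans(scores, k=3)
print(clusters)
```

With three well-separated groups like these, the clusters correspond to low, medium, and high performers, which is the kind of grouping the abstract describes.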

Citations: 0

The evolution of Lua, continued

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Computer Languages

Pub Date : 2025-04-10 DOI: 10.1016/j.cola.2025.101326

Roberto Ierusalimschy, Luiz Henrique de Figueiredo, Waldemar Celes

Lua is a scripting language created in 1993 in Brazil. We have reported in detail on the birth of Lua and its evolution until 2007. Here, we chronicle the evolution of Lua since then. In particular, we discuss in detail the evolution of global variables, the introduction of integers, and the implementation of garbage collection and finalizers, including deterministic finalization. We also comment on some landmark social developments in the history of Lua.

Citations: 0

Debugging in the Domain-Specific Modeling Languages for multi-agent systems

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Computer Languages

Pub Date : 2025-02-14 DOI: 10.1016/j.cola.2025.101325

Baris Tekin Tezel, Geylani Kardas

In many cases, developers face challenges while implementing Multi-Agent Systems (MAS) due to the complexity of expanding software systems, despite the presence of numerous agent programming environments and platforms. To tackle this complexity, Model-driven Engineering (MDE) can be employed at a higher level of abstraction and component modeling before diving into MAS development. Arguably the most effective way to incorporate MDE into MAS development is to adapt Domain-Specific Modeling Languages (DSMLs) along with integrated development environments (IDEs). These tools make it easier to model the system and generate the necessary code for the development process. Although existing MAS DSML IDEs offer some control over systems modeled based on the language’s syntax and semantics, they lack built-in debugging support. This deficiency leaves agent developers uncertain about the accuracy of models prepared during the design phase. To address this issue, this study proposes a comprehensive debugging framework (MASDebugFW) that facilitates the design of agent components within modeling environments. Use of the framework begins with modeling MASs in a design language and then converting these design model instances into a runtime model. The runtime model is then simulated using an integrated simulator specifically designed for debugging purposes. Additionally, the framework includes a simulation environment model and a control mechanism to manage the simulation process effectively. These features further enhance the debugging capabilities and overall functionality of MASDebugFW. Furthermore, we have qualitatively and quantitatively evaluated MASDebugFW, subjecting all obtained results to statistical analysis. The evaluation results show that, on average, the implemented framework reduces debugging time by around 45%, leading to more efficient debugging processes. Moreover, it significantly enhances bug detection and repair capabilities, as it increases the number of bugs fixed in the models by approximately 50%.

Citations: 0

GPotion: Embedding GPU programming in Elixir

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Computer Languages

Pub Date : 2025-02-08 DOI: 10.1016/j.cola.2025.101323

André Rauber Du Bois, Gerson Geraldo H. Cavalheiro

This paper describes GPotion, a DSL for GPU programming embedded in the Elixir functional language. GPotion allows programmers to write low-level GPU kernels, similar to CUDA kernels, in Elixir but also provides high-level facilities, like garbage collection of host and device arrays allocated in the host, type inference and simplified data transfer. This paper describes the design and implementation of GPotion and also presents experiments that demonstrate that GPotion allows fast and efficient kernels with little overhead in comparison to pure CUDA. GPotion is implemented using metaprogramming features of Elixir, without having to modify Elixir’s compiler. The source code for GPotion and the benchmarks used in the experiments are available in a GitHub repository.

Citations: 0

Near-Pruned single assignment transformation of programs

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Computer Languages

Pub Date : 2025-02-05 DOI: 10.1016/j.cola.2025.101324

Akshay M. Fajge, Raju Halder

This paper introduces Near-Pruned SSA, a novel variant of the SSA form that attains precision close to the Pruned version while prioritizing its efficient generation without the need for costly data flow analysis. This is realized by leveraging variables’ usage information within the program’s augmented CFG. Furthermore, we propose a direct method for generating DSA form of programs that bypasses the traditional process of ϕ-node destruction into its immediate predecessor-blocks, thereby streamlining the process. Experimental evaluation on a range of Solidity programs, including real-world smart contracts deployed on the Ethereum mainnet, demonstrates that our method outperforms existing SSA variants, except for the Pruned version, by minimizing the number of introduced ϕ-statements compared to state-of-the-art techniques. In particular, the proposed Near-Pruned variant demonstrates a computational cost that is approximately one-third of that of the Pruned variant while achieving a nearly 92% reduction in the introduction of additional statements compared to the Semi-Pruned variant.
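
To show why usage information reduces the number of ϕ-statements, the sketch below compares the three classic placement policies (minimal, semi-pruned, pruned) at the join of a single if/else diamond. It does not implement the paper's Near-Pruned algorithm or its augmented CFG, whose details the abstract does not give. The block encoding (lists of `(target, operands)` pairs) and the function names are assumptions made for this example, and the liveness shortcut is valid only because the join is the exit block here.

```python
def definitions(block):
    """Variables assigned anywhere in the block."""
    return {target for target, _ in block}


def upward_exposed(block):
    """Variables read in the block before any assignment to them in that block."""
    defined, exposed = set(), set()
    for target, operands in block:
        exposed |= {v for v in operands if v not in defined}
        defined.add(target)
    return exposed


def phi_variables_at_join(blocks):
    """Which variables get a phi at 'join' in an entry/then/else/join diamond."""
    # The join is the dominance frontier of both branches, so every variable
    # assigned in either branch is a candidate.
    branch_defs = definitions(blocks["then"]) | definitions(blocks["else"])
    # Semi-pruned: keep only names that are live across some block boundary.
    non_local = set().union(*(upward_exposed(b) for b in blocks.values()))
    # Pruned: keep only names live at the join. The join is the exit block,
    # so its live-in set equals its upward-exposed uses.
    live_at_join = upward_exposed(blocks["join"])
    return {
        "minimal": branch_defs,
        "semi_pruned": branch_defs & non_local,
        "pruned": branch_defs & live_at_join,
    }


# Hypothetical program; each statement is (assigned variable, variables read).
blocks = {
    "entry": [("x", []), ("y", [])],
    "then": [("t", ["y"]), ("x", ["t"])],  # t is purely local to this block
    "else": [("y", [])],                   # y is never read after the join
    "join": [("r", ["x"])],
}

placement = phi_variables_at_join(blocks)
for policy, names in placement.items():
    print(policy, sorted(names))
```

In this example minimal placement inserts three ϕ-statements, semi-pruned two, and pruned one. The pruned answer is the most precise but normally requires a full liveness analysis, which is the cost the Near-Pruned variant aims to avoid.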

Citations: 0

MLAPW: A framework to assess the impact of feature selection and sampling techniques on anti-pattern prediction using WSDL metrics

IF 1.7; Zone 3 (Computer Science); Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Computer Languages

Pub Date : 2025-02-01 DOI: 10.1016/j.cola.2025.101322

Lov Kumar , Vikram Singh , Lalita Bhanu Murthy , Aneesh Krishna , Sanjay Misra

<div><h3>Context:</h3><div>The quality and design of Service-Based Systems may be degraded because of frequent changes, and negatively impacts the software design quality called <strong>Anti-patterns</strong>. The existence of these Anti-patterns highly impacts the overall maintainability of Service-Based Systems. Hence, early detection of these anti-patterns’ presence becomes mandatory with co-located modifications. However, it is not easy to find these anti-patterns manually.</div></div><div><h3>Objective:</h3><div>The objective of this work is to explore the role of WSDL (Web Services Description Language) metrics (MLAPW) for anti-pattern prediction using a Machine Learning (ML) based framework. This framework encompasses different variants of feature selection techniques, data sampling techniques, and a wide range of ML algorithms. This work empirically investigates the predictive ability of anti-pattern prediction models developed using different sets of WSDL metrics. Our major focus is to investigate ’<em>how these metrics accurately predict different types of Anti-patterns present in the WSDL file</em>’.</div></div><div><h3>Methods:</h3><div>To achieve the objective, different sets of WSDL metrics such as Structural Quality Metrics, Procedural Quality Metrics, Data Quality Metrics, Quality Metrics, and Complexity metrics, are used as input for Anti-patterns prediction models. Since these models use WSDL metrics as input, we have also used feature selection methods to find the best sets of WSDL metrics. These models are trained using various machine-learning techniques. This study also shows the performance of these models trained on balanced data using data sampling techniques. 
Finally, the empirical investigation of these techniques was done using accuracy and ROC (receiver operating characteristic curve) curve (AUC) with hypothesis testing.</div></div><div><h3>Results:</h3><div>The empirical study’s observation is based on 226 WSDL files from various domains such as finance, tourism, health, education, etc. The assessment asserts that the models trained using WSDL metrics have 0.79 mean AUC and 0.90 Median AUC. However, the models trained using the selected feature with classifier feature subset selection (CFS) have a better mean AUC of 0.80 and median AUC of 0.97. The experimental results also confirm that the models trained on up-sampling (UPSAM) have a better mean AUC of 0.79 and median AUC of 0.91 with a low value of Friedman rank of 2.40. Finally, the models trained using the least square support vector machine (LSSVM) achieved 1 median AUC, 0.99 mean AUC, and a low Friedman rank of 1.30.</div></div><div><h3>Conclusion:</h3><div>The experimental results show that the AUC values of the models trained using Data and Procedural Quality Metrics are high as compared to the other sets of metrics. However, the models improved significantly in their prediction performance after employing feature selection techniques. The experimental result

The experimental results also show that models trained using advanced classifiers and ensemble learning achieve higher AUC values than those trained with other techniques. Based on this study, there is reason to believe that data sampling techniques help improve the predictive ability of the models. Models trained on data sampled with UPSAM (up-sampling) achieved a median AUC of 0.91 and a mean AUC of 0.79.
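
The Results above report both mean and median AUC and compare models trained on up-sampled data. As a rough illustration only (this is not the authors' MLAPW implementation), the Python sketch below computes AUC through its rank formulation, balances classes by random up-sampling, and shows how one weak model can pull the mean AUC well below the median. The function names `auc` and `upsample` and every number in the demo are made up for illustration; the abstract does not say how UPSAM is implemented, so plain random duplication of minority rows is an assumption.

```python
import random
from statistics import mean, median


def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive is scored above
    a randomly chosen negative, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def upsample(rows, labels, seed=0):
    """Random up-sampling (assumed variant): duplicate randomly chosen
    minority-class rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


# Toy data: 1 = file contains an anti-pattern, 0 = it does not.
labels = [1, 0, 0, 0, 0, 1]
scores = [0.9, 0.1, 0.4, 0.35, 0.8, 0.7]  # hypothetical model outputs
print(auc(labels, scores))  # 0.875

# Balance the two classes before training (4 rows of each class afterwards).
rows = [[1], [2], [3], [4], [5], [6]]  # hypothetical metric vectors
balanced_rows, balanced_labels = upsample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))  # 4 4

# Hypothetical per-model AUCs: one poor model lowers the mean, not the median.
aucs = [1.0, 1.0, 0.97, 0.2]
print(round(mean(aucs), 4), round(median(aucs), 4))  # 0.7925 0.985
```

Because the median ignores how far the weakest models fall, a summary such as "mean 0.79, median 0.90" suggests a skewed distribution in which most models score high and a few score poorly.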

Lov Kumar, Vikram Singh, Lalita Bhanu Murthy, Aneesh Krishna, Sanjay Misra. MLAPW: A framework to assess the impact of feature selection and sampling techniques on anti-pattern prediction using WSDL metrics. Journal of Computer Languages, Volume 83, Article 101322. Pub Date : 2025-02-01 DOI: 10.1016/j.cola.2025.101322

Citations: 0

Code histories: Documenting development by recording code influences and changes in code

IF 1.7, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Computer Languages

Pub Date : 2024-12-25 DOI: 10.1016/j.cola.2024.101313

Vo Thien Tri Pham, Caitlin Kelleher

Developers frequently encounter challenges when working with large code bases found in modern software applications, from navigating through files to more complex tasks like understanding code histories, dependencies, and evolutions. While many applications use Version Control Systems (VCSs) to archive present-day programs and provide a historical perspective on code development, the level of detail they offer is often insufficient for in-depth analyses. As a result, it becomes difficult to fully explore the potential benefits of historical data in software development. We introduce an enhanced recording framework that integrates both the Visual Studio Code (VS Code) development environment and the Google Chrome web browser to capture more detailed development activities. Our framework is designed to offer additional recording options, thereby providing researchers with more opportunities to study how different historical resources can be utilized. Through an observational study, we demonstrate the utility of our framework in capturing the complex dynamics of code change activities, highlighting its potential value in both academic and practical contexts.

Journal of Computer Languages, Volume 82, Article 101313.

Citations: 0