Pub Date: 2025-08-15 | DOI: 10.1016/j.cola.2025.101346
Jan Slifka, Robert Pergl
Modern web front-end applications frequently encounter challenges in maintaining long-term stability as they evolve to accommodate new requirements. This growing complexity often leads to diminishing maintainability and, in some cases, costly rewrites. To address this issue, we propose a methodology that integrates Normalized Systems Theory (NST), which provides the structural foundations for stable software, with functional programming (FP) principles to construct inherently evolvable front-end systems. Our approach is implemented and evaluated using Elm, a statically typed, purely functional language designed for web front-end development. By aligning Elm’s design patterns with NST theorems, we establish a framework for building systems that are modular, maintainable, and resilient to change. We validate the efficacy of this methodology through a case study of a production-grade Elm application, demonstrating notable improvements in evolvability and system sustainability. While our implementation focuses on Elm, the underlying principles extend to other functional technologies, offering a broadly applicable strategy for achieving long-term stability in web front-end architecture.
Title: "Application of Normalized Systems Theory to pure functional code to achieve sustainability of web front-end applications". Journal of Computer Languages, vol. 85, Article 101346.
Code completion is a crucial feature in modern IDEs, improving programming efficiency. Traditional systems rely on prefix filtering and static ranking but often overwhelm users with lengthy, alphabetically sorted lists. Recent research has introduced LR-parsing-based approaches that derive completion candidates from language syntax and compute their ranks using open-source programs; however, these methods suggest only structural candidates, which require manual refinement into complete code. To address this, we propose a hybrid method that integrates LR parsing with LLMs to enhance accuracy and usability. Our approach uses an LLM to refine the structural candidates produced by LR parsing into concrete textual code suggestions, referencing a database of ranked candidates mined from open-source programs. This combines the syntactic precision of LR parsing with the generative capabilities of LLMs. This study examines whether LLMs benefit from LR structural candidates in code completion: by comparing completions generated with and without these candidates, we assess their impact. Building on prior research, we also explore how leveraging top-ranked structural candidates can effectively enhance the precision of LLM-based code completion. We also demonstrate our method through VSCode extensions for Microsoft Small Basic and C. As a language-agnostic solution, our system applies to any language with a defined LR grammar. Our findings suggest that integrating LR parsing with LLM-based completion improves both accuracy and usability, paving the way for more effective code completion in modern IDEs.
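The ranked-candidate lookup described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the database contents, function names, and prompt format are all hypothetical.

```python
from collections import Counter

# Hypothetical frequency database: structural candidates that may follow
# an `if` header, with counts mined from open-source programs.
STRUCTURAL_DB = Counter({
    "if ( Expr ) { Stmts }": 120,
    "if ( Expr ) { Stmts } else { Stmts }": 80,
    "if ( Expr ) Stmt": 45,
})

def top_candidates(db: Counter, k: int) -> list[str]:
    """Return the top-k structural candidates by observed frequency."""
    return [cand for cand, _ in db.most_common(k)]

def build_prompt(prefix: str, candidates: list[str]) -> str:
    """Embed ranked structural candidates into an LLM prompt, so the model
    refines a syntactic shape into concrete code rather than free-forming."""
    hints = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return (
        f"Complete the code that follows this prefix:\n{prefix}\n"
        f"Prefer one of these ranked syntactic shapes:\n{hints}\n"
    )
```

In the paper's setting, the refined textual suggestion would then be produced by querying the LLM with such a prompt.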
Title: "Improving LLM-based code completion using LR parsing". Md Monir Ahammod Bin Atique, Hyeon-Ah Moon, Isao Sasano, Kwanghoon Choi. Journal of Computer Languages, vol. 84, Article 101352. Pub Date: 2025-07-29 | DOI: 10.1016/j.cola.2025.101352
Pub Date: 2025-07-29 | DOI: 10.1016/j.cola.2025.101351
Enkhbold Nyamsuren
Democratization of AI, that is, making AI accessible and usable for everyone, is an important topic within the broader issue of the digital divide. It is especially relevant to Large Language Models (LLMs), which are becoming increasingly popular as AI co-pilots but suffer from poor accessibility due to their high computational demands. In this study, we evaluate whether LLM quantization is a viable approach to enabling LLMs on generic consumer devices. The study assesses the performance of five quantized code LLMs on Lua and Python code generation tasks. All code LLMs had approximately 7 billion parameters and were deployed on a generic CPU-only consumer laptop. To evaluate the impact of quantization, the models were tested at 2-, 4-, and 8-bit integer precisions. Pass@1 and pass@10 evaluations were done at varying temperatures and token sampling rates. Along with tasks such as question answering, text summarization, and text generation, programming is one of the popular applications of AI co-pilots. Furthermore, code generation is a high-precision task, which makes it a suitable benchmark for evaluating and comparing quantized models for everyday use by individuals. Lua was chosen as a low-resource language to avoid models’ biases toward high-resource languages; performance in Lua is contrasted with performance in Python, chosen as a high-resource language. The results suggest that models quantized at 4-bit integer precision offer the best trade-off between performance and model size; these models can be comfortably deployed on an average laptop without a dedicated GPU. The findings suggest that lower quantization precision degrades performance more in the low-resource language than in the high-resource language, although they also hint that quantization from full precision to any integer precision affects the high-resource language more. The models quantized at 8-bit integer precision require more inference time that does not translate into better performance. While quantization indeed increases the accessibility of smaller LLMs with 7 billion parameters, these LLMs demonstrate overall low performance (less than 50%) on high-precision, low-resource tasks such as Lua code generation. So although accessibility is improved, usability does not reach the practical level of foundational LLMs such as GPT-4o or Llama 3.1 with 405 billion parameters. Additionally, in most failed instances, the models generate code that is free of syntax errors but fails unit tests or has runtime issues. This means that any generated code requires extensive testing, which may negate the potential efficiency gains delivered by these smaller coding models.
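The pass@1 and pass@10 metrics used in the study are conventionally computed with the unbiased estimator popularized by the HumanEval benchmark. The abstract does not spell out the exact implementation, so the sketch below shows the standard formula rather than the author's code:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k completions,
    sampled without replacement from n generated ones (c of them correct),
    passes its tests. Computed as 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failures exist: a correct sample is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 generations of which 1 is correct, pass@1 is 0.5, and a model with no correct generations scores 0 at any k.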
Title: "Evaluating quantized Large Language Models for code generation on low-resource language benchmarks". Journal of Computer Languages, vol. 84, Article 101351.
Pub Date: 2025-07-26 | DOI: 10.1016/j.cola.2025.101350
Ilia Maslov, Stephan Poelmans, Yves Wautelet, Frederik Gailly
Process modeling is fundamental for effective (business) process management. Comprehension of process models by novice modelers and the effective integration of learning technologies present crucial challenges that can be addressed through visualization, animation, and simulation techniques. In this study, we examine the experiences and perceptions of novice modelers using token-animated process models, drawing upon data from 119 college students specializing in business management and business engineering who answered comprehension questions based on these models. We concentrate on investigating perceived understanding through Technology Acceptance Model (TAM) constructs, employing Partial Least Squares to validate an extended research model based on TAM. We additionally analyze qualitative data from respondents' answers to open questions to extract codes and themes that complement the research model's findings. The results confirm that token-animated process models are useful and preferred as a learning technique. Tokens enhance cognitive facilitation by incorporating visualization, animation, and simulation functionalities, resulting in improved objective and perceived comprehension. We extend the comprehension determinants of process models with perceived enjoyment and show that emotional states are also important when tokens are used for teaching purposes. Over 80% of participants reported a clear preference for token-animated process models, confirming high levels of student acceptance. Our study also identifies recommendations for enhancement and potential limitations associated with the use of animated tokens in education. Further theoretical and practical implications are discussed.
Title: "Novice modelers’ subjective comprehension and interaction with token-animated process models". Journal of Computer Languages, vol. 84, Article 101350.
Pub Date: 2025-07-23 | DOI: 10.1016/j.cola.2025.101348
Ruofan Yang, Xianghua Xu, Ran Wang
Automated unit test generation is a critical technique for improving software quality and development efficiency. However, traditional methods often produce test cases with poor business consistency, while large language model (LLM)-based approaches face two major challenges: a high error rate in generated tests and insufficient code coverage. To address these issues, this paper proposes TestLoter, a logic-driven test generation framework. The core contributions of TestLoter are twofold. First, by integrating the structured analysis capabilities of white-box testing with the functional validation characteristics of black-box testing, we design a logic-driven test generation chain of thought that enables deep semantic analysis of code. Second, we establish a hierarchical repair mechanism that systematically corrects errors in generated test cases, significantly enhancing the correctness of the test code. Experimental results on nine open-source projects covering various domains, such as data processing and utility libraries, demonstrate that TestLoter achieves 83.6% line coverage and 78% branch coverage. Our approach outperforms both LLM-based methods and traditional search-based software testing techniques in terms of coverage, while also reducing the number of errors in the generated unit test code.
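The hierarchical repair idea (staged checking with error feedback) can be illustrated by the following minimal sketch. It is a plausible reconstruction, not TestLoter's actual code, and `gen` stands in for a hypothetical LLM call:

```python
def generate_and_repair(gen, max_rounds: int = 3):
    """Hierarchical repair sketch: request a unit test from `gen` (a
    code-generating callable), then validate it in stages, syntax first
    and execution second, feeding each failure back for the next attempt."""
    feedback = ""
    for _ in range(max_rounds):
        code = gen(feedback)
        try:
            compiled = compile(code, "<generated-test>", "exec")  # stage 1: syntax
        except SyntaxError as err:
            feedback = f"SyntaxError: {err.msg}"
            continue
        try:
            exec(compiled, {})  # stage 2: runtime behaviour and assertions
        except Exception as err:
            feedback = f"{type(err).__name__}: {err}"
            continue
        return code  # both stages passed
    return None  # give up after max_rounds
```

A stub generator that cycles through progressively repaired attempts shows the loop converging on a test that both compiles and passes.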
Title: "TestLoter: A logic-driven framework for automated unit test generation and error repair using large language models". Journal of Computer Languages, vol. 84, Article 101348.
Pub Date: 2025-07-21 | DOI: 10.1016/j.cola.2025.101347
Rafael Fontes Sumitani, Lucas Victor da Silva Costa, Frederico F. Campos, Fernando Magno Quintão Pereira
A cost model is a function that describes how often each part of a program runs depending on its inputs. Cost models can be derived automatically by observing counters: instrumentation that tracks the execution of program operations. This paper defines Newton Counters: counters that can be described by a polynomial over a single program input variable whose value can be read in constant time. Additionally, it shows that Newton Counters are prevalent in real-world code. Motivated by this observation, the paper introduces a methodology to derive cost models automatically. This methodology combines static code analyses with interpolation via Newton’s divided difference method, and is currently available as a tool, Merlin. The effectiveness of this tool is demonstrated on 949 executable C programs taken from the Jotai collection, and on genann, a neural network library.
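Newton's divided-difference interpolation, the numerical core of the methodology, can be sketched as follows. This is an illustrative reimplementation, not Merlin's code, and the triangular-number counter in the usage note is a made-up example:

```python
def divided_differences(xs, ys):
    """Coefficients of the Newton-form interpolating polynomial through
    points (xs[i], ys[i]), built in place via the divided-difference table."""
    coeffs = list(ys)
    n = len(xs)
    for level in range(1, n):
        # Each pass raises the order of the differences by one.
        for i in range(n - 1, level - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - level])
    return coeffs

def newton_eval(coeffs, xs, x):
    """Evaluate the Newton-form polynomial at x with a Horner-like scheme."""
    result = coeffs[-1]
    for i in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[i]) + coeffs[i]
    return result
```

Feeding observed counter values for a few small inputs, say a counter that fires n(n+1)/2 times for input size n, recovers the quadratic and predicts the count for unseen inputs.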
Title: "A methodology for empirical complexity analysis based on Newton’s polynomial interpolation". Journal of Computer Languages, vol. 84, Article 101347.
Pub Date: 2025-07-17 | DOI: 10.1016/j.cola.2025.101345
Lázaro Costa, Susana Barbosa, Jácome Cunha
User studies are paramount for advancing research in software engineering, particularly when evaluating tools and techniques involving programmers. However, researchers face several barriers when conducting them, despite the existence of supporting tools. We base our study on a set of tools and researcher-reported barriers identified in prior work on user studies in software engineering. In this work, we study how existing tools and their features cope with the previously identified barriers, and we propose new features for the barriers that lack support. We validated our proposal with 102 researchers, achieving statistically significant positive support for all but one feature. We then examine the current gap between tools and barriers, using features as the bridge, and show a significant lack of support for several barriers, some of which are not supported by any tool at all.
Title: "Mind the gap: The missing features of the tools to support user studies in software engineering". Journal of Computer Languages, vol. 84, Article 101345.
Pub Date: 2025-06-24 | DOI: 10.1016/j.cola.2025.101343
Zixuan Zhu
This paper presents advanced optimization techniques for Lua Parsing Expression Grammars (LPeg) through two complementary case studies: a high-performance JSON parser and a sophisticated Glob-to-LPeg pattern converter. We demonstrate how strategic grammar construction can dramatically improve parsing performance without modifying the underlying LPeg library. For the JSON parser, we implement substitution capture and table construction optimization to reduce memory allocation overhead and improve object processing. For the Glob converter, we introduce segment-boundary separation, implement Cox’s flattened search strategy, and develop optimized braced condition handling to prevent exponential backtracking. Comprehensive benchmarks demonstrate that our JSON parser achieves processing speeds up to 125 MB/s on complex documents, consistently outperforming dkjson and showing competitive results against rxi_json across most test cases. Our Glob-to-LPeg converter exhibits 14%–92% better performance than Bun.Glob and runs 3–14 times faster than Minimatch across diverse pattern matching scenarios. This research provides practical optimization techniques for LPeg-based parsers, contributing valuable strategies to the text processing ecosystem.
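Cox's flattened search strategy mentioned above keeps a single restart point per '*', which bounds matching to linear time instead of exponential backtracking. A Python transliteration of that matching loop is shown below; it is illustrative only, since the actual converter emits LPeg patterns, and this sketch covers just '*' and '?', not braces or segment boundaries:

```python
def glob_match(pattern: str, name: str) -> bool:
    """Iterative glob matching with a single backtrack point: on mismatch,
    return to the most recent '*' and let it absorb one more character."""
    px = nx = 0            # positions in pattern and name
    next_px = next_nx = 0  # restart point recorded at the last '*'
    while px < len(pattern) or nx < len(name):
        if px < len(pattern):
            ch = pattern[px]
            if ch == '*':
                # '*' matches nothing for now; remember where to retry.
                next_px, next_nx = px, nx + 1
                px += 1
                continue
            if ch == '?' and nx < len(name):
                px += 1
                nx += 1
                continue
            if ch not in '*?' and nx < len(name) and name[nx] == ch:
                px += 1
                nx += 1
                continue
        # Mismatch: retry from the last '*', consuming one more name char.
        if 0 < next_nx <= len(name):
            px, nx = next_px, next_nx
            continue
        return False
    return True
```

Because each mismatch only ever advances the single saved restart position, patterns like "a*b*c" never revisit a name character more than once per star.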
Title: "Advanced LPeg techniques: A dual case study approach". Journal of Computer Languages, vol. 84, Article 101343.
Pub Date: 2025-06-18 | DOI: 10.1016/j.cola.2025.101342
Glenn Strong, Nina Bresnihan, Brendan Tangney
This paper describes a systematic review of the approaches taken to support learners as they transition from block-based programming environments to text-based ones. It identifies and analyses the literature in the area, identifies themes common across the different approaches, and determines gaps in the literature. With the widespread use of block-based programming environments in introductory programming education, the question of how to support learners in the transition to text-based environments has received much attention. The contribution of this paper is to analyse and characterise the approaches taken to support learners by considering the question: what approaches have been developed to facilitate the transition from block-based to text-based programming for learners? To answer this, a systematic literature review was undertaken, combining manual and automatic searches to identify work in the field. A thematic analysis of the literature found eight themes covering technical and non-technical approaches to supporting the transition, prompting a set of recommendations for gaps to be addressed in future work in the field.
{"title":"Supporting learners in the transition from block-based to text-based programming, a systematic review","authors":"Glenn Strong, Nina Bresnihan, Brendan Tangney","doi":"10.1016/j.cola.2025.101342","DOIUrl":"10.1016/j.cola.2025.101342","url":null,"abstract":"<div><div>This paper describes a systematic review of the approaches being taken to providing support to learners as they transition from block-based programming environments to text-based ones. It identifies and analyses the literature in the area, identifies the themes which are common across the different approaches being used, and determines gaps in the literature. With the widespread use of block-based programming environments in introductory programming education, the question of how to support learners in the transition to text-based environments has received much attention. The contribution of this paper is to analyse and characterise the approaches being taken to support learners by considering the question: what approaches have been developed to facilitate the transition from block-based programming to text-based programming for learners? To answer this, a systematic literature review was undertaken, combining manual and automatic searches to identify work in the field. 
A thematic analysis of the literature found eight themes covering technical and non-technical approaches to supporting transition, prompting a set of recommendations for gaps to be addressed in future development in the field.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101342"},"PeriodicalIF":1.7,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144492003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-16DOI: 10.1016/j.cola.2025.101341
Sérgio Queiroz de Medeiros, Marcelo Borges Nogueira, Gustavo Quezado
The current concern about global warming has led to increasing interest in the energy efficiency of computer applications. Assuming power is constant, the general trend is that faster programs consume less energy; optimizing a program for speed should therefore also improve its energy efficiency.
We investigate this tendency in a set of C++ and Java solutions mined from the Code Submission Evaluation System (CSES), a popular programming competition site, where each solution must give the correct answer within a given time limit. In this context, we can assume that all correct solutions to a problem were written with speed in mind, but not energy efficiency.
We selected 15 problems from CSES and, for each of them, mined at least 30 C++ and Java solutions, evaluating the time and energy efficiency of each solution on at least two different machines. In our scenario, with a great diversity of programming styles, execution speeds, and memory usage, we could confirm the general trend: faster programs consume less energy. Moreover, we were able to use ordinary least squares to fit, with good precision, a linear function relating the energy consumption of a program to its execution time, and to automatically identify programs with abnormal energy consumption. A manual analysis of these programs revealed that they often perform a different number of allocation and deallocation operations than programs with similar execution times.
We also computed the energy consumption profiles of sets of random C++ solutions to these 15 CSES problems and tried to associate each set with its corresponding CSES problem using the profiles previously computed for each problem. With this approach, we could restrict the classification task for each set of random C++ solutions to a subset of 7 CSES problems, a reduction of more than 50% in the search space.
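The fit-and-flag idea described in the abstract can be sketched as follows. This is a minimal illustration with hypothetical measurements, not data or code from the paper; the largest-residual heuristic here stands in for whatever anomaly-detection criterion the authors actually used.

```python
import numpy as np

# Hypothetical measurements for six solutions to one problem: execution
# time in seconds and energy in joules. The last solution consumes far
# more energy than its runtime alone would suggest.
time_s = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
energy_j = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 80.0])

# Ordinary least squares fit of a linear function: energy ~ slope * time + intercept.
slope, intercept = np.polyfit(time_s, energy_j, 1)
residuals = energy_j - (slope * time_s + intercept)

# The solution with the largest absolute residual is the strongest
# candidate for abnormal energy consumption.
suspect = int(np.argmax(np.abs(residuals)))
print(f"slope = {slope:.2f} J/s, suspect solution index = {suspect}")
```

With these numbers the fit correctly singles out the last solution (index 5) as the outlier; on real measurements one would use many more solutions per problem and a principled residual threshold rather than a simple argmax.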
{"title":"Investigating the energy consumption of C++ and Java solutions mined from a programming contest site","authors":"Sérgio Queiroz de Medeiros, Marcelo Borges Nogueira, Gustavo Quezado","doi":"10.1016/j.cola.2025.101341","DOIUrl":"10.1016/j.cola.2025.101341","url":null,"abstract":"<div><div>The current concern about global warming has led to an increasing interest in the energy efficiency of computer applications. Assuming power is constant, the general trend is that faster programs consume less energy, thus optimizing a program for speed would also improve its energy efficiency.</div><div>We investigate this tendency in a set of C++ and Java solutions mined from Code Submission Evaluation System (CSES), a popular programming competition site, where each solution must give the correct answer under a given time limit. In such context, we can consider that all correct solutions for a problem were written with a speed concern, but not with energy efficiency in mind.</div><div>We selected 15 problems from CSES and for each of them we mined at least 30 C++ and Java solutions, evaluating time and energy efficiency of each solution in at least two different machines. In our scenario, where there is a great diversity of programming styles, execution speed, and memory usage, we could confirm the general trend: faster programs consume less energy. Moreover, we were able to use ordinary least squares to fit a linear function, with good precision, that relates energy consumption of a program to its execution time, as well as to automatically identify programs with abnormal energy consumption. 
A manual analysis of these programs revealed that often they perform a different amount of allocation and deallocation operations when compared to programs with similar execution times.</div><div>We also calculated the energy consumption profile of sets of random C++ solutions for these 15 CSES problems, and we tried to associate each set with its corresponding CSES problem by using the energy consumption profiles previously computed for each one of them. By using this approach, we could restrict, for each set of random C++ solutions, the classification task to a subset of 7 CSES problems, a reduction of more than 50% in the search space.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"84 ","pages":"Article 101341"},"PeriodicalIF":1.7,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144308119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}