Near-Pruned single assignment transformation of programs
Pub Date: 2025-02-05 | DOI: 10.1016/j.cola.2025.101324
Akshay M. Fajge, Raju Halder
This paper introduces Near-Pruned SSA, a novel variant of the SSA form that attains precision close to the Pruned version while prioritizing its efficient generation without the need for costly data flow analysis. This is realized by leveraging variables' usage information within the program's augmented CFG. Furthermore, we propose a direct method for generating the DSA form of programs that bypasses the traditional process of ϕ-node destruction into its immediate predecessor-blocks, thereby streamlining the process. Experimental evaluation on a range of Solidity programs, including real-world smart contracts deployed on the Ethereum mainnet, demonstrates that our method outperforms existing SSA variants, except for the Pruned version, by minimizing the number of introduced ϕ-statements compared to state-of-the-art techniques. In particular, the proposed Near-Pruned variant demonstrates a computational cost that is approximately one-third of that of the Pruned variant while achieving a nearly 92% reduction in the introduction of additional statements compared to the Semi-Pruned variant.
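The SSA variants named in the abstract differ only in which variables receive ϕ-nodes at join blocks; the placement loop over dominance frontiers is standard. Below is a minimal sketch of that loop parameterized by a pruning predicate, with a usage-based filter standing in for what we read "Near-Pruned" to mean. All names (`place_phis`, `needs_phi`, the example CFG) are our own illustration, not the paper's implementation.

```python
# Sketch: dominance-frontier phi-placement parameterized by a pruning
# predicate. Minimal/Semi-Pruned/Pruned/Near-Pruned differ in needs_phi().

def place_phis(dom_frontier, defs, needs_phi):
    """dom_frontier: {block: set of frontier blocks}
    defs: {var: set of blocks that assign var}
    needs_phi(var, block) -> bool, the pruning criterion."""
    phis = {}  # block -> set of vars that get a phi there
    for var, def_blocks in defs.items():
        work, placed, seen = list(def_blocks), set(), set(def_blocks)
        while work:
            b = work.pop()
            for f in dom_frontier.get(b, ()):
                if f in placed or not needs_phi(var, f):
                    continue
                phis.setdefault(f, set()).add(var)
                placed.add(f)
                if f not in seen:      # the phi is itself a new definition
                    seen.add(f)
                    work.append(f)
    return phis

# Example: diamond CFG (entry -> a, b -> join); 'x' is assigned in both
# arms and used after the join, 'tmp' is assigned in one arm and dead.
df = {"a": {"join"}, "b": {"join"}}
defs = {"x": {"a", "b"}, "tmp": {"a"}}
used_later = {"x"}  # usage info assumed available from the augmented CFG
print(place_phis(df, defs, lambda v, blk: v in used_later))
# {'join': {'x'}} -- no phi for dead 'tmp', close to Pruned, no liveness pass
```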
{"title":"Near-Pruned single assignment transformation of programs","authors":"Akshay M. Fajge, Raju Halder","doi":"10.1016/j.cola.2025.101324","DOIUrl":"10.1016/j.cola.2025.101324","url":null,"abstract":"<div><div>This paper introduces <span>Near-Pruned</span> <span>SSA</span>, a novel variant of the <span>SSA</span> form that attains precision close to the <span>Pruned</span> version while prioritizing its efficient generation without the need for costly data flow analysis. This is realized by leveraging variables’ usage information within the program’s <em>augmented</em> <span>CFG</span>. Furthermore, we propose a direct method for generating <span>DSA</span> form of programs that bypasses the traditional process of <span><math><mi>ϕ</mi></math></span>-node destruction into its immediate predecessor-blocks, thereby streamlining the process. Experimental evaluation on a range of <em>Solidity</em> programs, including <em>real-world</em> smart contracts deployed on the <em>Ethereum mainnet</em>, demonstrates that our method outperforms existing <span>SSA</span> variants, except for the <span>Pruned</span> version, by minimizing the number of introduced <span><math><mi>ϕ</mi></math></span>-statements compared to <em>state-of-the-art</em> techniques. In particular, the proposed <span>Near-Pruned</span> variant demonstrates a computational cost that is approximately one-third of that of the <span>Pruned</span> variant while achieving a nearly 92% reduction in the introduction of additional statements compared to the <span>Semi-Pruned</span> variant.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"83 ","pages":"Article 101324"},"PeriodicalIF":1.7,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143360843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MLAPW: A framework to assess the impact of feature selection and sampling techniques on anti-pattern prediction using WSDL metrics
Lov Kumar, Vikram Singh, Lalita Bhanu Murthy, Aneesh Krishna, Sanjay Misra
Pub Date: 2025-02-01 | DOI: 10.1016/j.cola.2025.101322
Context: The quality and design of Service-Based Systems may degrade because of frequent changes, which harm software design quality through so-called Anti-patterns. The existence of these Anti-patterns strongly affects the overall maintainability of Service-Based Systems. Hence, early detection of these anti-patterns becomes mandatory when modifications are co-located. However, it is not easy to find these anti-patterns manually.
Objective: The objective of this work is to explore the role of WSDL (Web Services Description Language) metrics for anti-pattern prediction using a Machine Learning (ML) based framework (MLAPW). This framework encompasses different variants of feature selection techniques, data sampling techniques, and a wide range of ML algorithms. This work empirically investigates the predictive ability of anti-pattern prediction models developed using different sets of WSDL metrics. Our major focus is to investigate how accurately these metrics predict the different types of Anti-patterns present in a WSDL file.
Methods: To achieve the objective, different sets of WSDL metrics, such as Structural Quality Metrics, Procedural Quality Metrics, Data Quality Metrics, Quality Metrics, and Complexity Metrics, are used as input for the anti-pattern prediction models. Since these models use WSDL metrics as input, we have also used feature selection methods to find the best sets of WSDL metrics. These models are trained using various machine-learning techniques. This study also reports the performance of these models trained on data balanced using data sampling techniques. Finally, the empirical investigation of these techniques was done using accuracy and the area under the ROC (receiver operating characteristic) curve (AUC), with hypothesis testing.
Results: The empirical study is based on 226 WSDL files from various domains such as finance, tourism, health, and education. The assessment shows that the models trained using WSDL metrics have a mean AUC of 0.79 and a median AUC of 0.90. However, the models trained using features selected with classifier feature subset selection (CFS) have a better mean AUC of 0.80 and a median AUC of 0.97. The experimental results also confirm that the models trained on up-sampled data (UPSAM) have a better mean AUC of 0.79 and a median AUC of 0.91, with a low Friedman rank of 2.40. Finally, the models trained using the least-squares support vector machine (LSSVM) achieved a median AUC of 1, a mean AUC of 0.99, and a low Friedman rank of 1.30.
Conclusion: The experimental results show that the AUC values of the models trained using Data and Procedural Quality Metrics are high compared to the other sets of metrics. However, the models improved significantly in their prediction performance after employing feature selection techniques. The experimental result …
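The workflow the abstract describes (metrics in, feature selection, up-sampling, classifier, AUC out) has a standard shape; here is a hedged sketch of that shape with scikit-learn. The synthetic data, the univariate F-test standing in for CFS, plain oversampling standing in for UPSAM, and `SVC` standing in for LSSVM are all our assumptions for illustration, not the paper's exact components.

```python
# Sketch of the MLAPW-style pipeline shape on synthetic "WSDL metric" data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(226, 12))            # 226 WSDL files x 12 metrics
y = (rng.random(226) < 0.2).astype(int)   # imbalanced anti-pattern label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Feature selection (CFS in the paper; a univariate F-test here as a proxy)
sel = SelectKBest(f_classif, k=6).fit(X_tr, y_tr)
X_tr_s, X_te_s = sel.transform(X_tr), sel.transform(X_te)

# Up-sampling: replicate minority-class rows until the classes balance
minority = X_tr_s[y_tr == 1]
extra = resample(minority, n_samples=(y_tr == 0).sum() - len(minority),
                 replace=True, random_state=0)
X_bal = np.vstack([X_tr_s, extra])
y_bal = np.concatenate([y_tr, np.ones(len(extra), dtype=int)])

clf = SVC(probability=True).fit(X_bal, y_bal)   # LSSVM proxy
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te_s)[:, 1]))
```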
{"title":"MLAPW: A framework to assess the impact of feature selection and sampling techniques on anti-pattern prediction using WSDL metrics","authors":"Lov Kumar , Vikram Singh , Lalita Bhanu Murthy , Aneesh Krishna , Sanjay Misra","doi":"10.1016/j.cola.2025.101322","DOIUrl":"10.1016/j.cola.2025.101322","url":null,"abstract":"<div><h3>Context:</h3><div>The quality and design of Service-Based Systems may be degraded because of frequent changes, and negatively impacts the software design quality called <strong>Anti-patterns</strong>. The existence of these Anti-patterns highly impacts the overall maintainability of Service-Based Systems. Hence, early detection of these anti-patterns’ presence becomes mandatory with co-located modifications. However, it is not easy to find these anti-patterns manually.</div></div><div><h3>Objective:</h3><div>The objective of this work is to explore the role of WSDL (Web Services Description Language) metrics (MLAPW) for anti-pattern prediction using a Machine Learning (ML) based framework. This framework encompasses different variants of feature selection techniques, data sampling techniques, and a wide range of ML algorithms. This work empirically investigates the predictive ability of anti-pattern prediction models developed using different sets of WSDL metrics. Our major focus is to investigate ’<em>how these metrics accurately predict different types of Anti-patterns present in the WSDL file</em>’.</div></div><div><h3>Methods:</h3><div>To achieve the objective, different sets of WSDL metrics such as Structural Quality Metrics, Procedural Quality Metrics, Data Quality Metrics, Quality Metrics, and Complexity metrics, are used as input for Anti-patterns prediction models. Since these models use WSDL metrics as input, we have also used feature selection methods to find the best sets of WSDL metrics. These models are trained using various machine-learning techniques. This study also shows the performance of these models trained on balanced data using data sampling techniques. Finally, the empirical investigation of these techniques was done using accuracy and ROC (receiver operating characteristic curve) curve (AUC) with hypothesis testing.</div></div><div><h3>Results:</h3><div>The empirical study’s observation is based on 226 WSDL files from various domains such as finance, tourism, health, education, etc. The assessment asserts that the models trained using WSDL metrics have 0.79 mean AUC and 0.90 Median AUC. However, the models trained using the selected feature with classifier feature subset selection (CFS) have a better mean AUC of 0.80 and median AUC of 0.97. The experimental results also confirm that the models trained on up-sampling (UPSAM) have a better mean AUC of 0.79 and median AUC of 0.91 with a low value of Friedman rank of 2.40. Finally, the models trained using the least square support vector machine (LSSVM) achieved 1 median AUC, 0.99 mean AUC, and a low Friedman rank of 1.30.</div></div><div><h3>Conclusion:</h3><div>The experimental results show that the AUC values of the models trained using Data and Procedural Quality Metrics are high as compared to the other sets of metrics. However, the models improved significantly in their prediction performance after employing feature selection techniques. 
The experimental result","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"83 ","pages":"Article 101322"},"PeriodicalIF":1.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143349974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Code histories: Documenting development by recording code influences and changes in code
Pub Date: 2024-12-25 | DOI: 10.1016/j.cola.2024.101313
Vo Thien Tri Pham, Caitlin Kelleher
Developers frequently encounter challenges when working with large code bases found in modern software applications, from navigating through files to more complex tasks like understanding code histories, dependencies, and evolutions. While many applications use Version Control Systems (VCSs) to archive present-day programs and provide a historical perspective on code development, the level of detail they offer is often insufficient for in-depth analyses. As a result, it becomes difficult to fully explore the potential benefits of historical data in software development. We introduce an enhanced recording framework that integrates both the Visual Studio Code (VS Code) development environment and the Google Chrome web browser to capture more detailed development activities. Our framework is designed to offer additional recording options, thereby providing researchers with more opportunities to study how different historical resources can be utilized. Through an observational study, we demonstrate the utility of our framework in capturing the complex dynamics of code change activities, highlighting its potential value in both academic and practical contexts.
{"title":"Code histories: Documenting development by recording code influences and changes in code","authors":"Vo Thien Tri Pham, Caitlin Kelleher","doi":"10.1016/j.cola.2024.101313","DOIUrl":"10.1016/j.cola.2024.101313","url":null,"abstract":"<div><div>Developers frequently encounter challenges when working with large code bases found in modern software applications, from navigating through files to more complex tasks like understanding code histories, dependencies, and evolutions. While many applications use Version Control Systems (VCSs) to archive present-day programs and provide a historical perspective on code development, the level of detail they offer is often insufficient for in-depth analyses. As a result, it becomes difficult to fully explore the potential benefits of historical data in software development. We introduce an enhanced recording framework that integrates both the Visual Studio Code (VS Code) development environment and the Google Chrome web browser to capture more detailed development activities. Our framework is designed to offer additional recording options, thereby providing researchers with more opportunities to study how different historical resources can be utilized. Through an observational study, we demonstrate the utility of our framework in capturing the complex dynamics of code change activities, highlighting its potential value in both academic and practical contexts.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101313"},"PeriodicalIF":1.7,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143101473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A comprehensive meta-analysis of efficiency and effectiveness in the detection community
Pub Date: 2024-12-24 | DOI: 10.1016/j.cola.2024.101314
Mohamed Amine Daoud , Sid Ahmed Mokhtar Mostefaoui , Abdelkader Ouared , Hadj Madani Meghazi , Bendaoud Mebarek , Abdelkader Bouguessa , Hasan Ahmed
Creating an intrusion detection system (IDS) is a prominent area of research that continuously draws attention from both scholars and practitioners who tirelessly innovate new solutions. The complexity of IDS naturally escalates alongside technological advancements, whether they are manually implemented within security infrastructures or elaborated upon in academic literature. However, accessing and comparing these IDS solutions requires sifting through a multitude of hypotheses presented in research papers, which is a laborious and error-prone endeavor. Consequently, many researchers encounter difficulties in replicating results or reanalyzing published IDSs. This challenge primarily arises due to the absence of a standardized process for elucidating IDS methodologies. In response, this paper advocates for a framework aimed at enhancing the reproducibility of IDS outcomes, thereby enabling their seamless reuse across diverse cybersecurity contexts, benefiting both end-users and experts alike. The proposed framework introduces a descriptive language for the precise specification of IDS descriptions. Additionally, a model repository facilitates the sharing and reusability of IDS configurations. Lastly, through a case study, we showcase the effectiveness of our framework in addressing challenges associated with data acquisition and knowledge organization and sharing. Our results demonstrate satisfactory prediction accuracy for configuration reuse and precise identification of reusable components.
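The abstract's "descriptive language for the precise specification of IDS descriptions" plus a shared model repository can be pictured concretely; below is a hedged sketch of what such a machine-readable IDS description might look like. Every field name, the `publish` helper, and the NSL-KDD example values are illustrative assumptions, not the paper's actual language.

```python
# Sketch: a declarative IDS description plus a tiny shared repository.
from dataclasses import dataclass, field

@dataclass
class IDSDescription:
    name: str
    detection_method: str          # e.g. "signature", "anomaly", "hybrid"
    dataset: str                   # training/evaluation corpus used
    features: list = field(default_factory=list)
    model: str = ""                # learning algorithm, if any
    metrics: dict = field(default_factory=dict)  # reported results

repo = {}  # the shared model repository, keyed by IDS name

def publish(desc: IDSDescription):
    repo[desc.name] = desc

publish(IDSDescription(
    name="example-ids",
    detection_method="anomaly",
    dataset="NSL-KDD",
    features=["duration", "src_bytes", "dst_bytes"],
    model="random-forest",
    metrics={"accuracy": 0.97},
))
# A replication study can now re-read every hypothesis from `repo`
# instead of mining it out of prose in the original paper.
```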
{"title":"A comprehensive meta-analysis of efficiency and effectiveness in the detection community","authors":"Mohamed Amine Daoud , Sid Ahmed Mokhtar Mostefaoui , Abdelkader Ouared , Hadj Madani Meghazi , Bendaoud Mebarek , Abdelkader Bouguessa , Hasan Ahmed","doi":"10.1016/j.cola.2024.101314","DOIUrl":"10.1016/j.cola.2024.101314","url":null,"abstract":"<div><div>Creating an intrusion detection system (IDS) is a prominent area of research that continuously draws attention from both scholars and practitioners who tirelessly innovate new solutions. The complexity of IDS naturally escalates alongside technological advancements, whether they are manually implemented within security infrastructures or elaborated upon in academic literature. However, accessing and comparing these IDS solutions requires sifting through a multitude of hypotheses presented in research papers, which is a laborious and error-prone endeavor. Consequently, many researchers encounter difficulties in replicating results or reanalyzing published IDSs. This challenge primarily arises due to the absence of a standardized process for elucidating IDS methodologies. In response, this paper advocates for a framework aimed at enhancing the reproducibility of IDS outcomes, thereby enabling their seamless reuse across diverse cybersecurity contexts, benefiting both end-users and experts alike. The proposed framework introduces a descriptive language for the precise specification of IDS descriptions. Additionally, a model repository facilitates the sharing and reusability of IDS configurations. Lastly, through a case study, we showcase the effectiveness of our framework in addressing challenges associated with data acquisition and knowledge organization and sharing. Our results demonstrate satisfactory prediction accuracy for configuration reuse and precise identification of reusable components.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101314"},"PeriodicalIF":1.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143101472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MTable: Visual query interface for browsing and navigation in NoSQL data stores
Pub Date: 2024-12-05 | DOI: 10.1016/j.cola.2024.101312
Kanika Soni, Shelly Sachdeva
Almost all human endeavors in the era of the digital revolution, from commercial and industrial processes to scientific and medical research, depend on the use of ever-increasing amounts of data. However, this humongous volume of data and its complexity make data exploration and querying challenging even for experts, which makes the demand for easy access to data, even for naive users, all the more evident. Considering this, the database community has tilted toward NoSQL data stores. While there has been much study on query formulation assistance for NoSQL data stores, many users still want help when specifying complex queries (such as aggregation pipeline queries), which require an in-depth understanding of the data storage architecture of a specific NoSQL data store. To help users perform interactive browsing and navigation in NoSQL data stores (MongoDB), this paper proposes a novel, simple, and user-friendly interface, MTable, that provides users with a presentation-level interactive view. This view compactly presents query results from multiple embedded documents within a single tabular format, in contrast to MongoDB's find operation, which always returns the main document. Cells of an MTable contain clickable hyperlinks that let users interact directly with the data persisted in the document stores. This helps users incrementally construct complex queries and navigate document stores without the tedious task of writing complex queries by hand. In a user study, participants performed various querying tasks faster with MTable than with the traditional querying mechanism. MTable has received positive subjective feedback as well.
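The contrast drawn with MongoDB's find operation comes down to flattening embedded documents into rows, which is exactly what an aggregation pipeline does. A hedged sketch of that kind of query follows; the collection and field names are hypothetical, and only the `$unwind`/`$project` operators are standard MongoDB.

```python
# Sketch: unwinding an embedded array so each nested document becomes a
# row, the flat shape an MTable-style view would render.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]   # hypothetical database/collection

pipeline = [
    {"$unwind": "$items"},                 # one row per embedded item
    {"$project": {"_id": 0,
                  "customer": 1,
                  "item": "$items.name",
                  "qty": "$items.qty"}},
]
for row in orders.aggregate(pipeline):
    print(row)   # flat rows, where find() would return whole documents
```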
{"title":"MTable: Visual query interface for browsing and navigation in NoSQL data stores","authors":"Kanika Soni, Shelly Sachdeva","doi":"10.1016/j.cola.2024.101312","DOIUrl":"10.1016/j.cola.2024.101312","url":null,"abstract":"<div><div>Almost all human endeavors in the era of the digital revolution, from commercial and industrial processes to scientific and medical research, depend on the use of ever-increasing amounts of data. However, this humungous data and its complexity make data exploration and querying challenging even for experts. This led to the demand for easy access to data, even for naive users, all the more evident. Considering this, the database community has tilted toward NoSQL Data stores. While there has been much study on query formulation assistance for NoSQL data stores, many users still want help when specifying complex queries (such as aggregation pipeline queries), which require an in-depth understanding of the data storage architecture of a specific NoSQL data store. To help users perform interactive browsing and navigation in NoSQL data stores (MongoDB), this paper proposes a novel, simple, and user-friendly interface, MTable, that provides users with a presentation-level interactive view. This view compactly presents the query results from multiple embedded documents within a single tabular format compared to MongoDB's find operation, which always returns the main document. A certain cell of the MTable contains clickable hyperlinks for users to interact directly with the data persisted in the document stores. This helps the users to incrementally construct complex queries and navigate the document stores without worrying about the tedious task of writing complex queries. In a user study, participants performed various querying tasks faster with MTable than with the traditional querying mechanism. MTable has received positive subjective feedback as well.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101312"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143101471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mental stress analysis by measuring heart rate variability during learning programming: Comparison of visual- and text-based languages
Katsuyuki Umezawa, Takumi Koshikawa, Makoto Nakazawa, Shigeichi Hirasawa
Pub Date: 2024-12-03 | DOI: 10.1016/j.cola.2024.101311
Visual-based programming languages that facilitate block-based coding have gained popularity as introductory methods for learning programming. Conversely, programming experts typically use text-based programming languages like C and Java. Nevertheless, a seamless method for transitioning from a visual- to a text-based language has yet to be developed. Therefore, our research project aims to develop a methodology that facilitates this transition by bridging the gap between the two languages and verifying the variations in the biometric information of learners of both languages. In this study, we measured the participants' heart rate variability (HRV) and evaluated variations in the mental stress experienced while learning visual- and text-based languages. The experimental results confirmed that participants proficient in text-based languages experienced lower HRV (indicating higher stress levels) when learning visual-based languages. Conversely, those less proficient in text-based languages exhibited higher HRV (indicating more favorable stress levels) while learning text-based languages. This study successfully observed differences in stress levels while learning both language types using experimental methods. These findings serve as a preliminary step toward clarifying the impact of stress experienced during learning on learning outcomes and identifying the factors that constitute beneficial stress. This study establishes a foundation for an intermediate language that can ease transitions between the two types of languages.
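The abstract does not say which HRV index was computed; as an illustration only, here are two standard time-domain measures derived from RR intervals, SDNN (overall variability) and RMSSD (short-term variability). Lower values mean reduced variability, conventionally read as higher stress, matching the abstract's interpretation.

```python
# Sketch: standard time-domain HRV measures from RR intervals (ms).
import math

def sdnn(rr):
    """Sample standard deviation of the RR intervals."""
    m = sum(rr) / len(rr)
    return math.sqrt(sum((x - m) ** 2 for x in rr) / (len(rr) - 1))

def rmssd(rr):
    """Root mean square of successive RR differences."""
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

rr = [812, 790, 805, 830, 795, 810, 788, 802]  # made-up RR intervals
print(f"SDNN = {sdnn(rr):.1f} ms, RMSSD = {rmssd(rr):.1f} ms")
```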
{"title":"Mental stress analysis by measuring heart rate variability during learning programming: Comparison of visual- and text-based languages","authors":"Katsuyuki Umezawa , Takumi Koshikawa , Makoto Nakazawa , Shigeichi Hirasawa","doi":"10.1016/j.cola.2024.101311","DOIUrl":"10.1016/j.cola.2024.101311","url":null,"abstract":"<div><div>Visual-based programming languages that facilitate block-based coding have gained popularity as introductory methods for learning programming. Conversely, programming experts typically use text-based programming languages like C and Java. Nevertheless, a seamless method for transitioning from a visual- to text-based language has yet to be developed. Therefore, our research project aims to develop a methodology that facilitates this transition by bridging the gap between the two languages and verifying the variations in the biometric information of learners of both languages. In this study, we measured the participants’ heart rate variability (HRV) and evaluated variations in mental stress experienced while learning visual- and text-based languages. The experimental results confirmed that participants proficient in text-based languages experienced lower HRV (indicating higher stress levels) when learning visual-based languages. Conversely, those poorly proficient in text-based languages exhibited higher HRVs (indicating more favorable stress levels) while learning text-based languages. This study successfully observed differences in stress levels while learning both language types using experimental methods. These findings serve as a preliminary step toward clarifying the impact of stress experienced during learning outcomes and identifying the factors that constitute beneficial stress. This study establishes a foundation for an intermediate language that can enhance transitions between the two types of languages.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101311"},"PeriodicalIF":1.7,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143101470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combining type inference techniques for semi-automatic UML generation from Pharo code
Pub Date: 2024-11-14 | DOI: 10.1016/j.cola.2024.101300
Jan Blizničenko, Robert Pergl
This paper explores how to reconstruct UML diagrams from code in dynamically typed languages such as Smalltalk, which lack explicit type information. This lack of information makes traditional methods for extracting associations difficult. The paper addresses the need for automated techniques, particularly in legacy software systems, to facilitate their transformation into modern technologies, focusing on Smalltalk as a case study due to its extensive industrial legacy and modern adaptations like Pharo. We propose a way to create UML diagrams from Smalltalk code, focusing on the use of type inference to determine UML associations. For optimal outcomes on large-scale software systems, we recommend combining different type inference methods in an automatic or semi-automatic way.
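One way to picture "combining different type inference methods" is as a vote among heuristics over each instance variable, with agreement yielding a UML association edge. The heuristic names and the voting scheme below are our illustration of the combination idea, not the paper's method.

```python
# Sketch: combining type-inference heuristics to recover a UML association.
from collections import Counter

def combine(inferences, threshold=0.5):
    """inferences: {heuristic_name: inferred class name or None}.
    Returns the majority type if enough heuristics agree, else None."""
    votes = Counter(t for t in inferences.values() if t is not None)
    if not votes:
        return None
    best, n = votes.most_common(1)[0]
    return best if n / len(inferences) >= threshold else None

ivar_evidence = {                   # evidence about Order>>customer
    "assignment-rhs": "Customer",   # e.g. customer := Customer new
    "message-sends": "Customer",    # receives #name, #address
    "literal-init": None,           # no literal to learn from
}
target = combine(ivar_evidence)
if target:
    print(f"UML association: Order --> {target}")  # edge in the diagram
```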
{"title":"Combining type inference techniques for semi-automatic UML generation from Pharo code","authors":"Jan Blizničenko, Robert Pergl","doi":"10.1016/j.cola.2024.101300","DOIUrl":"10.1016/j.cola.2024.101300","url":null,"abstract":"<div><div>This paper explores how to reconstruct UML diagrams from dynamically typed languages such as Smalltalk, which do not use explicit type information. This lack of information makes traditional methods for extracting associations difficult. It addresses the need for automated techniques, particularly in legacy software systems, to facilitate their transformation into modern technologies, focusing on Smalltalk as a case study due to its extensive industrial legacy and modern adaptations like Pharo. We propose a way to create UML diagrams from Smalltalk code, focusing on using type inference to determine UML associations. For optimal outcomes for large-scale software systems, we recommend combining different type inference methods in an automatic or semi-automatic way.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101300"},"PeriodicalIF":1.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142699583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient instance selection algorithm for fast training of support vector machine for cross-project software defect prediction pairs
Pub Date: 2024-10-23 | DOI: 10.1016/j.cola.2024.101301
Manpreet Singh, Jitender Kumar Chhabra
SVM is limited in its use for cross-project software defect prediction because of its very slow training process. This research article therefore proposes a new instance selection (IS) algorithm, called boundary detection among classes (BDAC), to reduce the training dataset size for faster training of SVM without degrading prediction performance. The proposed algorithm is evaluated against six existing IS algorithms on accuracy, running time, data reduction rate, etc., using 23 general datasets, 18 software defect prediction datasets, and two shape-based datasets, and the results show that BDAC is better than the selected algorithms under collective comparison.
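The intuition behind boundary-based instance selection is that SVM's decision surface depends only on points near class boundaries, so interior points can be discarded. Below is a hedged sketch of one plausible reading of that idea, keeping a point when any of its k nearest neighbours has a different label; this is not necessarily the exact BDAC rule.

```python
# Sketch: keep only instances whose neighbourhood mixes class labels,
# i.e. points near a class boundary, before training the SVM.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_boundary(X, y, k=5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: self comes back
    _, idx = nn.kneighbors(X)
    keep = np.array([(y[nbrs[1:]] != y[i]).any()
                     for i, nbrs in enumerate(idx)])
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
Xs, ys = select_boundary(X, y)
print(f"kept {len(Xs)}/{len(X)} instances for SVM training")
```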
{"title":"An efficient instance selection algorithm for fast training of support vector machine for cross-project software defect prediction pairs","authors":"Manpreet Singh, Jitender Kumar Chhabra","doi":"10.1016/j.cola.2024.101301","DOIUrl":"10.1016/j.cola.2024.101301","url":null,"abstract":"<div><div>SVM is limited in its use for cross-project software defect prediction because of its very slow training process. So, this research article proposes a new instance selection (IS) algorithm called boundary detection among classes (BDAC) to reduce the training dataset size for faster training of SVM without degrading the prediction performance. The proposed algorithm is evaluated against six existing IS algorithms based on accuracy, running time, data reduction rate, etc. using 23 general datasets, 18 software defect prediction datasets, and two shape-based datasets, and results prove that BDAC is better than the selected algorithm based on collective comparison.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101301"},"PeriodicalIF":1.7,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection and treatment of string events in the limit
Pub Date: 2024-10-21 | DOI: 10.1016/j.cola.2024.101299
Alex Holmquist , Vitor Emanuel , Fernando C. Alves , Fernando Magno Quintão Pereira
A string event is a pattern that occurs in a stream of characters. The need to detect and handle string events in infinite texts emerges in many scenarios, including online treatment of logs, web crawling, and syntax highlighting. This paper describes a technique to specify and treat string events. Users determine patterns of interest via a markup language. From such examples, tokens are generalized via a semi-lattice of regular expressions. Such tokens are combined into a context-free language that recognizes patterns in the text stream. These techniques are implemented in a text processing system called Lushu, which runs on the Java Virtual Machine (JVM). Lushu intercepts strings emitted by the JVM. Once patterns are detected, it invokes a user-specified action handler. As a proof of concept, this paper shows that Lushu outperforms state-of-the-art parsers and parser generators, such as Comby, BeautifulSoup4 and ZheFuscator, in terms of memory consumption and running time.
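The step of generalizing example tokens "via a semi-lattice of regular expressions" can be illustrated with a tiny chain of character classes where the join of two examples is their least common class. The three-level lattice below is our simplification for illustration, not Lushu's actual lattice.

```python
# Sketch: generalizing example tokens over a chain of regex classes,
# ordered from most to least specific; join() is the least upper bound.
import re

LATTICE = [r"\d+", r"\w+", r".+"]

def least_class(token):
    """Most specific class in the chain that matches the whole token."""
    for pat in LATTICE:
        if re.fullmatch(pat, token):
            return pat
    return r".+"

def join(a, b):
    """Least upper bound of two classes in the chain."""
    return max(a, b, key=LATTICE.index)

def generalize(examples):
    out = least_class(examples[0])
    for t in examples[1:]:
        out = join(out, least_class(t))
    return out

print(generalize(["4040", "8080"]))    # \d+  -- all-digit examples
print(generalize(["4040", "ERROR"]))   # \w+  -- digits join words
```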
{"title":"Detection and treatment of string events in the limit","authors":"Alex Holmquist , Vitor Emanuel , Fernando C. Alves , Fernando Magno Quintão Pereira","doi":"10.1016/j.cola.2024.101299","DOIUrl":"10.1016/j.cola.2024.101299","url":null,"abstract":"<div><div>A string event is a pattern that occurs in a stream of characters. The need to detect and handle string events in infinite texts emerges in many scenarios, including online treatment of logs, web crawling, and syntax highlighting. This paper describes a technique to specify and treat string events. Users determine patterns of interest via a markup language. From such examples, tokens are generalized via a semi-lattice of regular expressions. Such tokens are combined into a context-free language that recognizes patterns in the text stream. These techniques are implemented in a text processing system called <span>Lushu</span>, which runs on the Java Virtual Machine (JVM). <span>Lushu</span> intercepts strings emitted by the JVM. Once patterns are detected, it invokes a user-specified action handler. As a proof of concept, this paper shows that <span>Lushu</span> outperforms state-of-the-art parsers and parser generators, such as <span>Comby</span>, <span>BeautifulSoup4</span> and <span>ZheFuscator</span>, in terms of memory consumption and running time.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101299"},"PeriodicalIF":1.7,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ClangOz: Parallel constant evaluation of C++ map and reduce operations
Pub Date: 2024-10-10 | DOI: 10.1016/j.cola.2024.101298
Paul Keir , Andrew Gozillon
Interest in metaprogramming, reflection, and compile-time evaluation continues to inspire and foster innovation among the users and designers of the C++ programming language. Regrettably, the impact of such features on compile times can be significant, and outside of build systems, multi-core parallelism is unable to bring down the compilation times of individual translation units. We present ClangOz, a novel Clang-based research compiler that addresses this issue by evaluating annotated constant expressions in parallel, thereby reducing compilation times. Prior benchmarks analyzed parallel map operations, but were unable to consider reduction operations. Thus we also introduce parallel reduction functionality, alongside two additional benchmark programs.
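ClangOz does its parallel evaluation inside the C++ compiler's constant evaluator; as a language-neutral illustration only, the same map-then-reduce shape parallelized with a process pool looks like this. The stand-in `expensive` function and inputs are hypothetical; this mirrors the evaluation strategy, not the compiler's implementation.

```python
# Sketch: the map+reduce shape ClangOz parallelizes at compile time,
# here run at ordinary runtime with a process pool.
from concurrent.futures import ProcessPoolExecutor
from functools import reduce
from operator import add

def expensive(x):              # stand-in for a costly constant expression
    return sum(i * i for i in range(x))

if __name__ == "__main__":
    inputs = range(1000, 1016)
    with ProcessPoolExecutor() as pool:
        mapped = list(pool.map(expensive, inputs))  # parallel map
    # A parallel reduce would combine partial results pairwise in a tree;
    # a sequential fold suffices to show the shape here.
    print(reduce(add, mapped))
```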