Pub Date : 2024-03-01DOI: 10.1007/s10515-024-00418-z
Yang Liu, Chao Wang, Yan Ma
Smart contract is a new paradigm for the decentralized software system, which plays an important and key role in Blockchain-based application. The vulnerabilities in smart contracts are unacceptable, and some of which have caused significant economic losses. The machine learning, especially deep learning, is a very promising and potential approach to vulnerability detecting for smart contracts. At present, deep learning-based vulnerability detection methods have low accuracy, time-consuming, and too small application range. For dealing with these, we propose a novel deep learning-based vulnerability detection framework for smart contracts at opcode level, named as DL4SC. It orthogonally combines the Transformer encoder and CNN (convolutional neural networks) to detect vulnerabilities of smart contracts for the first time, and firstly exploit SSA (sparrow search algorithm) to automatically search model hyperparameters for vulnerability detection. We implement the framework DL4SC on deep learning platform Pytorch with Python, and compare it with existing works on the three public datasets and one dataset we collect. The experiment results show that DL4SC can accurately detect vulnerabilities of smart contracts, and performs better than state-of-the-art works for detecting vulnerabilities in smart contracts. The accuracy and F1-score of DL4SC are 95.29% and 95.68%, respectively.
{"title":"DL4SC: a novel deep learning-based vulnerability detection framework for smart contracts","authors":"Yang Liu, Chao Wang, Yan Ma","doi":"10.1007/s10515-024-00418-z","DOIUrl":"10.1007/s10515-024-00418-z","url":null,"abstract":"<div><p>Smart contract is a new paradigm for the decentralized software system, which plays an important and key role in Blockchain-based application. The vulnerabilities in smart contracts are unacceptable, and some of which have caused significant economic losses. The machine learning, especially deep learning, is a very promising and potential approach to vulnerability detecting for smart contracts. At present, deep learning-based vulnerability detection methods have low accuracy, time-consuming, and too small application range. For dealing with these, we propose a novel deep learning-based vulnerability detection framework for smart contracts at opcode level, named as DL4SC. It orthogonally combines the Transformer encoder and CNN (convolutional neural networks) to detect vulnerabilities of smart contracts for the first time, and firstly exploit SSA (sparrow search algorithm) to automatically search model hyperparameters for vulnerability detection. We implement the framework DL4SC on deep learning platform Pytorch with Python, and compare it with existing works on the three public datasets and one dataset we collect. The experiment results show that DL4SC can accurately detect vulnerabilities of smart contracts, and performs better than state-of-the-art works for detecting vulnerabilities in smart contracts. The accuracy and F1-score of DL4SC are 95.29% and 95.68%, respectively.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140020010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-29DOI: 10.1007/s10515-024-00422-3
Maha Ayub, Muhammad Waiz Khan, Muhammmad Umar Janjua
With the addition of multiple blockchain platforms in the ecosystem, the Dapp owners need to migrate their smart contracts from one platform to another to remain competitive, cost-effective, and secure. A smart contract is a piece of code that contains logic and data. To migrate a smart contract, whether it’s on the same blockchain platform or a different one, we need both its source code that represents the logic and data that indicate the state of the contract. The source code can be easily set up, but to complete the migration, we have to extract the current state of the contract. In this paper, we have developed an advanced state extraction technique that uses static analysis to analyze the smart contract’s call graph and events, and extracts the entire storage state from the storage trie, along with the proper associations across function calls, enabling users to visualize, manage, and transform the state as desired for migration. The soundness of the extracted state was confirmed using the method of abstract interpretation. Further, the migration adapter is designed to transform the extracted state into slot-value pairs and migrate it to the target blockchain. Using our new approach, we were able to completely analyze 14% more smart contracts with the extraction of 53% more data by analyzing function calls and event logs from 67,993 contracts and also migrated some contracts to the multiple popular EVM-compatible blockchains.
{"title":"Sound analysis and migration of data from Ethereum smart contracts","authors":"Maha Ayub, Muhammad Waiz Khan, Muhammmad Umar Janjua","doi":"10.1007/s10515-024-00422-3","DOIUrl":"10.1007/s10515-024-00422-3","url":null,"abstract":"<div><p>With the addition of multiple blockchain platforms in the ecosystem, the Dapp owners need to migrate their smart contracts from one platform to another to remain competitive, cost-effective, and secure. A smart contract is a piece of code that contains logic and data. To migrate a smart contract, whether it’s on the same blockchain platform or a different one, we need both its source code that represents the logic and data that indicate the state of the contract. The source code can be easily set up, but to complete the migration, we have to extract the current state of the contract. In this paper, we have developed an advanced state extraction technique that uses static analysis to analyze the smart contract’s call graph and events, and extracts the entire storage state from the storage trie, along with the proper associations across function calls, enabling users to visualize, manage, and transform the state as desired for migration. The soundness of the extracted state was confirmed using the method of abstract interpretation. Further, the migration adapter is designed to transform the extracted state into slot-value pairs and migrate it to the target blockchain. Using our new approach, we were able to completely analyze 14% more smart contracts with the extraction of 53% more data by analyzing function calls and event logs from 67,993 contracts and also migrated some contracts to the multiple popular EVM-compatible blockchains.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140008624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-28DOI: 10.1007/s10515-024-00419-y
Kevin Lano, Hanan Siala
The porting or translation of software applications from one programming language to another is a common requirement of organisations that utilise software, and the increasing number and diversity of programming languages makes this capability as relevant today as in previous decades. Several approaches have been used to address this challenge, including machine learning and the manual definition of direct language-to-language translation rules, however the accuracy of these approaches remains unsatisfactory. In this paper we describe a new approach to program translation using model-driven engineering techniques: reverse-engineering source programs into specifications in the UML and OCL formalisms, and then forward-engineering the specifications to the required target language. This approach can provide assurance of semantic preservation, and additionally has the advantage of extracting precise specifications of software from code. We provide an evaluation based on a comprehensive dataset of examples, including industrial cases, and compare our results to those of other approaches and tools. Our specific contributions are: (1) Reverse-engineering source programs to detailed semantic models of software behaviour, to enable semantically-correct translations and reduce re-testing costs; (2) Program abstraction processes defined by precise and explicit rules, which can be edited and configured by users; (3) A set of reusable OCL library components appropriate for representing program semantics, and which can also be used for OCL specification of new applications; (4) A systematic procedure for building program abstractors based on language grammars and semantics.
{"title":"Using model-driven engineering to automate software language translation","authors":"Kevin Lano, Hanan Siala","doi":"10.1007/s10515-024-00419-y","DOIUrl":"10.1007/s10515-024-00419-y","url":null,"abstract":"<div><p>The porting or translation of software applications from one programming language to another is a common requirement of organisations that utilise software, and the increasing number and diversity of programming languages makes this capability as relevant today as in previous decades. Several approaches have been used to address this challenge, including machine learning and the manual definition of direct language-to-language translation rules, however the accuracy of these approaches remains unsatisfactory. In this paper we describe a new approach to program translation using model-driven engineering techniques: reverse-engineering source programs into specifications in the UML and OCL formalisms, and then forward-engineering the specifications to the required target language. This approach can provide assurance of semantic preservation, and additionally has the advantage of extracting precise specifications of software from code. We provide an evaluation based on a comprehensive dataset of examples, including industrial cases, and compare our results to those of other approaches and tools. Our specific contributions are: (1) Reverse-engineering source programs to detailed <i>semantic models</i> of software behaviour, to enable semantically-correct translations and reduce re-testing costs; (2) Program abstraction processes defined by precise and explicit rules, which can be edited and configured by users; (3) A set of reusable OCL library components appropriate for representing program semantics, and which can also be used for OCL specification of new applications; (4) A systematic procedure for building program abstractors based on language grammars and semantics.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10515-024-00419-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140008564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-27DOI: 10.1007/s10515-024-00424-1
Zhiqiang Li, Jingwen Niu, Xiao-Yuan Jing
Software defect prediction is one of the most popular research topics in software engineering. The objective of defect prediction is to identify defective instances prior to the occurrence of software defects, thus it aids in more effectively prioritizing software quality assurance efforts. In this article, we delve into various prospective research directions and potential challenges in the field of defect prediction. The aim of this article is to propose a range of defect prediction techniques and methodologies for the future. These ideas are intended to enhance the practicality, explainability, and actionability of the predictions of defect models.
{"title":"Software defect prediction: future directions and challenges","authors":"Zhiqiang Li, Jingwen Niu, Xiao-Yuan Jing","doi":"10.1007/s10515-024-00424-1","DOIUrl":"10.1007/s10515-024-00424-1","url":null,"abstract":"<div><p>Software defect prediction is one of the most popular research topics in software engineering. The objective of defect prediction is to identify defective instances prior to the occurrence of software defects, thus it aids in more effectively prioritizing software quality assurance efforts. In this article, we delve into various prospective research directions and potential challenges in the field of defect prediction. The aim of this article is to propose a range of defect prediction techniques and methodologies for the future. These ideas are intended to enhance the practicality, explainability, and actionability of the predictions of defect models.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139988033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-23DOI: 10.1007/s10515-024-00416-1
Debasish Chakroborti, Kevin A. Schneider, Chanchal K. Roy
Pull-based development is widely used in popular social coding environments like GitHub and GitLab for both internal and external contributions. When critical bug fixes or features are committed to the main branch of a project, it is often desirable to also port those changes to other stable branches. This process is referred to as backporting, and pull-requests in the process are known as backports. Backports are typically determined after extensive discussion with collaborators, and it may take many days to identify backports, which commonly results in tags and references to the original pull-requests (i.e., pull-requests for the main branch) being missed. To help software development teams better identify and manage backports, we propose ReBack (Recommending Backports), a tool based on a deep-learning model for automatically identifying backports from pull-requests and related reviews, discussions, metadata, and committed code. ReBack predicted backports with 90.98% precision and 91.81% recall from 80,000 pull-requests in 17 GitHub projects. Although the results are promising, more research is required to further support backporting, including research into automatically porting a pull-request to further reduce costs when managing software versions and branches.
{"title":"ReBack: recommending backports in social coding environments","authors":"Debasish Chakroborti, Kevin A. Schneider, Chanchal K. Roy","doi":"10.1007/s10515-024-00416-1","DOIUrl":"10.1007/s10515-024-00416-1","url":null,"abstract":"<div><p>Pull-based development is widely used in popular social coding environments like GitHub and GitLab for both internal and external contributions. When critical bug fixes or features are committed to the main branch of a project, it is often desirable to also port those changes to other stable branches. This process is referred to as backporting, and pull-requests in the process are known as backports. Backports are typically determined after extensive discussion with collaborators, and it may take many days to identify backports, which commonly results in tags and references to the original pull-requests (i.e., pull-requests for the main branch) being missed. To help software development teams better identify and manage backports, we propose <b>ReBack</b> (<b>Re</b>commending <b>Back</b>ports), a tool based on a deep-learning model for automatically identifying backports from pull-requests and related reviews, discussions, metadata, and committed code. ReBack predicted backports with 90.98% precision and 91.81% recall from 80,000 pull-requests in 17 GitHub projects. Although the results are promising, more research is required to further support backporting, including research into automatically porting a pull-request to further reduce costs when managing software versions and branches.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139953757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-21DOI: 10.1007/s10515-024-00417-0
Maryam Asgari Araghi, Vahid Rafe, Ferhat Khendek
Software testing plays a crucial role in enhancing software quality. A significant portion of the time and cost in software development is dedicated to testing. Automation, particularly in generating test cases, can greatly reduce the cost. Model-based testing aims at generating automatically test cases from models. Several model based approaches use model checking tools to automate test case generation. However, this technique faces challenges such as state space explosion and duplication of test cases. This paper introduces a novel solution based on data mining algorithms for systems specified using graph transformation systems. To overcome the aforementioned challenges, the proposed method wisely explores only a portion of the state space based on test objectives. The proposed method is implemented using the GROOVE tool set for model-checking graph transformation systems specifications. Empirical results on widely used case studies in service-oriented architecture as well as a comparison with related state-of-the-art techniques demonstrate the efficiency and superiority of the proposed approach in terms of coverage and test suite size.
{"title":"Using data mining techniques to generate test cases from graph transformation systems specifications","authors":"Maryam Asgari Araghi, Vahid Rafe, Ferhat Khendek","doi":"10.1007/s10515-024-00417-0","DOIUrl":"10.1007/s10515-024-00417-0","url":null,"abstract":"<div><p>Software testing plays a crucial role in enhancing software quality. A significant portion of the time and cost in software development is dedicated to testing. Automation, particularly in generating test cases, can greatly reduce the cost. Model-based testing aims at generating automatically test cases from models. Several model based approaches use model checking tools to automate test case generation. However, this technique faces challenges such as state space explosion and duplication of test cases. This paper introduces a novel solution based on data mining algorithms for systems specified using graph transformation systems. To overcome the aforementioned challenges, the proposed method wisely explores only a portion of the state space based on test objectives. The proposed method is implemented using the GROOVE tool set for model-checking graph transformation systems specifications. Empirical results on widely used case studies in service-oriented architecture as well as a comparison with related state-of-the-art techniques demonstrate the efficiency and superiority of the proposed approach in terms of coverage and test suite size.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139923665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-18DOI: 10.1007/s10515-024-00415-2
Davide Di Ruscio, Paola Inverardi, Patrizio Migliarini, Phuong T. Nguyen
Protecting privacy and ethics of citizens is among the core concerns raised by an increasingly digital society. Profiling users is common practice for software applications triggering the need for users, also enforced by laws, to manage privacy settings properly. Users need to properly manage these settings to protect personally identifiable information and express personal ethical preferences. This has shown to be very difficult for several concurrent reasons. However, profiling technologies can also empower users in their interaction with the digital world by reflecting personal ethical preferences and allowing for automatizing/assisting users in privacy settings. In this way, if properly reflecting users’ preferences, privacy profiling can become a key enabler for a trustworthy digital society. We focus on characterizing/collecting users’ privacy preferences and contribute a step in this direction through an empirical study on an existing dataset collected from the fitness domain. We aim to understand which set of questions is more appropriate to differentiate users according to their privacy preferences. The results reveal that a compact set of semantic-driven questions (about domain-independent privacy preferences) helps distinguish users better than a complex domain-dependent one. Based on the outcome, we implement a recommender system to provide users with suitable recommendations related to privacy choices. We then show that the proposed recommender system provides relevant settings to users, obtaining high accuracy.
{"title":"Leveraging privacy profiles to empower users in the digital society","authors":"Davide Di Ruscio, Paola Inverardi, Patrizio Migliarini, Phuong T. Nguyen","doi":"10.1007/s10515-024-00415-2","DOIUrl":"10.1007/s10515-024-00415-2","url":null,"abstract":"<div><p>Protecting privacy and ethics of citizens is among the core concerns raised by an increasingly digital society. Profiling users is common practice for software applications triggering the need for users, also enforced by laws, to manage privacy settings properly. Users need to properly manage these settings to protect personally identifiable information and express personal ethical preferences. This has shown to be very difficult for several concurrent reasons. However, profiling technologies can also empower users in their interaction with the digital world by reflecting personal ethical preferences and allowing for automatizing/assisting users in privacy settings. In this way, if properly reflecting users’ preferences, privacy profiling can become a key enabler for a trustworthy digital society. We focus on characterizing/collecting users’ privacy preferences and contribute a step in this direction through an empirical study on an existing dataset collected from the fitness domain. We aim to understand which set of questions is more appropriate to differentiate users according to their privacy preferences. The results reveal that a compact set of semantic-driven questions (about domain-independent privacy preferences) helps distinguish users better than a complex domain-dependent one. Based on the outcome, we implement a recommender system to provide users with suitable recommendations related to privacy choices. We then show that the proposed recommender system provides relevant settings to users, obtaining high accuracy.\u0000</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10515-024-00415-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139923666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-31DOI: 10.1007/s10515-024-00413-4
Rongcun Wang, Senlei Xu, Xingyu Ji, Yuan Tian, Lina Gong, Ke Wang
Deep learning has achieved great progress in automated code vulnerability detection. Several code vulnerability detection approaches based on deep learning have been proposed. However, few studies empirically studied the impacts of different deep learning models on code vulnerability detection in Python. For this reason, we strive to cover many more code representation learning models and classification models for vulnerability detection. We design and conduct an empirical study for evaluating the effects of the eighteen deep learning architectures derived from combinations of three representation learning models, i.e., Word2Vec, fastText, and CodeBERT, and six classification models, i.e., random forest, XGBoost, Multi-Layer Perception (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Gate Recurrent Unit (GRU) on code vulnerability detection in total. Additionally, two machine learning strategies i.e., the attention and bi-directional mechanisms are also empirically compared. The statistical significance and effect size analysis between different models are also conducted. In terms of precision, recall, and F-score, Word2Vec is better than Bidirectional Encoder Representations from Transformers CodeBERT and fastText. Likewise, long short-term memory (LSTM) and gated recurrent unit (GRU) are superior to other classification models we studied. The bi-directional LSTM and GRU with attention using Word2Vec are two optimal models for solving code vulnerability detection for Python code. Moreover, they have medium or large effect sizes on LSTM and GRU using only a single mechanism. Both the representation learning models and classification models have important influences on vulnerability detection in Python code. Likewise, the bi-directional and attention mechanisms can impact the performance of code vulnerability detection.
{"title":"An extensive study of the effects of different deep learning models on code vulnerability detection in Python code","authors":"Rongcun Wang, Senlei Xu, Xingyu Ji, Yuan Tian, Lina Gong, Ke Wang","doi":"10.1007/s10515-024-00413-4","DOIUrl":"10.1007/s10515-024-00413-4","url":null,"abstract":"<div><p>Deep learning has achieved great progress in automated code vulnerability detection. Several code vulnerability detection approaches based on deep learning have been proposed. However, few studies empirically studied the impacts of different deep learning models on code vulnerability detection in Python. For this reason, we strive to cover many more code representation learning models and classification models for vulnerability detection. We design and conduct an empirical study for evaluating the effects of the eighteen deep learning architectures derived from combinations of three representation learning models, i.e., Word2Vec, fastText, and CodeBERT, and six classification models, i.e., random forest, XGBoost, Multi-Layer Perception (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Gate Recurrent Unit (GRU) on code vulnerability detection in total. Additionally, two machine learning strategies i.e., the attention and bi-directional mechanisms are also empirically compared. The statistical significance and effect size analysis between different models are also conducted. In terms of <i>precision</i>, <i>recall</i>, and <i>F</i>-<i>score</i>, Word2Vec is better than Bidirectional Encoder Representations from Transformers CodeBERT and fastText. Likewise, long short-term memory (LSTM) and gated recurrent unit (GRU) are superior to other classification models we studied. The bi-directional LSTM and GRU with attention using Word2Vec are two optimal models for solving code vulnerability detection for Python code. Moreover, they have medium or large effect sizes on LSTM and GRU using only a single mechanism. Both the representation learning models and classification models have important influences on vulnerability detection in Python code. Likewise, the bi-directional and attention mechanisms can impact the performance of code vulnerability detection.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139657735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-19DOI: 10.1007/s10515-023-00411-y
Xiaoning Shen, Chengbin Yao, Liyan Song, Jiyong Xu, Mingjian Mao
In the process of software project development, completing tasks may require new skills that employees have not yet mastered due to factors such as requirement changes. However, existing studies on software project scheduling usually overlook such new skill demands. This paper designs the learning mechanism targeting the treatment of new skills for project employees, including how to select appropriate employees to learn new skills, the growth curves of new skill proficiencies and the adaptive dedication changes for the selected employees. Three common dynamic events are considered to establish a mathematical model for the dynamic software project scheduling problem considering the new skill learning. To solve the model, a multi-population coevolutionary algorithm-based predictive-reactive scheduling method is proposed in this paper. Three novel strategies are incorporated, which include a response mechanism to environmental changes, a population grouping strategy based on dual indicators, and a dynamic allocation of subpopulation size according to the variation trend of contribution. Systematic experimental results based on ten synthetic instances and three real-world instances show that when dynamic events occur, the proposed algorithm can quickly reschedule the tasks with a better duration, cost and stability compared with six state-of-the-art algorithms, helping project manager make a more informed decision.
{"title":"Coevolutionary scheduling of dynamic software project considering the new skill learning","authors":"Xiaoning Shen, Chengbin Yao, Liyan Song, Jiyong Xu, Mingjian Mao","doi":"10.1007/s10515-023-00411-y","DOIUrl":"10.1007/s10515-023-00411-y","url":null,"abstract":"<div><p>In the process of software project development, completing tasks may require new skills that employees have not yet mastered due to factors such as requirement changes. However, existing studies on software project scheduling usually overlook such new skill demands. This paper designs the learning mechanism targeting the treatment of new skills for project employees, including how to select appropriate employees to learn new skills, the growth curves of new skill proficiencies and the adaptive dedication changes for the selected employees. Three common dynamic events are considered to establish a mathematical model for the dynamic software project scheduling problem considering the new skill learning. To solve the model, a multi-population coevolutionary algorithm-based predictive-reactive scheduling method is proposed in this paper. Three novel strategies are incorporated, which include a response mechanism to environmental changes, a population grouping strategy based on dual indicators, and a dynamic allocation of subpopulation size according to the variation trend of contribution. Systematic experimental results based on ten synthetic instances and three real-world instances show that when dynamic events occur, the proposed algorithm can quickly reschedule the tasks with a better duration, cost and stability compared with six state-of-the-art algorithms, helping project manager make a more informed decision.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139509199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-11DOI: 10.1007/s10515-023-00409-6
Marco Gerosa, Bianca Trinkenreich, Igor Steinmacher, Anita Sarma
Research within sociotechnical domains, such as software engineering, fundamentally requires the human perspective. Nevertheless, traditional qualitative data collection methods suffer from difficulties in participant recruitment, scaling, and labor intensity. This vision paper proposes a novel approach to qualitative data collection in software engineering research by harnessing the capabilities of artificial intelligence (AI), especially large language models (LLMs) like ChatGPT and multimodal foundation models. We explore the potential of AI-generated synthetic text as an alternative source of qualitative data, discussing how LLMs can replicate human responses and behaviors in research settings. We discuss AI applications in emulating humans in interviews, focus groups, surveys, observational studies, and user evaluations. We discuss open problems and research opportunities to implement this vision. In the future, an integrated approach where both AI and human-generated data coexist will likely yield the most effective outcomes.
{"title":"Can AI serve as a substitute for human subjects in software engineering research?","authors":"Marco Gerosa, Bianca Trinkenreich, Igor Steinmacher, Anita Sarma","doi":"10.1007/s10515-023-00409-6","DOIUrl":"10.1007/s10515-023-00409-6","url":null,"abstract":"<div><p>Research within sociotechnical domains, such as software engineering, fundamentally requires the human perspective. Nevertheless, traditional qualitative data collection methods suffer from difficulties in participant recruitment, scaling, and labor intensity. This vision paper proposes a novel approach to qualitative data collection in software engineering research by harnessing the capabilities of artificial intelligence (AI), especially large language models (LLMs) like ChatGPT and multimodal foundation models. We explore the potential of AI-generated synthetic text as an alternative source of qualitative data, discussing how LLMs can replicate human responses and behaviors in research settings. We discuss AI applications in emulating humans in interviews, focus groups, surveys, observational studies, and user evaluations. We discuss open problems and research opportunities to implement this vision. In the future, an integrated approach where both AI and human-generated data coexist will likely yield the most effective outcomes.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139424085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}