Inf. Softw. Technol.: Latest Articles


Robustness assessment of hyperspectral image CNNs using metamorphic testing

Inf. Softw. Technol.

Pub Date : 2023-10-01 DOI: 10.2139/ssrn.4102952

Rached Bouchoucha, Houssem Ben Braiek, Foutse Khomh, S. Bouzidi, Rania Zaatour

Citations: 1

Towards accurate recommendations of merge conflicts resolution strategies

Inf. Softw. Technol.

Pub Date : 2023-09-01 DOI: 10.2139/ssrn.4327366

P. Elias, H. D. S. C. Junior, Eduardo Ogasawara, Leonardo Gresta Paulino Murta

Citations: 0

Characteristics and generative mechanisms of software development productivity distributions

Inf. Softw. Technol.

Pub Date : 2023-07-01 DOI: 10.2139/ssrn.4273483

M. Jørgensen

Citations: 1

Can An Old Fashioned Feature Extraction and A Light-weight Model Improve Vulnerability Type Identification Performance?

Inf. Softw. Technol.

Pub Date : 2023-06-26 DOI: 10.48550/arXiv.2306.14726

H. Vo, Son Nguyen

Recent advances in automated vulnerability detection have achieved promising results in helping developers identify vulnerable components. However, after vulnerabilities are detected, investigating and fixing the vulnerable code is a non-trivial task. Knowing the vulnerability type, such as buffer overflow or memory corruption, could help developers quickly understand the nature of the weakness and localize it for security analysis. In this work, we investigate the problem of vulnerability type identification (VTI). The problem is modeled as a multi-label classification task, which could be effectively addressed by the "pre-training, then fine-tuning" framework with deep pre-trained embedding models. We evaluate the performance of well-known and advanced pre-trained models for VTI on a large set of vulnerabilities. Surprisingly, their performance is not much better than that of the classical baseline approach using an old-fashioned bag-of-words representation, TF-IDF. Meanwhile, these deep neural network approaches consume far more resources and require a GPU. We also introduce a lightweight, independent component to refine the predictions of the baseline approach. Our idea is that vulnerability types could strongly correlate with certain code tokens (distinguishing tokens) in several crucial parts of programs. The distinguishing tokens for each vulnerability type are identified statistically, based on their prevalence in that type versus the others. Our results show that the baseline approach enhanced by our component can outperform the state-of-the-art deep pre-trained approaches while retaining very high efficiency. Furthermore, the proposed component could also improve the neural network approaches by up to 92.8% in macro-average F1.
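The abstract does not spell out the statistic used to pick distinguishing tokens. The Python sketch below assumes a simple one: for each vulnerability type, the fraction of that type's samples containing a token, minus the fraction of all other samples containing it. The function name `distinguishing_tokens`, the toy samples, and this scoring rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def distinguishing_tokens(
    samples: List[Tuple[Set[str], Set[str]]],
    top_k: int = 3,
) -> Dict[str, List[str]]:
    """Return the top_k distinguishing tokens for each vulnerability type.

    Each sample is (code tokens, vulnerability types); a sample may carry
    several types because VTI is multi-label. The score is an assumed
    prevalence contrast, not the statistic used in the paper.
    """
    n = len(samples)
    type_size: Counter = Counter()  # type -> number of samples labelled with it
    in_type: Dict[str, Counter] = defaultdict(Counter)  # type -> token -> count
    with_token: Counter = Counter()  # token -> number of samples containing it

    for tokens, types in samples:
        with_token.update(tokens)
        for vtype in types:
            type_size[vtype] += 1
            in_type[vtype].update(tokens)

    result: Dict[str, List[str]] = {}
    for vtype, n_in in type_size.items():
        n_out = n - n_in
        scores = {}
        for token, c_in in in_type[vtype].items():
            c_out = with_token[token] - c_in  # samples with the token but not this type
            prevalence_in = c_in / n_in
            prevalence_out = c_out / n_out if n_out else 0.0
            scores[token] = prevalence_in - prevalence_out
        # Highest contrast first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
        result[vtype] = ranked[:top_k]
    return result


# Hypothetical toy data: (tokens in the vulnerable code, vulnerability types).
samples = [
    ({"strcpy", "buf", "len"}, {"buffer-overflow"}),
    ({"memcpy", "buf", "size"}, {"buffer-overflow"}),
    ({"free", "ptr", "len"}, {"use-after-free"}),
    ({"free", "ptr", "size"}, {"use-after-free"}),
]
print(distinguishing_tokens(samples, top_k=2))
# → {'buffer-overflow': ['buf', 'memcpy'], 'use-after-free': ['free', 'ptr']}
```

Tokens such as `len` and `size`, which appear equally often across both types, score zero and drop out, which matches the intuition in the abstract that only type-specific tokens carry signal.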



Citations: 2

Learning Test-Mutant Relationship for Accurate Fault Localisation

Inf. Softw. Technol.

Pub Date : 2023-06-04 DOI: 10.48550/arXiv.2306.02319

Jinhan Kim, Gabin An, R. Feldt, Shin Yoo

Context: Automated fault localisation aims to assist developers in identifying the root cause of a fault by narrowing down the space of likely fault locations. Several Mutation-Based Fault Localisation (MBFL) techniques have been proposed that locate faults automatically by simulating variants of the faulty program, called mutants. Despite their success, existing MBFL techniques suffer from the cost of performing mutation analysis after the fault is observed. Method: To overcome this shortcoming, we propose a new MBFL technique named SIMFL (Statistical Inference for Mutation-based Fault Localisation). SIMFL localises faults based on the results of mutation analysis performed on earlier versions in the project history, allowing developers to predict the location of incoming faults in a just-in-time manner. Using several statistical inference methods, SIMFL models the relationship between the test results of the mutants and their locations, and then infers the location of the current faults. Results: An empirical study on the Defects4J dataset shows that SIMFL localises 113 of 224 faults at the first rank, outperforming other MBFL techniques. Even when trained on a predicted kill matrix, SIMFL still localises 95 of 194 faults at the first rank. Moreover, removing redundant mutants significantly improves SIMFL's localisation accuracy, measured as the number of faults localised at the first rank, by up to 51. Conclusion: This paper proposes a new MBFL technique called SIMFL, which exploits ahead-of-time mutation analysis to localise current faults. SIMFL is not only cost-effective, as it does not need mutation analysis after the fault is observed, but also capable of localising faults accurately.
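The abstract refers to "several statistical inference methods" without detailing them. To illustrate only the underlying idea (reuse a kill matrix computed ahead of time, then compare it with the tests failing now), the Python sketch below substitutes a much simpler heuristic: each location is scored by the best Jaccard similarity between a past mutant's killing tests and the currently failing tests. The kill-matrix layout, the function names, and the Jaccard scoring are assumptions for illustration, not SIMFL's actual model.

```python
from typing import Dict, FrozenSet, List, Tuple

# One row of a hypothetical kill matrix from an earlier mutation analysis:
# the program location a mutant was injected into, and the tests that killed it.
KillRow = Tuple[str, FrozenSet[str]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Similarity between two sets of failing tests (1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_locations(
    kill_matrix: List[KillRow], failing_tests: FrozenSet[str]
) -> List[Tuple[str, float]]:
    """Rank locations by how closely their past mutants' kill patterns match
    the tests failing now. A simple stand-in for SIMFL's statistical inference."""
    best: Dict[str, float] = {}
    for location, killed_by in kill_matrix:
        score = jaccard(killed_by, failing_tests)
        # A location scores as well as its best-matching mutant.
        best[location] = max(best.get(location, 0.0), score)
    # Highest score first; ties broken by location name.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical kill matrix recorded on an earlier version of the project.
kill_matrix = [
    ("Parser.parse", frozenset({"t1", "t2"})),
    ("Parser.parse", frozenset({"t1"})),
    ("Renderer.draw", frozenset({"t3"})),
    ("Store.save", frozenset({"t2", "t3"})),
]
print(rank_locations(kill_matrix, frozenset({"t1", "t2"})))
# → [('Parser.parse', 1.0), ('Store.save', 0.3333333333333333), ('Renderer.draw', 0.0)]
```

No mutants are generated or executed when the fault is observed; the ranking uses only the stored kill matrix, which is the cost advantage the abstract claims for ahead-of-time analysis.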



Citations: 1

A mixed method study of DevOps challenges

Inf. Softw. Technol.

Pub Date : 2023-05-01 DOI: 10.2139/ssrn.4109001

Minaoar Hossain Tanzil, M. Sarker, Gias Uddin, Anindya Iqbal

Citations: 1

Abbreviation-Expansion Pair Detection for Glossary Term Extraction

Inf. Softw. Technol.

Pub Date : 2023-03-01 DOI: 10.1007/978-3-030-98464-9_6

Hussein Hasso, K. Großer, Iliass Aymaz, Hanna Geppert, J. Jürjens

Citations: 2

Application of Project-Based Learning to a Software Engineering course in a hybrid class environment

Inf. Softw. Technol.

Pub Date : 2023-03-01 DOI: 10.2139/ssrn.4280809

E. Ceh-Varela, Carlos Canto-Bonilla, Dhimitraq Duni

Citations: 1

User story extraction from natural language for requirements elicitation: Identify software-related information from online news

Inf. Softw. Technol.

Pub Date : 2023-03-01 DOI: 10.2139/ssrn.4297519

D. Siahaan, I. K. Raharjana, C. Fatichah

Citations: 4

Zero-Shot Learning for Requirements Classification: An Exploratory Study

Inf. Softw. Technol.

Pub Date : 2023-02-09 DOI: 10.48550/arXiv.2302.04723

Waad Alhoshan, Alessio Ferrari, Liping Zhao

Context: Requirements engineering (RE) researchers have been experimenting with machine learning (ML) and deep learning (DL) approaches for a range of RE tasks, such as requirements classification, requirements tracing, ambiguity detection, and modelling. However, most of today's ML/DL approaches are based on supervised learning, meaning that they need to be trained on a large amount of task-specific labelled data. This constraint poses an enormous challenge to RE researchers, as the lack of labelled data makes it difficult for them to fully exploit the benefits of advanced ML/DL technologies. Objective: This paper addresses this problem by showing how a zero-shot learning (ZSL) approach can be used for requirements classification without any labelled training data. We focus on classification because many RE tasks can be framed as classification problems. Method: The ZSL approach used in our study employs contextual word embeddings and transformer-based language models. We demonstrate this approach through a series of experiments on three classification tasks: (1) FR/NFR: classifying functional vs non-functional requirements; (2) NFR: identifying NFR classes; (3) Security: classifying security vs non-security requirements. Results: The ZSL approach achieves an F1 score of 0.66 on the FR/NFR task. On the NFR task, it yields F1 scores of roughly 0.72 to 0.80 for the most frequent classes. On the Security task, F1 is roughly 0.66. All of these F1 scores are achieved without any training. Conclusion: This study demonstrates the potential of ZSL for requirements classification. An important implication is that classification tasks can be performed with very little or no training data. The proposed approach thus contributes to solving the long-standing problem of data shortage in RE.
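One common way to realize embedding-based zero-shot classification is to embed both the requirement and a short description of each candidate label, then pick the label whose embedding is most similar. The Python sketch below shows only that final step, with cosine similarity. It assumes the embeddings already exist: the 3-dimensional vectors, label names, and function names are hypothetical stand-ins for the transformer embeddings the study uses, not the authors' pipeline.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_classify(
    requirement_vec: Sequence[float], label_vecs: Dict[str, Sequence[float]]
) -> str:
    """Pick the label whose description embedding is closest to the requirement.
    No labelled requirements are used; only the label descriptions are embedded."""
    return max(label_vecs, key=lambda label: cosine(requirement_vec, label_vecs[label]))


# Toy vectors standing in for transformer embeddings of label descriptions.
label_vecs = {
    "functional": [1.0, 0.1, 0.0],
    "non-functional": [0.1, 1.0, 0.2],
}
# Hypothetical embedding of a requirement such as a response-time constraint.
requirement_vec = [0.2, 0.9, 0.1]
print(zero_shot_classify(requirement_vec, label_vecs))  # → non-functional
```

Adding a class only requires embedding one more label description, which is why this style of approach needs no task-specific labelled data.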



Citations: 9
