
2021 28th Asia-Pacific Software Engineering Conference (APSEC): Latest Papers

AWaRE2-MM: A Meta-Model for Goal-Driven, Contract-Mediated, Team-Centric Autonomous Middleware Frameworks for Antifragility
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00066
Anton V. Uzunov, Matthew Brennan, Mohan Baruwal Chhetri, Quoc Bao Vo, R. Kowalczyk, John Wondoh
In this paper, we introduce a new meta-model that captures core concepts for constructing software architectures for general-purpose, autonomous middleware frameworks that realize internalized and externalized self-adaptivity at both a system- and meta-level in order to achieve antifragility. The proposed meta-model builds on, specializes, and complements existing multi-agent meta-models in line with a previously published reference model for antifragile systems in the cyber domain.
Citations: 4
Probabilistic testing of asynchronously communicating systems
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00058
Puneet Bhateja
The input-output labelled transition system (IOLTS) is a state-based model that is widely used to describe the functional behaviour of a reactive system. However, when the same system is observed asynchronously through a pair of unbounded FIFO queues (or channels), its apparent behaviour differs from its actual behaviour, because an execution trace of the system can appear distorted in a multitude of ways. The apparent behaviour is called the asynchronous behaviour of the system. It is well known that the asynchronous behaviour can also be described by an infinite-state IOLTS. This description, however, is appropriate only as long as the channels are assumed to be reliable. Once unreliability assumptions are introduced, the asynchronous behaviour becomes probabilistic in nature, and the plain IOLTS model is not expressive enough to capture it. To this end, we show in this paper how the asynchronous behaviour of a reactive system can be captured by Segala's probabilistic automata (SPA). We further show how the SPA expressing the asynchronous behaviour can serve as a reference model for probabilistic testing of asynchronously communicating systems.
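To make the trace distortion described above concrete, the following minimal sketch enumerates the traces an observer behind reliable FIFO queues could see for one actual trace. It assumes the usual queue-delay relation, in which an output that the system emitted before consuming an input may be observed after that input was sent; the abstract does not spell this relation out. The function name `apparent_traces` and the `?`/`!` action notation are illustrative choices, and the probabilistic (unreliable-channel) case is not modelled.

```python
def apparent_traces(actual):
    """Return every trace a queue-based observer could see for one actual trace.

    Actions are strings: '?a' is an input, '!x' is an output.
    Assumption: the only distortion is that an output may be observed
    after an input that the system actually consumed later.
    """
    start = tuple(actual)
    seen = {start}
    frontier = [start]
    while frontier:
        trace = frontier.pop()
        for i in range(len(trace) - 1):
            left, right = trace[i], trace[i + 1]
            # An adjacent (output, input) pair may appear in the opposite order.
            if left.startswith("!") and right.startswith("?"):
                swapped = trace[:i] + (right, left) + trace[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    frontier.append(swapped)
    return sorted(seen)


if __name__ == "__main__":
    for trace in apparent_traces(["!x", "!y", "?a"]):
        print(" ".join(trace))
```

Under this assumption, a trace in which every input precedes every output has exactly one apparent form, while each output that precedes an input adds further possible observations.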
Citations: 0
PGraph: A Graph-based Structure for Interactive Event Exploration on Social Media
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00015
Yang Yu, Minglai Shao, Hongyan Xu, Ying Sun, Wenjun Wang, Bofei Ma
Event detection is a common research topic in visualization. Existing methods always follow an exploration mode in which machine learning algorithms identify events, which are then analyzed via a visualization system. The detection process does not integrate the expert's experience. In this paper, we propose a novel framework that organizes the original dataset as an integrated graph and allows for Interactive Event Detection (IED) on that graph. Specifically, we formulate Interactive Event Detection as subgraph detection on the graph under the expert's interactions. Further, we define a flexible structure called PGraph to model the dataset and then propose an efficient algorithm that returns a subgraph as an event. Our proposed method supports various IED tasks under the expert's interactions. We evaluate the utility of our approach by applying it in two scenarios: one uses a social media dataset to study hot events; the other uses an urban burglary dataset to detect consecutive burglary cases. Case studies show that our algorithm can detect more global events by taking the expert's experience into account. In quantitative performance experiments, our method outperforms traditional machine detection approaches, especially on the social media dataset; its accuracy is at least 10% higher than that of the baselines.
Citations: 1
On the Impact of ML use cases on Industrial Data Pipelines
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00053
M. A. Raj, Jan Bosch, H. H. Olsson, Anders Jansson
The impact of the Artificial Intelligence revolution on our society, life, firms, and employment is undoubtedly substantial. With data being a critical element, organizations are working towards obtaining high-quality data to train their AI models. Although data, data management, and data pipelines were part of industrial practice even before the introduction of ML models, the significance of data has increased further with the advent of ML models, which force data pipeline developers to go beyond the traditional focus on data quality. The objective of this study is to analyze the impact of ML use cases on data pipelines. We assume that data pipelines that serve ML models are given more importance than conventional data pipelines. We report on a study that we conducted by observing software teams at three companies as they develop both conventional (non-ML) data pipelines and data pipelines that serve ML-based applications. We study six data pipelines from these three companies and categorize them based on their criticality and purpose. Further, we identify the determinants that can be used to compare the development and maintenance of these data pipelines. Finally, we map these factors in a two-dimensional space to illustrate their importance on a scale of low, moderate, and high.
Citations: 1
Towards a Dynamic Visualization of Complex Reverse-Engineered Object Collaboration
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00071
Aki Hongo, Naoya Nitta
UML is useful for modeling higher-abstraction-level concepts of software in a forward engineering context, but it is still challenging to reverse engineer the more complex behavior of realistic object-oriented programs (OOPs) based on such visualization techniques. For example, in a sequence diagram, an object appears in quite different ways when it serves as a sender or receiver of some message and when it serves as a parameter or return value of another message, and thus compound method invocations such as invocation chains and callbacks cannot be represented directly. In this paper, we first define a dynamic metric named alternation complexity that indicates the number of alternations of object roles between sender/receiver and parameter/return value within a collaboration. Through experiments with 12 professional programmers, we confirmed that the metric captures a certain aspect of the difficulty of comprehending features. Furthermore, we present a dynamic visualization model to directly represent collaborations in which the types of object roles change frequently.
Citations: 0
CHIS: A Novel Hybrid Granularity Identifier Splitting Approach
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00027
Siyuan Liu, Jingxuan Zhang, Jiahui Liang, Junpeng Luo, Yong Xu, Chenxing Sun
Information Retrieval (IR) techniques have been widely utilized in a growing number of software maintenance activities. However, there is a mismatch between the source code lexicon (especially identifiers) and the vocabulary in software artifacts, which makes IR techniques inefficient. Consequently, it is essential to normalize identifiers, that is, to parse identifiers into several natural language terms. Identifier splitting significantly impacts the effectiveness of identifier normalization. Even though researchers have proposed several approaches to split identifiers, three main drawbacks remain to be resolved: neglect of morphemes, over-splitting, and under-splitting. In this paper, we propose a new Character-level Hybrid-granularity Identifier Splitting approach, CHIS, to resolve these three drawbacks and split identifiers better. CHIS combines Bidirectional Encoder Representations from Transformers (BERT) and Conditional Random Fields (CRF) to train a deep learning model to split identifiers. In addition, CHIS employs a pre-processing component to resolve the morpheme acquisition drawback and a post-processing component to resolve the over-splitting and under-splitting drawbacks, thus further improving its performance. Specifically, in the pre-processing component, CHIS obtains the most frequent subwords of the training identifiers and labels them as morphemes through the Byte Pair Encoding (BPE) algorithm and the sequence labeling algorithm. In the post-processing component, CHIS iteratively merges and splits the splitting results obtained by the deep learning model to resolve the over-splitting and under-splitting drawbacks. We conduct extensive experiments to show the effectiveness of CHIS. Experimental results show that CHIS achieves an Accuracy of 0.943 on average and outperforms the state-of-the-art approach by 0.085 on average. In addition, the effectiveness of the pre-processing and post-processing components of CHIS is also validated.
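The pre-processing step mentions obtaining frequent subwords with Byte Pair Encoding. As a rough illustration of that general idea only, the sketch below learns BPE merges from a handful of identifiers using the textbook procedure: repeatedly merge the most frequent adjacent symbol pair. The function name `learn_bpe_merges`, the lowercasing, and the tie-breaking rule are assumptions made for this example; the abstract does not describe how CHIS configures BPE or how subwords are labeled as morphemes.

```python
from collections import Counter


def learn_bpe_merges(identifiers, num_merges):
    """Learn up to `num_merges` byte-pair merges from a list of identifiers."""
    # Each identifier starts as a sequence of single characters.
    words = Counter(tuple(name.lower()) for name in identifiers)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by identifier frequency.
        pair_counts = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Most frequent pair; ties are broken lexicographically (an assumption).
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        merges.append(best[0] + best[1])
        # Rewrite every identifier with the chosen pair fused into one symbol.
        merged_words = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_words[tuple(out)] += freq
        words = merged_words
    return merges


if __name__ == "__main__":
    print(learn_bpe_merges(["getname", "getsize", "setname"], 2))
```

With a realistic training set and many more merges, the learned symbols grow into longer subwords, which is the kind of output a morpheme-labeling step could start from.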
Citations: 2
Design of Software Architecture for Neural Network Cooperation: Case of Forgery Detection
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00021
Akira Mizutani, Masami Noro, Atsushi Sawada
Recent technological advances in media tampering have been the cause of many harmful forged images. To cope with this, tampering detection methods have become a major research topic in the neural network community. These methods almost always aim at detecting a specific kind of forgery; that is, a general method for detecting any tampering has not been invented so far. This paper concerns a software architecture for organizing multiple neural networks to detect multiple kinds of forgeries. The key issue here is to construct, from the meta-level, a mechanism for an ensemble of front-end neural networks to select the neural network that makes the decision. Under this architecture, we implemented a prototype for detecting forged images resulting from multiple tampering methods, namely copy-move and compression. To demonstrate that our architecture works well, we conducted a case study with a total of 120,000 patches in three classes (copy-move, compression, and untampered data), with 40,000 patches in each. The result shows that our proposed method successfully classified 108,954 out of 120,000 patches with 90.82% accuracy. We also discuss the implications of our architecture for avoiding concept drift. Our architecture is designed to be context-oriented and meta-level, with a two-layered structure: meta and base. The neural networks are categorized as base-level components, whereas a component coordinating the networks is placed at the meta-level. The architecture explains that concept drift can be handled at the meta-level. Through discussions of transfer learning, online learning, and ensemble learning techniques in terms of the architecture we constructed, we conclude that a universal architecture to coordinate machine learning components could be constructed.
Citations: 0
S2 LMMD: Cross-Project Software Defect Prediction via Statement Semantic Learning and Maximum Mean Discrepancy
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00044
Wangshu Liu, Yongteng Zhu, Xiang Chen, Qing Gu, Xingya Wang, Shenkai Gu
Unlike within-project software defect prediction (WPDP), cross-project software defect prediction (CPDP) does not require sufficient training data and can help developers in the early stages of software development. Recent studies have tried to learn semantic features for CPDP by feeding neural networks with abstract syntax tree (AST) token vectors. However, the ASTs directly parsed from software modules usually have complex structures, with many nodes and great depth, and transfer learning is not regularly adopted to further reduce the data distribution difference between the source project and the target project. To solve these problems, we aim to jointly learn statement-level trees (SLT) and alleviate the data distribution difference with maximum mean discrepancy (MMD) to improve defect prediction performance in CPDP. Specifically, we propose a novel cross-project defect prediction method, S2LMMD, based on statement semantic learning and MMD. We first construct the SLT by splitting the original AST at specified nodes. Then we generate more effective semantic features by learning sequence embeddings with a Bi-GRU neural network. Finally, an MMD transfer loss is applied to retain more common characteristics across different project datasets and further improve CPDP performance. To verify the effectiveness of our proposed method, we conducted experiments on ten widely used open-source projects and evaluated performance using the AUC measure. Our empirical results show that S2LMMD can significantly outperform eight state-of-the-art baselines. In addition, for semantic learning, SLT has a higher influence on CPDP, while MMD is of great significance in transfer learning.
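For readers unfamiliar with maximum mean discrepancy, the sketch below computes the standard biased estimate of squared MMD between two sets of feature vectors: the mean kernel value within the source set, plus the mean within the target set, minus twice the mean across the two sets. The RBF kernel, the `gamma` parameter, and the function names are assumptions chosen for illustration; the abstract does not state which kernel or estimator S2LMMD uses, nor how the loss is weighted during training.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors."""

    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


if __name__ == "__main__":
    source_features = [[0.0, 0.1], [0.2, 0.0]]   # hypothetical source-project features
    target_features = [[1.0, 1.1], [0.9, 1.0]]   # hypothetical target-project features
    print(mmd_squared(source_features, target_features))
```

The estimate is zero when both samples are identical and grows as the two feature distributions drift apart, which is why it can serve as a transfer loss that pulls source and target representations together.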
与项目内软件缺陷预测(WPDP)不同,跨项目软件缺陷预测(CPDP)不需要足够的训练数据,可以在软件开发的早期阶段帮助开发人员。最近的研究试图通过向神经网络输入抽象语法树(AST)标记向量来学习CPDP的语义特征。但是,直接从软件模块中解析出来的ast通常结构复杂,体现在节点较多、规模更深,并且没有定期采用迁移学习来进一步减小源项目与目标项目之间的数据分布差异。为了解决这些问题,我们旨在联合学习语句层次树(SLT),并利用最大平均差异(MMD)来缓解数据分布差异,以提高CPDP上的缺陷预测性能。具体来说,我们提出了一种基于语句语义学习和MMD的跨项目缺陷预测方法S2LMMD。我们首先通过在指定节点上分割原始AST来构造SLT。然后利用Bi-GRU神经网络学习序列嵌入,生成更有效的语义特征。最后,进行转移损失MMD,以保持不同项目数据集的更多共同特征,从而进一步提高CPDP性能。为了验证我们提出的方法的有效性,我们在10个广泛使用的开源项目上进行了实验,并使用AUC度量来评估实验性能。我们的实证结果表明,我们提出的方法S2LMMD可以显著优于八个最先进的基线。此外,在语义学习中,SLT对CPDP有较高的影响,而MMD在迁移学习中具有重要意义。
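The abstract above uses maximum mean discrepancy (MMD) as a transfer loss between source-project and target-project features. A minimal sketch of what such a loss measures is given below. The abstract does not state which kernel or estimator the authors use, so the Gaussian kernel, the biased estimator, the bandwidth `sigma`, and the function names are all assumptions made for illustration. The inputs here are toy feature vectors rather than learned Bi-GRU embeddings.

```python
import math
from typing import Sequence


def gaussian_kernel(a: Sequence[float], b: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def mmd_squared(source: Sequence[Sequence[float]],
                target: Sequence[Sequence[float]],
                sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two samples of feature vectors."""
    n, m = len(source), len(target)
    # Mean kernel value within the source sample.
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in source for b in source) / (n * n)
    # Mean kernel value within the target sample.
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in target for b in target) / (m * m)
    # Mean kernel value across the two samples.
    k_st = sum(gaussian_kernel(a, b, sigma) for a in source for b in target) / (n * m)
    return k_ss + k_tt - 2.0 * k_st
```

The value is zero when the two samples coincide and grows as their distributions move apart, which is why minimizing it alongside the classification loss pushes the learned features of the two projects closer together.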
Citations: 2
Finding repeated strings in code repositories and its applications to code-clone detection
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00057
Yoriyuki Yamagata, Fabien Hervé, Yuji Fujiwara, Katsuro Inoue
Although researchers have created many advanced code-clone detection techniques, more effort is required before these techniques are widely adopted in industry. One reason is that these advanced techniques rely on lexing and parsing programs. Modern programming languages have complex lexical conventions and grammars, which evolve constantly. Therefore, using advanced code-clone detection techniques requires substantial and continuous effort. This paper proposes a lightweight, language-independent method that detects code clones by simply finding repeated strings in a code repository, relying on neither lexing nor parsing. The proposed method is based on an efficient technique developed in a bioinformatics context to find repeated strings. We refer to the repeated strings in the source code as weak Type-1 clones. Because the proposed technique normalizes newlines, tabs, and white spaces into a single white space, it can find clones in which newline positions or indentation are changed, as is often the case when copy-pasting occurs. Although the proposed method only finds verbatim copies, it also yields interesting observations regarding repository structures. Many developers may prefer the proposed simple approach because it is easier to understand than other advanced techniques that use heuristics, approximation, and machine learning.
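The idea of a weak Type-1 clone can be illustrated with a short sketch. This is not the paper's algorithm: the abstract says the method builds on an efficient bioinformatics technique, whereas the code below substitutes a naive scan over fixed-length windows, which is only practical for small inputs. The function names and the `min_len` parameter are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize_whitespace(text: str) -> str:
    """Collapse every run of newlines, tabs, and spaces into a single space."""
    return re.sub(r"\s+", " ", text).strip()


def find_repeated_strings(files: Dict[str, str],
                          min_len: int = 20) -> Dict[str, List[Tuple[str, int]]]:
    """Map each substring of length min_len that occurs more than once
    to its occurrences as (file name, offset in the normalized text)."""
    occurrences: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name, text in files.items():
        normalized = normalize_whitespace(text)
        # Naive stand-in for the efficient repeat finder: index every window.
        for start in range(len(normalized) - min_len + 1):
            occurrences[normalized[start:start + min_len]].append((name, start))
    return {s: locs for s, locs in occurrences.items() if len(locs) > 1}


# Two copies of one function that differ only in line breaks and indentation.
repo = {
    "a.c": "int add(int x, int y) {\n    return x + y;\n}\n",
    "b.c": "int add(int x, int y)\n{\n\treturn x + y;\n}\n",
}
clones = find_repeated_strings(repo, min_len=30)
```

Because both files normalize to the same string, every window reported in `clones` appears in both `a.c` and `b.c`, even though the raw files differ in newline positions and indentation.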
Citations: 0
Scalable Fault Detection Based on Precise Access Path
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00054
Chi Li, Yuexing Wang, Min Zhou, M. Gu
Precise static analysis, which is usually field-sensitive and inter-procedural, is necessary in an industrial environment to ensure reliability and security. However, it scales poorly when applied to various industrial environments: (1) field-sensitive analysis cannot assure termination if field accesses are modeled by unbounded access paths; (2) inter-procedural analysis may lead to path explosion because of the unbounded length of call chains. While using longer access paths or call chains can improve precision, the analysis may become inefficient. An industrial-strength method should be scalable enough to handle different applications. This paper presents a scalable fault detection method based on the precise access path. A precise access path models a memory location with accurate operations and offsets from a source, and points-to relations of variables are used to refine it. It can differentiate elements of aggregate structures and is more precise than the ordinary access path. Based on the precise access path, we perform an inter-procedural analysis with the help of an intra-procedural analysis and combined function summary. Furthermore, our method is designed to work backward to detect error-handling bugs. Compared with state-of-the-art tools, our method is more scalable, with higher precision and efficiency on both benchmarks and 11 widely used applications.
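The termination problem with unbounded access paths can be made concrete with a small sketch. It shows the ordinary access path with classic k-limiting, a standard way to bound path length, and not the precise access path proposed in the paper, whose definition the abstract does not give. The class name, the `truncated` flag, and the bound `k` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AccessPath:
    """An ordinary access path: a base variable followed by field accesses."""
    base: str
    fields: Tuple[str, ...] = ()
    truncated: bool = False  # True once deeper accesses have been cut off

    def extend(self, field: str, k: int = 3) -> "AccessPath":
        """Append one field access, keeping at most k fields (k-limiting)."""
        if self.truncated or len(self.fields) >= k:
            # Past the bound: summarize every deeper location as one path.
            return AccessPath(self.base, self.fields, True)
        return AccessPath(self.base, self.fields + (field,), False)

    def __str__(self) -> str:
        suffix = ".*" if self.truncated else ""
        return ".".join((self.base,) + self.fields) + suffix


# Model the loop `p = p.next` on a linked list until no new path appears.
path = AccessPath("head")
seen = set()
while path not in seen:
    seen.add(path)
    path = path.extend("next", k=3)
```

Without the bound, each iteration would produce a longer path and the loop would never reach a fixed point. With the bound, the set of paths is finite, at the cost of merging all deeper list nodes into the single truncated path.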
Citations: 0