
2021 28th Asia-Pacific Software Engineering Conference (APSEC): Latest Papers

AWaRE2-MM: A Meta-Model for Goal-Driven, Contract-Mediated, Team-Centric Autonomous Middleware Frameworks for Antifragility
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00066
Anton V. Uzunov, Matthew Brennan, Mohan Baruwal Chhetri, Quoc Bao Vo, R. Kowalczyk, John Wondoh
In this paper, we introduce a new meta-model that captures core concepts for constructing software architectures for general-purpose, autonomous middleware frameworks that realize internalized and externalized self-adaptivity at both a system- and meta-level in order to achieve antifragility. The proposed meta-model builds on, specializes, and complements existing multi-agent meta-models in line with a previously published reference model for antifragile systems in the cyber domain.
Citations: 4
Probabilistic testing of asynchronously communicating systems
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00058
Puneet Bhateja
Input-output labelled transition system (IOLTS) is a state-based model that is widely used to describe the functional behaviour of a reactive system. However, when the same system is observed asynchronously through a pair of unbounded FIFO queues (or channels), its apparent behaviour differs from its actual behaviour, because an execution trace of the system can appear distorted in a multitude of ways. This apparent behaviour is called the asynchronous behaviour of the system. It is well known that the asynchronous behaviour can also be described by an infinite-state IOLTS. However, this description is appropriate only as long as the channels are assumed to be reliable. As soon as unreliability assumptions are introduced, the asynchronous behaviour becomes probabilistic in nature, and the plain IOLTS model is not expressive enough to capture it. To this end, in this paper we show how the asynchronous behaviour of a reactive system can be captured by Segala's probabilistic automata (SPA). We further show how the SPA expressing the asynchronous behaviour can serve as a reference model for probabilistic testing of asynchronously communicating systems.
Citations: 0
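The abstract's key point is that unreliable channels turn what a tester observes into a probability distribution. As a minimal sketch, assuming a single output channel that drops each message independently with probability `loss_prob` (a loss model chosen here for illustration; it is not Segala's probabilistic automata, and it ignores the input/output reordering that asynchronous observation also causes), the function below enumerates the distribution over observed sequences.

```python
from collections import defaultdict
from itertools import product


def observation_distribution(outputs, loss_prob):
    """Distribution over the sequences seen at the far end of a lossy FIFO channel.

    Assumes every message is dropped independently with probability loss_prob
    and that surviving messages keep their FIFO order.
    """
    dist = defaultdict(float)
    # Enumerate every delivered/lost pattern for the emitted messages.
    for delivered in product((True, False), repeat=len(outputs)):
        prob = 1.0
        for d in delivered:
            prob *= (1 - loss_prob) if d else loss_prob
        if prob == 0.0:
            continue  # impossible pattern, e.g. any loss on a reliable channel
        observed = tuple(m for m, d in zip(outputs, delivered) if d)
        dist[observed] += prob
    return dict(dist)


if __name__ == "__main__":
    # Reliable channel: the only observation is the emitted sequence.
    print(observation_distribution(["a", "b"], 0.0))
    # Lossy channel: several observations, each with its own probability.
    for observed, p in sorted(observation_distribution(["a", "b"], 0.1).items()):
        print(observed, round(p, 3))
```

With `loss_prob=0.0` the only observation is the emitted sequence itself, matching the reliable-channel case; with `loss_prob=0.1` the sequence `("a", "b")` is observed with probability 0.81 and each single-message observation with probability 0.09. The enumeration is exponential in the number of messages, so it only serves to illustrate the idea.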
PGraph: A Graph-based Structure for Interactive Event Exploration on Social Media
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00015
Yang Yu, Minglai Shao, Hongyan Xu, Ying Sun, Wenjun Wang, Bofei Ma
Event detection is a common research topic in visualization. Existing methods always follow an exploration mode in which machine learning algorithms identify events that are then analyzed via a visualization system, so the detection process does not integrate the expert's experience. In this paper, we propose a novel framework that organizes the original dataset as an integrated graph, enabling Interactive Event Detection (IED) on the graph. Specifically, we formulate Interactive Event Detection as a subgraph detection problem on the graph under the expert's interactions. Further, we define a flexible structure called PGraph to model the dataset and then propose an efficient algorithm that returns a subgraph as an event. Our method supports performing various IED tasks under the expert's interactions. We evaluate the utility of our approach in two scenarios: one uses a social media dataset to study hot events, and the other uses an urban burglary dataset to detect consecutive burglary cases. Case studies show that, by taking the expert's experience into account, our algorithm can detect more global events. In quantitative performance experiments, our method outperforms traditional machine detection approaches, especially on the social media dataset, and its accuracy is at least 10% higher than that of the baselines.
Citations: 1
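To illustrate what "subgraph detection under the expert's interactions" can mean, here is a hypothetical sketch that is not the PGraph structure or the paper's algorithm: posts are reduced to keyword sets, keywords become nodes of a co-occurrence graph, and an event is the subgraph reachable from expert-chosen seed keywords through edges whose weight is at least `min_weight`. The names `build_cooccurrence_graph`, `expand_event`, and the toy posts are illustrative assumptions.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_graph(posts):
    """Undirected weighted graph: nodes are keywords, weight = number of posts sharing both."""
    graph = defaultdict(lambda: defaultdict(int))
    for keywords in posts:
        for u, v in combinations(sorted(set(keywords)), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def expand_event(graph, seeds, min_weight):
    """Breadth-first expansion from expert-chosen seed keywords along edges
    with weight >= min_weight; the visited node set is returned as the event."""
    event = {s for s in seeds if s in graph}
    queue = deque(event)
    while queue:
        node = queue.popleft()
        for neighbour, weight in graph[node].items():
            if weight >= min_weight and neighbour not in event:
                event.add(neighbour)
                queue.append(neighbour)
    return event


if __name__ == "__main__":
    # Toy posts, already reduced to keyword sets (hypothetical data).
    posts = [
        ["earthquake", "rescue", "city"],
        ["earthquake", "rescue"],
        ["earthquake", "aftershock"],
        ["concert", "tickets"],
        ["concert", "tickets", "city"],
    ]
    graph = build_cooccurrence_graph(posts)
    print(sorted(expand_event(graph, ["earthquake"], min_weight=2)))
    print(sorted(expand_event(graph, ["earthquake"], min_weight=1)))
```

On the toy posts, seeding with `"earthquake"` at `min_weight=2` returns only `{"earthquake", "rescue"}`, while lowering the threshold to 1 pulls in the unrelated concert keywords through the shared `"city"` node, which is the kind of trade-off an expert interaction would adjust.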
Towards a Dynamic Visualization of Complex Reverse-Engineered Object Collaboration
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00071
Aki Hongo, Naoya Nitta
UML is useful for modeling higher-level abstractions of software in a forward-engineering context, but reverse-engineering the more complex behavior of realistic object-oriented programs (OOPs) with such visualization techniques remains challenging. For example, in a sequence diagram an object is depicted in quite different ways when it serves as the sender or receiver of one message and as a parameter or return value of another, so compound method invocations such as invocation chains and callbacks cannot be represented directly. In this paper, we first define a dynamic metric named alternation complexity, which indicates the number of alternations of object roles between sender/receiver and parameter/return value within a collaboration. Through experiments with 12 professional programmers, we confirmed that the metric captures a certain aspect of the difficulty of comprehending features. Furthermore, we present a dynamic visualization model that directly represents collaborations in which the types of object roles change frequently.
Citations: 0
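A minimal sketch of the alternation count, under assumptions: each object's behaviour is recorded as a time-ordered list of the roles it plays in messages (`"sender"`, `"receiver"`, `"parameter"`, `"return"`), an alternation is a switch between the sender/receiver group and the parameter/return-value group, and per-object counts are summed. The paper's exact definition and aggregation may differ.

```python
PARTICIPANT_ROLES = {"sender", "receiver"}
DATA_ROLES = {"parameter", "return"}


def role_group(role):
    """Collapse a concrete role into the two groups the metric alternates between."""
    if role in PARTICIPANT_ROLES:
        return "participant"
    if role in DATA_ROLES:
        return "data"
    raise ValueError(f"unknown role: {role!r}")


def alternations(roles):
    """Count switches between participant and data roles in one object's role sequence."""
    groups = [role_group(r) for r in roles]
    return sum(1 for prev, cur in zip(groups, groups[1:]) if prev != cur)


def alternation_complexity(trace):
    """Sum per-object alternations; trace maps object id -> time-ordered list of roles."""
    return sum(alternations(roles) for roles in trace.values())


if __name__ == "__main__":
    # Hypothetical trace of one feature execution.
    trace = {
        "builder": ["return", "receiver", "receiver", "parameter", "receiver"],
        "logger": ["receiver", "receiver"],
    }
    print(alternation_complexity(trace))
```

In the example, `builder` is first returned, then receives two calls, is passed as a parameter (as in a callback), and finally receives another call, giving three alternations; `logger` only ever receives messages and contributes none.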
Monitoring Negative Sentiment-Related Events in Open Source Software Projects
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00017
Lingjia Li, Jian Cao, Qing Qi
Open source software (OSS) development is a highly collaborative process in which individuals, groups, and organizations interact to develop, operate, and maintain software and related artifacts. Developers' sentiment in this process can affect their willingness to work and their efficiency, so monitoring sentiment factors can help improve OSS development and management. However, no method has yet been proposed to dynamically monitor sentiment phenomena during the OSS development process. In this paper, we propose an approach to detect Negative Sentiment-related Events (NSEs). It consists of two steps: first, identify a burst interval of negative comments in an open source project, which corresponds to an NSE; second, annotate the NSE with its event type. To support this approach, the types of NSEs in OSS projects are defined through an empirical study, and classifiers are trained to annotate event types automatically. Moreover, conversation disentanglement techniques are employed to make the extracted comments more complete. Finally, the factors that influence NSEs in OSS projects are studied.
Citations: 0
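The first step, finding burst intervals of negative comments, can be approximated with a simple threshold rule. This sketch is a stand-in, not the paper's detector: it assumes comments have already been classified as negative and counted per day (the list must be non-empty), flags days whose count exceeds the mean plus `k` population standard deviations, and merges consecutive flagged days into inclusive `(start, end)` intervals.

```python
from statistics import mean, pstdev


def burst_intervals(daily_negative_counts, k=2.0):
    """Return inclusive (start_day, end_day) intervals of consecutive days whose
    negative-comment count exceeds mean + k * population standard deviation."""
    threshold = mean(daily_negative_counts) + k * pstdev(daily_negative_counts)
    intervals, start = [], None
    for day, count in enumerate(daily_negative_counts):
        if count > threshold:
            if start is None:
                start = day  # a burst begins
        elif start is not None:
            intervals.append((start, day - 1))  # the burst ended on the previous day
            start = None
    if start is not None:
        intervals.append((start, len(daily_negative_counts) - 1))
    return intervals


if __name__ == "__main__":
    # Hypothetical daily counts of comments already classified as negative.
    counts = [1, 0, 2, 1, 9, 11, 1, 0, 1, 1]
    print(burst_intervals(counts, k=1.0))
    print(burst_intervals(counts, k=2.0))
```

For the counts in the example, `k=1.0` yields the interval `(4, 5)`, while `k=2.0` keeps only day 5. A single large burst also inflates the standard deviation, which is one limitation of this simple rule.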
On the Impact of ML use cases on Industrial Data Pipelines
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00053
M. A. Raj, Jan Bosch, H. H. Olsson, Anders Jansson
The impact of the Artificial Intelligence revolution on our society, lives, firms, and employment is undoubtedly substantial. With data being a critical element, organizations are working towards obtaining high-quality data to train their AI models. Although data, data management, and data pipelines were part of industrial practice even before the introduction of ML models, the significance of data has increased further with the advent of ML models, which force data pipeline developers to go beyond the traditional focus on data quality. The objective of this study is to analyze the impact of ML use cases on data pipelines. We assume that data pipelines serving ML models are given more importance than conventional data pipelines. We report on a study conducted by observing software teams at three companies as they develop both conventional (non-ML) data pipelines and data pipelines that serve ML-based applications. We study six data pipelines from these three companies and categorize them based on their criticality and purpose. Further, we identify the determinants that can be used to compare the development and maintenance of these data pipelines. Finally, we map these factors in a two-dimensional space to illustrate their importance on a scale of low, moderate, and high.
Citations: 1
CHIS: A Novel Hybrid Granularity Identifier Splitting Approach
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00027
Siyuan Liu, Jingxuan Zhang, Jiahui Liang, Junpeng Luo, Yong Xu, Chenxing Sun
Information Retrieval (IR) techniques have been widely utilized in a growing number of software maintenance activities. However, there is a mismatch between the source code lexicon (especially identifiers) and the vocabulary in software artifacts, which reduces the effectiveness of IR techniques. Consequently, it is essential to normalize identifiers, that is, to parse them into natural language terms. Identifier splitting significantly affects the effectiveness of identifier normalization. Although researchers have proposed several approaches to splitting identifiers, three main drawbacks remain: morphemes are not considered, and identifiers are over-split or under-split. In this paper, we propose CHIS, a new Character-level Hybrid-granularity Identifier Splitting approach, to resolve these three drawbacks and split identifiers better. CHIS combines Bidirectional Encoder Representations from Transformers (BERT) and Conditional Random Fields (CRF) to train a deep learning model that splits identifiers. In addition, CHIS employs a pre-processing component to resolve the morpheme acquisition drawback and a post-processing component to resolve the over-splitting and under-splitting drawbacks, further improving its performance. Specifically, in the pre-processing component, CHIS uses the Byte Pair Encoding (BPE) algorithm and a sequence labeling algorithm to obtain the most frequent subwords of the training identifiers and label them as morphemes. In the post-processing component, CHIS iteratively merges and splits the results produced by the deep learning model to resolve over-splitting and under-splitting. We conduct extensive experiments to show the effectiveness of CHIS. Experimental results show that CHIS achieves an average accuracy of 0.943 and outperforms the state-of-the-art approach by 0.085 on average. In addition, the effectiveness of the pre-processing and post-processing components of CHIS is also validated.
Citations: 2
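The pre-processing step relies on Byte Pair Encoding to surface frequent subwords. The following is a minimal sketch of textbook BPE merge learning over lowercased identifiers, assuming character-level initial symbols and no end-of-word marker; it is not CHIS's implementation, and the sequence labeling, the BERT-CRF model, and the post-processing are not shown.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to num_merges Byte Pair Encoding merges from a list of words.

    Each word starts as a tuple of single characters; every iteration merges the
    most frequent adjacent symbol pair across the whole corpus into one symbol.
    Returns the list of merges and the final segmented corpus (a Counter).
    """
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Highest count wins; ties are broken by comparing the pairs themselves.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        merged_symbol = best[0] + best[1]
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged_symbol)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] += freq
        corpus = new_corpus
    return merges, corpus


if __name__ == "__main__":
    identifiers = ["getName", "getId", "setName", "getValue", "setValue"]
    merges, corpus = learn_bpe([name.lower() for name in identifiers], num_merges=6)
    print(merges)
    print(sorted(corpus))
```

Multi-character symbols in the final corpus (for example a learned `get` in `getx`, `gety`, `getz`) are the kind of frequent subwords that could serve as morpheme candidates. Ties between equally frequent pairs are broken by comparing the pairs themselves, so the output is deterministic.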
Literature Review on Log Anomaly Detection Approaches Utilizing Online Parsing Methodology
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00068
Scott Lupton, H. Washizaki, Nobukazu Yoshioka, Y. Fukazawa
The use of anomaly detection for log monitoring requires parsing model input features from raw, unstructured data. Log parsing methods come in many forms, but are generally categorized as being either offline or online. In this study, a systematic literature review of anomaly detection approaches utilizing online parsing methods is performed. An inventory of these approaches is taken, research gaps are explored, and suggestions for future exploration and study are presented.
Citations: 4
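Online (streaming) parsers assign each log line to a template as it arrives, instead of clustering a whole log offline. The class below is a simplified sketch in the spirit of Drain-style parsers, not a faithful implementation of any surveyed approach: tokens containing digits are masked, templates are compared only with lines of the same token count, and a line joins the most similar template when the fraction of matching positions reaches `similarity_threshold`, generalizing differing positions to `<*>`. The class and parameter names are illustrative.

```python
import re


class OnlineLogParser:
    """Streaming log-template extractor (a simplified, Drain-inspired sketch)."""

    WILDCARD = "<*>"

    def __init__(self, similarity_threshold=0.5):
        self.threshold = similarity_threshold
        self.templates = []  # each template is a list of tokens

    def _preprocess(self, line):
        # Mask tokens containing digits (ids, numbers, addresses) before matching.
        return [self.WILDCARD if re.search(r"\d", tok) else tok for tok in line.split()]

    def _similarity(self, template, tokens):
        # Fraction of positions where the template is a wildcard or equals the token.
        if not tokens:
            return 1.0
        same = sum(1 for t, tok in zip(template, tokens) if t == self.WILDCARD or t == tok)
        return same / len(tokens)

    def parse(self, line):
        """Return the template id for one log line, updating templates online."""
        tokens = self._preprocess(line)
        best_id, best_sim = None, -1.0
        for tid, template in enumerate(self.templates):
            if len(template) != len(tokens):
                continue  # only lines with the same token count are compared
            sim = self._similarity(template, tokens)
            if sim > best_sim:
                best_id, best_sim = tid, sim
        if best_id is not None and best_sim >= self.threshold:
            # Generalize positions where the template and the new line disagree.
            old = self.templates[best_id]
            self.templates[best_id] = [t if t == tok else self.WILDCARD for t, tok in zip(old, tokens)]
            return best_id
        self.templates.append(tokens)
        return len(self.templates) - 1


if __name__ == "__main__":
    parser = OnlineLogParser()
    logs = [
        "Connection from 10.0.0.1 closed",
        "Connection from 10.0.0.2 closed",
        "User alice logged in",
        "User bob logged in",
        "Disk full on /dev/sda1",
    ]
    print([parser.parse(line) for line in logs])
    for template in parser.templates:
        print(" ".join(template))
```

The resulting sequence of template ids (here `[0, 0, 1, 1, 2]`) is the kind of structured input feature that downstream anomaly detectors consume.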
Finding repeated strings in code repositories and its applications to code-clone detection
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00057
Yoriyuki Yamagata, Fabien Hervé, Yuji Fujiwara, Katsuro Inoue
Although researchers have created many advanced code-clone detection techniques, more effort is required for these techniques to be widely adopted in industry. One reason is that these advanced techniques rely on lexing and parsing programs. Modern programming languages have complex lexical conventions and grammars, which evolve constantly, so using advanced code-clone detection techniques requires substantial and continuous effort. This paper proposes a lightweight, language-independent method that detects code clones simply by finding repeated strings in a code repository, relying on neither lexing nor parsing. The proposed method is based on an efficient technique for finding repeated strings that was developed in a bioinformatics context. We refer to the repeated strings in the source code as weak Type-1 clones. Because the proposed technique normalizes newlines, tabs, and white spaces into a single white space, it can find clones in which newline positions or indentation have changed, as is often the case with copy-pasted code. Although the proposed method only finds verbatim copies, it also yields interesting observations regarding repository structures. Many developers may prefer this simple approach because it is easier to understand than advanced techniques that use heuristics, approximation, and machine learning.
Citations: 0
How Do Programmers Express High-Level Concepts using Primitive Data Types?
Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00043
Yusuke Shinyama, Yoshitaka Arahori, K. Gondow
We investigated how programmers express high-level concepts, such as path names and coordinates, using primitive data types. Although relying too much on primitive data types is sometimes criticized as a bad smell, it remains a common practice among programmers. We propose a novel way to accurately identify expressions for certain predefined concepts by examining API calls. We defined twelve conceptual types used in the Java Standard API and then obtained expressions for each conceptual type from 26 open-source projects. Using these expressions, we trained a decision-tree-based classifier, which achieved an F-score of 83% in predicting the conceptual type of a given expression. Our results indicate that a conceptual type can be inferred from source code reasonably well once enough examples are given. The resulting classifier can be used for potential bug detection, test-case generation, and documentation.
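The abstract gives no implementation details, so the following is only a toy sketch under two assumptions of ours: an expression is labeled with a conceptual type according to the Java Standard API parameter it is passed to (the hypothetical `API_SINKS` table), and the classifier's features are the words in the expression's identifier. A small hand-written ID3 decision tree stands in for whatever tree learner the authors used, and `OBSERVED` is invented data, not the paper's dataset.

```python
import math
import re
from collections import Counter

# Hypothetical labeling rule: an expression passed to one of these Java Standard API
# parameters is treated as expressing the given conceptual type.
API_SINKS = {
    "java.io.File(String pathname)": "path",
    "java.net.URL(String spec)": "url",
    "java.awt.Point(int x, int y): x": "coordinate",
}

# Invented toy observations: (identifier of the argument expression, API parameter it flows into).
OBSERVED = [
    ("configPath", "java.io.File(String pathname)"),
    ("outFileName", "java.io.File(String pathname)"),
    ("logDir", "java.io.File(String pathname)"),
    ("baseUrl", "java.net.URL(String spec)"),
    ("serverUrl", "java.net.URL(String spec)"),
    ("posX", "java.awt.Point(int x, int y): x"),
    ("cursorX", "java.awt.Point(int x, int y): x"),
]


def words(identifier: str) -> set[str]:
    """Split a camelCase or snake_case identifier into lowercase words (the assumed features)."""
    return {w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)}


def entropy(labels: list[str]) -> float:
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def build_tree(rows, features):
    """ID3 on binary word-presence features.

    rows is a list of (word set, label); returns a label (leaf) or a tuple
    (word, subtree if the word is present, subtree if it is absent)."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def split_entropy(word):
        yes = [label for feats, label in rows if word in feats]
        no = [label for feats, label in rows if word not in feats]
        return sum(len(part) / len(rows) * entropy(part) for part in (yes, no) if part)

    best = min(sorted(features), key=split_entropy)  # sorted() makes ties deterministic
    present = [r for r in rows if best in r[0]]
    absent = [r for r in rows if best not in r[0]]
    if not present or not absent:
        return majority  # the best split gains nothing, so stop here
    rest = features - {best}
    return (best, build_tree(present, rest), build_tree(absent, rest))


def predict(tree, feats: set[str]) -> str:
    while isinstance(tree, tuple):
        word, present, absent = tree
        tree = present if word in feats else absent
    return tree


rows = [(words(name), API_SINKS[sink]) for name, sink in OBSERVED]
tree = build_tree(rows, set().union(*(feats for feats, _ in rows)))
print(predict(tree, words("imageUrl")))      # url
print(predict(tree, words("originX")))       # coordinate
print(predict(tree, words("tempFileName")))  # path
```

With this toy data the learned tree splits only on `url` and `x`, so any identifier containing neither word falls through to `path`. That shortcut illustrates why, as the abstract notes, the approach depends on having enough examples; the paper's classifier was trained on expressions from 26 projects, and its actual features are not described in the abstract.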
Citations: 0
Journal
2021 28th Asia-Pacific Software Engineering Conference (APSEC)