
Latest publications in Applied Computing Review

zCeph: Achieving High Performance On Storage System Using Small Zoned ZNS SSD
IF 1 | Pub Date: 2023-03-27 | DOI: 10.1145/3555776.3577758
Jin-Yong Ha, H. Yeom
ZNS SSDs (Zoned Namespace SSDs) are block devices that provide stable performance at low cost by forcing sequential writes; however, their users must pay a price to guarantee this strict write order. In addition, to get the best performance from small-zoned ZNS SSDs, which give users control over device-internal parallelism, users need to manage the SSDs in fine detail. Due to these overheads, Ceph, a widely used distributed storage system, performs up to 69% worse with ZNS SSDs than with legacy SSDs. In this paper, we present zCeph, which solves the problems that arise when using small-zoned ZNS SSDs in storage systems. We implemented zCeph on top of legacy Ceph and evaluated it with synthetic and real-world workloads, showing performance improvements of up to 4.1x and 7x, respectively, over legacy Ceph on ZNS SSDs.
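The write-order constraint the abstract refers to can be sketched in a few lines: a ZNS zone maintains a write pointer and rejects any write that does not land exactly there. The `Zone` class below is a hypothetical toy model, not zCeph code.

```python
# Toy model of a ZNS zone (illustrative only): writes are accepted only at
# the zone's write pointer, which is what forces strict sequential order.
class Zone:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.write_pointer = 0  # next block that may be written

    def write(self, offset, num_blocks):
        """Return True iff the write lands exactly at the write pointer."""
        if offset != self.write_pointer:
            return False  # ZNS rejects non-sequential writes
        if self.write_pointer + num_blocks > self.capacity:
            return False  # zone is full
        self.write_pointer += num_blocks
        return True

zone = Zone(capacity_blocks=256)
assert zone.write(0, 64)        # sequential: accepted
assert not zone.write(32, 8)    # in-place rewrite: rejected
assert zone.write(64, 64)       # continues at the write pointer: accepted
```

This is the discipline a host-side system such as Ceph must satisfy on every object write, which is where the overhead the abstract quantifies comes from.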
Citations: 1
Reducing Power Consumption during Server Maintenance on Edge Computing Infrastructures
IF 1 | Pub Date: 2023-03-27 | DOI: 10.1145/3555776.3577739
Felipe Rubin, Paulo S. Souza, T. Ferreto
Edge servers must routinely undergo maintenance to ensure the environment's performance and security. During maintenance, applications hosted by outdated servers must be relocated to alternative servers to avoid downtime. On distributed edges whose servers are spread across large regions, ensuring that applications are not migrated to servers too far from their users, which would incur high latency, complicates maintenance planning. In addition, the limited power supply of edge sites restricts the list of suitable alternative hosts even further. Past work has focused either on optimizing maintenance or on increasing the power efficiency of edge computing infrastructures; no work addresses both objectives together. This paper presents Emma, a maintenance strategy that reduces power consumption during edge server maintenance without excessively extending maintenance time or increasing application latency. Experiments show that Emma reduces power consumption during maintenance by up to 26.48% compared to strategies from the literature.
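The trade-off the abstract describes, relocating each application to a low-power host without violating a latency budget, can be illustrated with a toy greedy step. `relocate` and all field names below are our own invention for illustration, not Emma's actual algorithm.

```python
# Illustrative greedy relocation (not Emma's algorithm): for each application
# on a server about to be maintained, pick the lowest-power candidate host
# among those with enough capacity and an acceptable user latency.
def relocate(apps, hosts, max_latency_ms):
    plan = {}
    for app in apps:
        feasible = [h for h in hosts
                    if h["free_capacity"] >= app["demand"]
                    and h["latency_ms"][app["user_region"]] <= max_latency_ms]
        if not feasible:
            return None  # no valid plan under the latency budget
        host = min(feasible, key=lambda h: h["power_watts"])
        host["free_capacity"] -= app["demand"]
        plan[app["name"]] = host["name"]
    return plan

apps = [{"name": "a1", "demand": 2, "user_region": "r1"}]
hosts = [{"name": "h1", "free_capacity": 4, "power_watts": 120,
          "latency_ms": {"r1": 10}},
         {"name": "h2", "free_capacity": 4, "power_watts": 80,
          "latency_ms": {"r1": 50}}]
# A tight latency budget rules out the cheaper but distant host h2:
assert relocate(apps, hosts, max_latency_ms=30) == {"a1": "h1"}
```

Relaxing the latency budget would let the planner prefer the lower-power host, which is exactly the tension between the paper's two objectives.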
Citations: 0
Student Research Abstract: SplitChain: Blockchain with fully decentralized dynamic sharding resilient to fast adaptive adversaries
IF 1 | Pub Date: 2023-03-27 | DOI: 10.1145/3555776.3577207
Arthur Rauch
Over the past few years, blockchains have captured the public's interest with the promise of pseudo-anonymous decentralized exchange infrastructures. However, their potential is hindered by various technical issues, such as limited scalability, problematic storage and communication costs, and fairly low transaction throughput. In this paper, we present SplitChain, a protocol intended to support the creation of scalable account-based blockchains without undermining decentralization or security. This is achieved through sharding, i.e., by splitting the blockchain into several lighter chains, each managed by its own disjoint set of validators, called a shard. Shards balance the load by processing disjoint sets of transactions in parallel. SplitChain distinguishes itself from other sharded blockchains by minimizing the synchronization constraints among shards while maintaining security guarantees. Finally, the protocol is designed to dynamically adapt the number of shards to the system load, avoiding the over-dimensioning issues of most existing sharding-based solutions, in which the number of shards is static.
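The basic mechanism behind account-based sharding, partitioning accounts across disjoint shards so transactions can be processed in parallel, can be sketched with a hash-based assignment. This is a generic illustration, not SplitChain's actual assignment rule.

```python
import hashlib

# Toy account-to-shard mapping of the kind account-based sharded chains use
# (illustrative only; SplitChain's own rules are described in the paper).
def shard_of(account: str, num_shards: int) -> int:
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

accounts = [f"acct-{i}" for i in range(1000)]
load = [sum(1 for a in accounts if shard_of(a, 4) == s) for s in range(4)]
assert sum(load) == 1000  # every account lands in exactly one shard
```

A dynamic protocol then changes `num_shards` with system load, which is precisely what forces accounts to be redistributed and why minimizing cross-shard synchronization matters.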
Citations: 0
Free Willy: Prune System Calls to Enhance Software Security
IF 1 | Pub Date: 2023-03-27 | DOI: 10.1145/3555776.3577593
Charlie Groh, Sergej Proskurin, Apostolis Zarras
Many privilege-escalation exploits on Linux abuse vulnerable system calls to threaten the system's security. In response, various seccomp policy generation frameworks based on static and dynamic analysis have emerged. Yet they either focus on a subset of the available binaries or are constrained by the inherent properties of dynamic, testing-based analysis, which is prone to false negatives. In this paper, we present Jesse, a static-analysis-based framework for generating seccomp policies for ELF binaries. We design and implement an abstract-interpretation-based constant propagation that helps the analyst identify the vital system calls of arbitrary, non-obfuscated binaries. Using the extracted results, Jesse produces effective seccomp policies that reduce the system's attack surface. To assess Jesse's effectiveness and accuracy, we applied our system to over 1,000 ELF binaries from Debian 10 and show that, contrary to existing solutions, Jesse produces accurate and safely approximated results without relying on any properties of the target binaries. In addition, we conduct a case study in which we combine Jesse's constant propagation strategy with container debloating techniques to produce seccomp policies that restrict, on average, up to five times more system calls than Docker's default seccomp policy.
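The final step of any such framework is mechanical: turn the recovered set of required syscalls into a deny-by-default seccomp profile. The sketch below emits a Docker-style seccomp JSON; the analysis that produces the syscall set is the hard part and is not shown.

```python
import json

# Sketch: convert a statically recovered syscall set into a Docker-style
# seccomp profile that denies everything else by default.
def make_seccomp_profile(allowed_syscalls):
    return {
        "defaultAction": "SCMP_ACT_ERRNO",   # deny by default
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [{
            "names": sorted(allowed_syscalls),
            "action": "SCMP_ACT_ALLOW",
        }],
    }

profile = make_seccomp_profile({"read", "write", "exit_group", "brk"})
print(json.dumps(profile, indent=2))
```

The resulting JSON can be passed to `docker run --security-opt seccomp=profile.json`; the smaller the `names` list, the smaller the kernel attack surface left to an exploited process.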
Citations: 0
Modeling a Conversational Agent using BDI Framework
IF 1 | Pub Date: 2023-03-27 | DOI: 10.1145/3555776.3577657
Alexandre Yukio Ichida, Felipe Meneguzzi
Building conversational agents to help humans in domain-specific tasks is challenging, since the agent needs to understand natural language and act on it while accessing domain-expert knowledge. Modern natural language processing techniques have led to an expansion of conversational agents, with recent pretrained language models achieving increasingly accurate language recognition using ever-larger open datasets. However, the black-box nature of such pretrained language models obscures the agent's reasoning and its motivations when responding, leading to unexplained dialogues. We develop a belief-desire-intention (BDI) agent as a task-oriented dialogue system, introducing mental attitudes similar to those humans use when describing their behavior during a dialogue. We compare the resulting model with a pipeline dialogue model by leveraging existing components from dialogue systems and developing the agent's intention selection as a dialogue policy. We show that combining traditional agent modelling approaches, such as BDI, with more recent learning techniques can result in efficient and scrutable dialogue systems.
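The "scrutable" property comes from the BDI cycle itself: beliefs are updated from the utterance, a desire is selected, and the matching plan becomes the intention, so every response can be traced to an explicit mental attitude. The class below is a deliberately minimal illustration of that cycle, not the paper's model.

```python
# Minimal BDI-style deliberation cycle (illustrative only): beliefs come
# from the user's utterance, the topic becomes a desire, and the matching
# plan is adopted as the agent's intention.
class BDIAgent:
    def __init__(self, plans):
        self.beliefs = {}
        self.plans = plans  # desire -> plan (a callable)

    def perceive(self, utterance):
        # Belief revision: a stand-in for the learned NLU component.
        if "refund" in utterance:
            self.beliefs["topic"] = "refund"
        elif "hours" in utterance:
            self.beliefs["topic"] = "opening_hours"

    def deliberate(self):
        return self.beliefs.get("topic")  # adopt the topic as a desire

    def act(self):
        desire = self.deliberate()
        if desire in self.plans:
            return self.plans[desire]()   # intention: execute the plan
        return "Could you rephrase that?"

agent = BDIAgent({"refund": lambda: "Starting the refund procedure.",
                  "opening_hours": lambda: "We are open 9am-5pm."})
agent.perceive("What are your hours?")
assert agent.act() == "We are open 9am-5pm."
```

In the paper's setting, the hard-coded `perceive` rules would be replaced by a learned intent classifier, while the explicit belief/desire/intention trace is what remains inspectable.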
Citations: 0
Zero-Shot Taxonomy Mapping for Document Classification
IF 1 | Pub Date: 2023-03-27 | DOI: 10.1145/3555776.3577653
L. Bongiovanni, Luca Bruno, Fabrizio Dominici, Giuseppe Rizzo
Classification of documents according to a custom internal hierarchical taxonomy is a common problem for many organizations that deal with textual data. The vast majority of approaches to this challenge are supervised methods, which have the advantage of producing good results on specific datasets but the major drawback of requiring an entire corpus of annotated documents; moreover, the resulting models are not directly applicable to a different taxonomy. In this paper, we aim to contribute to this important issue by proposing a method to classify text according to a custom hierarchical taxonomy entirely without labelled data. The idea is to first leverage the semantic information encoded in pre-trained deep language models to assign a prior relevance score to each label of the taxonomy using zero-shot inference, and second to take advantage of the hierarchical structure to reinforce this prior belief. Experiments are conducted on three hierarchically annotated datasets: WebOfScience, DBpedia Extracts, and Amazon Product Reviews, which are very diverse in the type of language adopted and have taxonomy depths of two and three levels. We first compare different zero-shot methods, and then show that our hierarchy-aware approach substantially improves results across every dataset.
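One simple way to realize the hierarchy-reinforcement step is to combine each label's zero-shot prior with the scores of its ancestors, so a label is only ranked highly if its whole root path is plausible. The scores, taxonomy, and combination rule below are invented for illustration; they are not the paper's exact method.

```python
# Sketch of hierarchy-aware scoring: a zero-shot model supplies a prior
# relevance score per taxonomy node, and each label's score is multiplied
# along its path to the root. All numbers here are made up.
def hierarchy_scores(prior, parent):
    posterior = {}
    for label, score in prior.items():
        path_score, p = score, parent.get(label)
        while p is not None:          # reinforce with ancestor scores
            path_score *= prior[p]
            p = parent.get(p)
        posterior[label] = path_score
    return posterior

prior = {"Science": 0.9, "Physics": 0.8, "History": 0.3, "Ancient": 0.9}
parent = {"Physics": "Science", "Ancient": "History"}
scores = hierarchy_scores(prior, parent)
# "Ancient" alone scores 0.9, but its implausible parent "History" (0.3)
# pulls it below "Physics" under a plausible "Science":
assert scores["Physics"] > scores["Ancient"]
```

This captures the paper's intuition that the taxonomy structure can correct zero-shot scores that look confident in isolation.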
Citations: 0
A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
IF 1 | Pub Date: 2023-03-27 | DOI: 10.1145/3555776.3578577
Hugo Sousa, Arian Pasquali, Alípio Jorge, Catarina Sousa Santos, M'ario Amorim Lopes
Textual health records of cancer patients are usually protracted and highly unstructured, making it very time-consuming for health professionals to get a complete overview of the patient's therapeutic course. As such limitations can lead to suboptimal and/or inefficient treatment procedures, healthcare providers would greatly benefit from a system that effectively summarizes the information in those records. With the advent of deep neural models, this objective has been partially attained for English clinical texts; however, the research community still lacks an effective solution for languages with limited resources. In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. This project was conducted in collaboration with the Portuguese Institute for Oncology, which, besides holding over 10 years of duly protected medical records, also provided oncologist expertise throughout the development of the project. Since there is no annotated corpus for biomedical entity extraction in Portuguese, we also present the strategy we followed in annotating the corpus for the development of the models. The final models, which combine a neural architecture with entity linking, achieved F1 scores of 88.6, 95.0, and 55.8 percent in the mention extraction of procedures, drugs, and diseases, respectively.
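The pipeline's two stages, mention extraction followed by entity linking, can be shown in a heavily simplified form: here a toy gazetteer match stands in for the neural tagger, and all terms and concept IDs are invented for illustration.

```python
# Highly simplified stand-in for a two-stage pipeline: mention extraction
# (a toy gazetteer lookup instead of a neural model) plus entity linking
# to a vocabulary. Terms and IDs below are invented.
GAZETTEER = {"paclitaxel": ("DRUG", "ID:0001"),
             "mastectomy": ("PROCEDURE", "ID:0002")}

def extract_entities(text):
    mentions = []
    for raw in text.lower().split():
        token = raw.strip(".,;:")
        if token in GAZETTEER:
            etype, concept_id = GAZETTEER[token]
            mentions.append({"mention": token, "type": etype,
                             "id": concept_id})
    return mentions

record = "Patient underwent mastectomy, then started paclitaxel."
ents = extract_entities(record)
assert {e["type"] for e in ents} == {"DRUG", "PROCEDURE"}
```

A real clinical pipeline replaces the lookup with a trained tagger and a similarity-based linker, but the output shape (typed, linked mentions) is the same.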
Citations: 1
Deep-Learning based Trust Management with Self-Adaptation in the Internet of Behavior
IF 1 | Pub Date: 2023-03-27 | DOI: 10.1145/3555776.3577694
Hind Bangui, Emilia Cioroaica, Mouzhi Ge, Barbora Buhnova
The Internet of Behavior (IoB) has emerged as a new research paradigm within the context of digital ecosystems. It supports understanding and positively influencing human behavior by merging behavioral sciences with information technology, and fosters mutual trust between humans and technology. For example, when automated systems identify improper human driving behavior, IoB can support integrated behavioral adaptation to avoid driving risks that could lead to hazardous situations. In this paper, we propose an ecosystem-level self-adaptation mechanism that aims to provide runtime evidence for trust building in interactions among IoB elements. Our approach employs an indirect trust management scheme based on deep learning, which can mimic human behaviour and trust-building patterns. To validate the model, we consider Pay-How-You-Drive vehicle insurance as a showcase of an IoB application aiming to adapt business incentives based on improved driver behavior profiling. The experimental results show that the proposed model can identify different driving states with high accuracy, supporting IoB applications.
Citations: 2
A DTW Approach for Complex Data: A Case Study with Network Data Streams
IF 1 | Pub Date: 2023-03-27 | DOI: 10.1145/3555776.3577638
Paula Raissa Silva, João Vinagre, J. Gama
Dynamic Time Warping (DTW) is a robust method to measure the similarity between two sequences. This paper proposes a method based on DTW to analyse high-speed data streams. The central idea is to decompose the network traffic into sequences of histograms of packet sizes and then calculate the distance between pairs of such sequences using DTW with Kullback-Leibler (KL) distance. As a baseline, we also compute the Euclidean Distance between the sequences of histograms. Since our preliminary experiments indicate that the distance between two sequences falls within a different range of values for distinct types of streams, we then exploit this distance information for stream classification using a Random Forest. The approach was investigated using recent internet traffic data from a telecommunications company. To illustrate the application of our approach, we conducted a case study with encrypted Internet Protocol Television (IPTV) network traffic data. The goal was to use our DTW-based approach to detect the video codec used in the streams, as well as the IPTV channel. Results strongly suggest that the DTW distance value between the data streams is highly informative for such classification tasks.
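The core computation the abstract describes, DTW over sequences of packet-size histograms with a KL local cost, can be sketched directly. Since KL is asymmetric and undefined on zero bins, the sketch symmetrizes it and adds a small smoothing constant; those choices are ours, not necessarily the paper's.

```python
import math

# DTW between two sequences of packet-size histograms, with a smoothed,
# symmetrized Kullback-Leibler divergence as the local cost.
def kl(p, q, eps=1e-9):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cost(p, q):
    return kl(p, q) + kl(q, p)  # symmetrize, since KL is asymmetric

def dtw(seq_a, seq_b):
    n, m = len(seq_a), len(seq_b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

a = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]   # stream A: one histogram per window
b = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]   # stream B shifts to large packets
assert dtw(a, a) == 0.0
assert dtw(a, b) > 0.0
```

The resulting distance per stream pair is what the paper then feeds, as a feature, into a Random Forest classifier.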
Citations: 0
Aging and rejuvenating strategies for fading windows in multi-label classification on data streams
IF 1 Pub Date: 2023-03-27 DOI: 10.1145/3555776.3577625
M. Roseberry, S. Džeroski, A. Bifet, Alberto Cano
Combining the challenges of streaming data and multi-label learning, the task of mining a drifting, multi-label data stream requires methods that can accurately predict labelsets, adapt to various types of concept drift and run fast enough to process each data point before the next arrives. To achieve greater accuracy, many multi-label algorithms use computationally expensive techniques, such as multiple adaptive windows, with little concern for runtime and memory complexity. We present Aging and Rejuvenating kNN (ARkNN) which uses simple resources and efficient strategies to weight instances based on age, predictive performance, and similarity to the incoming data. We break down ARkNN into its component strategies to show the impact of each and experimentally compare ARkNN to seven state-of-the-art methods for learning from multi-label data streams. We demonstrate that it is possible to achieve competitive performance in multi-label classification on streams without sacrificing runtime and memory use, and without using complex and computationally expensive dual memory strategies.
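The core idea of weighting stored instances by age and past predictive performance can be illustrated with a minimal weighted-kNN sketch over a sliding window. This is a simplified, single-label illustration of the weighting strategy, not the ARkNN algorithm itself; the class name, the decay constant, and the reward factor are assumptions.

```python
import numpy as np
from collections import deque

class WeightedKNN:
    """Sliding-window kNN in which each stored instance carries a weight that
    decays with age and is boosted when the instance's label matched a correct
    prediction. Illustrative sketch only; ARkNN extends this to multi-label data."""

    def __init__(self, k=3, window=100, decay=0.99):
        self.k, self.decay = k, decay
        self.window = deque(maxlen=window)  # entries are (x, y, weight)

    def predict(self, x):
        if not self.window:
            return None
        # rank stored instances by distance, then vote with their aged weights
        scored = sorted(self.window, key=lambda item: np.linalg.norm(item[0] - x))
        votes = {}
        for xi, yi, wi in scored[: self.k]:
            votes[yi] = votes.get(yi, 0.0) + wi
        return max(votes, key=votes.get)

    def learn(self, x, y):
        # age every stored instance; reward those sharing the true label
        # whenever the model's prediction for x was correct
        pred = self.predict(x)
        aged = deque(maxlen=self.window.maxlen)
        for xi, yi, wi in self.window:
            reward = 1.1 if (pred == y and yi == y) else 1.0
            aged.append((xi, yi, wi * self.decay * reward))
        self.window = aged
        self.window.append((np.asarray(x, dtype=float), y, 1.0))
```

Because the window has a fixed maximum length and the weights are simple scalars, the per-instance cost stays low — the kind of lightweight bookkeeping the abstract contrasts with multiple-adaptive-window approaches.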
{"title":"Aging and rejuvenating strategies for fading windows in multi-label classification on data streams","authors":"M. Roseberry, S. Džeroski, A. Bifet, Alberto Cano","doi":"10.1145/3555776.3577625","DOIUrl":"https://doi.org/10.1145/3555776.3577625","url":null,"abstract":"Combining the challenges of streaming data and multi-label learning, the task of mining a drifting, multi-label data stream requires methods that can accurately predict labelsets, adapt to various types of concept drift and run fast enough to process each data point before the next arrives. To achieve greater accuracy, many multi-label algorithms use computationally expensive techniques, such as multiple adaptive windows, with little concern for runtime and memory complexity. We present Aging and Rejuvenating kNN (ARkNN) which uses simple resources and efficient strategies to weight instances based on age, predictive performance, and similarity to the incoming data. We break down ARkNN into its component strategies to show the impact of each and experimentally compare ARkNN to seven state-of-the-art methods for learning from multi-label data streams. We demonstrate that it is possible to achieve competitive performance in multi-label classification on streams without sacrificing runtime and memory use, and without using complex and computationally expensive dual memory strategies.","PeriodicalId":42971,"journal":{"name":"Applied Computing Review","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76923236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1