
Big Data Research: Latest Articles

Data Stream Classification Based on Extreme Learning Machine: A Review
IF 3.3; Zone 3 (Computer Science); Q1 Business, Management and Accounting; Pub Date: 2022-11-28; DOI: 10.1016/j.bdr.2022.100356
Xiulin Zheng , Peipei Li , Xindong Wu

Many everyday applications generate massive amounts of data as streams at ever-increasing speed, for example medical data, click streams, internet records, and banking transactions. In contrast to traditional static data, data streams have inherent properties such as infinite length, concept drift, multiple labels, and concept evolution. Among data mining tasks, classification is one of the basic topics in data stream mining and has gained increasing attention across research communities. Extreme Learning Machine (ELM) has drawn much interest in data classification due to its high efficiency, universal approximation capability, generalization ability, and simplicity, which have inspired the development of many ELM-based algorithms and applications over the past decades. In this paper, we provide a comprehensive review of ELM theoretical research and its variants in data stream classification, and categorize these algorithms from different perspectives. First, we briefly introduce the basic principles of ELM and its characteristics. Second, we give an overview of ELM variants that address the particular issues of data stream classification. Third, we present an overview of strategies for optimizing ELM, which have further improved its stability, accuracy, and generalization ability, and briefly introduce some practical applications of ELM in data stream classification. Finally, we conduct several groups of experiments to compare the performance of ELM-based models on these issues. We also discuss open issues and prospects of ELM models for stream classification that merit further study.

Citations: 3
Augmented Functional Analysis of Variance (A-fANOVA): Theory and Application to Google Trends for Detecting Differences in Abortion Drugs Queries
IF 3.3; Zone 3 (Computer Science); Q1 Business, Management and Accounting; Pub Date: 2022-11-28; DOI: 10.1016/j.bdr.2022.100354
Fabrizio Maturo , Annamaria Porreca

The World Wide Web (WWW) has become a popular and readily accessible big data source in recent decades. Information on the WWW is offered in many forms, e.g. Google Trends, which provides deep insight into people's search queries in the Google Search engine. Analysing this kind of data is not straightforward because, being collected over extensive periods, they usually take the form of high-dimensional data. Comparing the Google Trends means of different groups of people or countries can help explain many phenomena and provide appealing insights into populations' interests in specific periods and areas. However, because of the well-known curse of dimensionality, appropriate statistical techniques should be adopted when inspecting and testing differences in such data. This paper suggests an original approach to Google Trends data, concentrating on searches for the “Cytotec” abortion drug. The final purpose of the application is to determine whether different countries' abortion legislation can influence search trends. This research relies on Functional Data Analysis (FDA) to deal with high-dimensional data and proposes a generalisation of the classical functional analysis of variance model, namely the Augmented Functional Analysis of Variance (A-fANOVA). To test for statistically significant differences among groups of countries, A-fANOVA considers additional characteristics of the curves provided by the velocity and acceleration of the original Google queries over time. The proposed methodology appears promising for capturing additional information about curves' behaviours, with the final aim of offering a monitoring tool for policy-makers.
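
The velocity and acceleration mentioned in this abstract are the first and second derivatives of a query curve over time. As a rough illustration only, the sketch below approximates them with central finite differences on a sampled series. This is an assumption made for brevity: functional data analysis usually differentiates a smoothed basis representation of the curve, and the abstract does not say how the derivatives are obtained. The function name `derivatives` and the sample values are hypothetical.

```python
def derivatives(series, step=1.0):
    """Approximate velocity and acceleration of an evenly sampled curve.

    Uses central finite differences at the interior points, so both
    returned lists are two elements shorter than the input.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least three samples")
    # First derivative (velocity): (y[i+1] - y[i-1]) / (2h)
    velocity = [
        (series[i + 1] - series[i - 1]) / (2 * step) for i in range(1, n - 1)
    ]
    # Second derivative (acceleration): (y[i+1] - 2*y[i] + y[i-1]) / h^2
    acceleration = [
        (series[i + 1] - 2 * series[i] + series[i - 1]) / step ** 2
        for i in range(1, n - 1)
    ]
    return velocity, acceleration


# Made-up search-interest values (y = t^2), not real Google Trends data.
queries = [0.0, 1.0, 4.0, 9.0, 16.0]
velocity, acceleration = derivatives(queries)
print(velocity)      # [2.0, 4.0, 6.0]
print(acceleration)  # [2.0, 2.0, 2.0]
```

Two curves with similar levels can still differ in how fast they rise or fall, which is the kind of additional information the derivative curves carry.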

Citations: 4
An Embedding Model for Knowledge Graph Completion Based on Graph Sub-Hop Convolutional Network
IF 3.3; Zone 3 (Computer Science); Q1 Business, Management and Accounting; Pub Date: 2022-11-28; DOI: 10.1016/j.bdr.2022.100351
Haitao He , Haoran Niu , Jianzhou Feng , Junlan Nie , Yangsen Zhang , Jiadong Ren

Research on knowledge graph completion based on representation learning increasingly depends on the structural features of nodes in the graph. However, many nodes have few immediate neighbors, so their features cannot be fully expressed. Multi-hop structural features are therefore crucial to the representation learning of nodes. GCN (Graph Convolutional Network) is a graph embedding model that can introduce multi-hop structure, but the multi-hop information transmitted between GCN layers suffers substantial loss. This leads to insufficient mining of node structural features and of semantic feature associations among entities, which further reduces the efficiency of knowledge graph completion. To fill these gaps, a gate-controlled graph sub-hop convolutional network model for knowledge graph completion is proposed. First, a graph sub-hop convolutional network based on matrix representation is designed, which transmits multi-hop neighbor features directly to the encoded node vector to avoid a large loss of features during multi-hop transmission. On this basis, implicit multi-hop relations are explicitly embedded into the model based on TransE. In each hop convolution, a sub-hop gate mechanism filters information to counter the accumulation of noise and redundancy caused by the growing receptive field. Finally, a linear model decodes the encoded nodes and completes the knowledge graph. We carried out experimental comparison and analysis on the WN18RR, FB15k-237, UMLS, and KINSHIP datasets. The results show that the embedding method based on sub-hop structural information fusion can greatly improve link prediction results.
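
For readers unfamiliar with TransE, which this abstract builds on, the sketch below shows its scoring idea: a triple (head, relation, tail) is plausible when head + relation lies close to tail. It is not the paper's model. Representing a multi-hop path by summing relation vectors is an assumption used here only to illustrate how an implicit two-hop relation can be made explicit, and the toy embeddings and the names `transe_score` and `compose` are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """TransE distance ||h + r - t||_2; lower means a more plausible triple."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


def compose(*relations):
    """Represent a multi-hop path as the sum of its relation vectors.

    Additive composition is an illustrative assumption, not a detail
    taken from the abstract.
    """
    return [sum(parts) for parts in zip(*relations)]


# Hand-picked 2-d embeddings, for illustration only.
a, b, c = [0.0, 0.0], [1.0, 0.0], [1.0, 2.0]
r1, r2 = [1.0, 0.0], [0.0, 2.0]

two_hop = compose(r1, r2)            # path a -r1-> b -r2-> c
print(transe_score(a, r1, b))        # 0.0: one-hop triple fits exactly
print(transe_score(a, two_hop, c))   # 0.0: composed two-hop relation fits
print(transe_score(a, r1, c))        # 2.0: wrong relation, larger distance
```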

Citations: 0
Correlation Expert Tuning System for Performance Acceleration
IF 3.3; Zone 3 (Computer Science); Q1 Business, Management and Accounting; Pub Date: 2022-11-28; DOI: 10.1016/j.bdr.2022.100345
Yanfeng Chai , Jiake Ge , Qiang Zhang , Yunpeng Chai , Xin Wang , Qingpeng Zhang

No single configuration fits all workloads and resource limitations in modern databases. Auto-tuning methods based on reinforcement learning (RL) normally depend on an exhaustive offline training process with a huge number of performance measurements, including many inefficient knob combinations explored by trial and error. The most time-consuming part of the process is not the RL network training but the performance measurements needed to obtain reward values for target goals such as higher throughput or lower latency. In other words, the whole process is close to a zero-knowledge method, with no experience or rules to constrain it. We therefore propose a correlation expert tuning system (CXTuning) for acceleration, which contains a correlation knowledge model to remove unnecessary training costs and a multi-instance mechanism (MIM) to support fine-grained tuning for diverse workloads. The models define the importance of, and correlations among, the configuration knobs for the user's specified target. Knob-based optimization should not, however, be the final destination for auto-tuning. We therefore also bring an abstracted architectural optimization method into CXTuning as part of the progressive expert knowledge tuning (PEKT) algorithm. Experiments show that CXTuning effectively reduces training time and achieves additional performance improvement compared with the state-of-the-art auto-tuning method.

Citations: 0
A Facial Expression Recognition Approach for Social IoT Frameworks
IF 3.3; Zone 3 (Computer Science); Q1 Business, Management and Accounting; Pub Date: 2022-11-28; DOI: 10.1016/j.bdr.2022.100353
Silvio Barra , Sanoar Hossain , Chiara Pero , Saiyed Umer

Social IoT has become a sensitive topic in recent years, mainly due to the appeal of social networks and related digital activities among the population. These techniques are gaining even more importance in the current period, in which digital tools are the only means of contact compatible with the social distancing required by COVID-19 restrictions. To aid patients and elderly people in a home healthcare context, this article explores the use of patients' facial images and emotion detection. A Social IoT approach is proposed, based on a camera-connected home, that allows medical examinations at a distance while keeping the patient's preferred contacts informed. Facial expression analysis infers the patient's emotional state, so that any change in that state (pain, suffering, etc.) is communicated to the doctor and the emergency contacts. The proposed facial expression recognition system consists of three main steps: in the image preprocessing phase, face detection and normalization are performed; the feature extraction process computes discriminative patterns using the Spatial Pyramid Technique; finally, an expression recognition model is built using a multi-class linear Support Vector Machine classifier. The system has been tested on two challenging facial expression recognition benchmarks, KDEF and GENKI-4K, and the results show that it outperforms state-of-the-art methods.

Citations: 4
A Segmented PageRank-Based Value Compensation Method for Personal Data in Alliance Blockchains
IF 3.3; Zone 3 (Computer Science); Q1 Business, Management and Accounting; Pub Date: 2022-11-28; DOI: 10.1016/j.bdr.2022.100326
Chaoxia Qin , Bing Guo , Yun Zhang , Omar Cheikhrouhou , Yan Shen , Zhen Zhang , Hong Su

Alliance blockchains provide a multi-party trusted data trading environment and promote the development of the data trading market, in which value compensation for personal data remains a key issue. However, because they are limited by data format and content, traditional approaches to data value compensation do not form a widely applicable solution. We therefore propose a universal value compensation method for personal data in alliance blockchains. The basic idea is to evaluate the value weight of data based on the collaborative relationships of data value. First, we construct a Data Collaboration Markov Model (DCMM) to formalize the collaboration network of data value. Then, for data collaboration networks with different structures, we propose the corresponding Segmented PageRank (SPR) algorithm. SPR can evaluate the value weight of each data account universally, without being constrained by data format or content. Finally, we show theoretically that the time complexity and space complexity of the SPR algorithm are, respectively, $1/K$ and $1/K^2$ of those of the PageRank algorithm. Experiments show the feasibility and superior performance of SPR.
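
As background for the baseline that SPR is compared against, here is a minimal sketch of standard PageRank by power iteration. It is plain PageRank, not the paper's Segmented PageRank, whose segmentation scheme the abstract does not describe. The graph, the account names, and the damping factor of 0.85 are assumptions for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration.

    links maps each node to the list of nodes it points to.
    Returns a dict mapping each node to its rank (ranks sum to 1).
    """
    nodes = sorted(set(links) | {v for ts in links.values() for v in ts})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {
            u: (1.0 - damping) / n + damping * dangling / n for u in nodes
        }
        # Each node passes its damped rank evenly along its out-links.
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        converged = sum(abs(new_rank[u] - rank[u]) for u in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical collaboration graph among three data accounts.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```

Each iteration touches every edge once and keeps one rank value per node, which is the cost that any segmented variant would aim to reduce.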

Citations: 2
Evaluating Standard Feature Sets Towards Increased Generalisability and Explainability of ML-Based Network Intrusion Detection
IF 3.3; Zone 3 (Computer Science); Q1 Business, Management and Accounting; Pub Date: 2022-11-28; DOI: 10.1016/j.bdr.2022.100359
Mohanad Sarhan, Siamak Layeghy, Marius Portmann

Machine Learning (ML)-based network intrusion detection systems (NIDSs) bring many benefits for enhancing the cybersecurity posture of an organisation. Many such systems have been designed and developed in the research community, often achieving a close-to-perfect detection rate when evaluated on synthetic datasets. However, there are ongoing challenges in the development and evaluation of ML-based NIDSs: the limited ability to evaluate ML models comprehensively and the lack of understanding of internal ML operations. This paper addresses these challenges by evaluating and explaining the generalisability of a common feature set across different network environments and attack scenarios. Two feature sets (NetFlow and CICFlowMeter) are evaluated in terms of detection accuracy across three key datasets: CSE-CIC-IDS2018, BoT-IoT, and ToN-IoT. The results show the superiority of the NetFlow feature set in enhancing the ML model's detection accuracy across various network attacks. In addition, because of the complexity of the learning models, SHapley Additive exPlanations (SHAP), an explainable AI methodology, is adopted to explain and interpret the classification decisions of the ML models. The Shapley values of the two common feature sets are analysed across multiple datasets to determine the influence each feature contributes to the final ML prediction.
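
To make the notion of a Shapley value concrete, the sketch below computes exact Shapley values for a toy value function by averaging each feature's marginal contribution over all feature orderings. This is a didactic illustration, not the SHAP library and not the paper's setup: exact enumeration is only feasible for a handful of features, and the feature names and the scoring rule `toy_value` are made up.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by enumerating every ordering of the players.

    value is a function from a frozenset of players to a number.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins this coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_value(coalition):
    """Made-up model output: 10 for 'bytes', plus 4 if 'duration' is also present."""
    score = 0.0
    if "bytes" in coalition:
        score += 10.0
    if {"bytes", "duration"} <= coalition:
        score += 4.0
    return score


phi = shapley_values(["bytes", "duration", "port"], toy_value)
print(phi)  # {'bytes': 12.0, 'duration': 2.0, 'port': 0.0}
```

The interaction bonus of 4 is split evenly between the two features that create it, the unused feature receives zero, and the values sum to the full-coalition score of 14.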

Citations: 21
Linked Open Government Data to Predict and Explain House Prices: The Case of Scottish Statistics Portal
IF 3.3 Zone 3 Computer Science Q1 Business, Management and Accounting Pub Date: 2022-11-28 DOI: 10.1016/j.bdr.2022.100355
Areti Karamanou, Evangelos Kalampokis, Konstantinos Tarabanis

Accurately estimating house prices is important for various stakeholders, including house owners, real estate agencies, government agencies, and policy-makers. To this end, traditional statistics and, only recently, advanced machine learning and artificial intelligence models are used. Open Government Data (OGD) have huge potential, especially when combined with AI technologies. OGD are often published as linked data to facilitate data integration and reuse. EXplainable Artificial Intelligence (XAI) can be used by stakeholders to understand the decisions of a predictive model. This work creates a model that predicts house prices by applying machine learning to linked OGD. We present a case study that uses XGBoost, a powerful machine learning algorithm, and linked OGD from the official Scottish data portal to predict the probability that the mean house price in each data zone of Scotland is higher than the average price in Scotland. XAI is also used to explain the decisions of the model globally and locally. The created model has a Receiver Operating Characteristic (ROC) AUC score of 0.923 and a Precision Recall Curve (PRC) AUC score of 0.891. According to XAI, the variable that most affects the decisions of the model is the Comparative Illness Factor, an indicator of health conditions. However, local explainability shows that the decisions made in some data zones may be affected mostly by other variables, such as the percentage of detached dwellings and the employment-deprived population.
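
The two reported scores can be illustrated with a small sketch. It builds the binary target described above (is a zone's mean price above the national average?) and computes ROC AUC as a pairwise ranking probability and the precision-recall area as step-wise average precision. All prices, the threshold, and the model scores are made up, and the paper may compute its PRC AUC with a different interpolation, so this is an illustration under stated assumptions and not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes distinct scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    hits = 0
    area = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            area += hits / rank   # precision at this recall step
    return area / total_pos


# Hypothetical mean prices per data zone (thousand GBP) and a made-up national average
zone_mean_prices = [310, 150, 220, 95, 180, 260]
national_average = 200
labels = [1 if price > national_average else 0 for price in zone_mean_prices]

# Hypothetical predicted probabilities from some classifier
scores = [0.9, 0.4, 0.35, 0.1, 0.6, 0.8]

print(roc_auc(labels, scores), average_precision(labels, scores))
```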

Citations: 0
Satellite IoT Based Road Extraction from VHR Images Through Superpixel-CNN Architecture
IF 3.3 Zone 3 Computer Science Q1 Business, Management and Accounting Pub Date: 2022-11-28 DOI: 10.1016/j.bdr.2022.100334
Tanmay Kumar Behera, Pankaj Kumar Sa, Michele Nappi, Sambit Bakshi

In the past few decades, technology has progressively become indispensable in human lives, primarily due to the growth of fields such as space technology, Big Data, the Internet of Things (IoT), and machine learning. Space technology has revolutionized communication mechanisms while creating opportunities for various research areas, including remote sensing (RS)-inspired applications. IoT, in turn, presents a platform to use the power of the internet over a whole range of devices through a phenomenon known as social IoT. These devices generate a huge amount of data that must be handled and managed by big data technology combined with deep learning techniques to reduce the manual workload of an operator. Moreover, deep learning architectures like convolutional neural networks (CNNs) make it possible to extract the underlying features from large-scale input images, providing better solutions for tasks such as automatic road detection, although at the cost of time and memory overhead. In this context, we propose a three-layer edge-fog-cloud-based intelligent satellite IoT architecture that uses a superpixel-based CNN approach. At the fog layer, the superpixel-based simple linear iterative clustering (SLIC) algorithm uses the images captured by the satellites at the edge level to produce smaller-sized superpixel images that can be transferred even over a low-bandwidth link. The CNN module at the cloud level is then trained with these superpixel images to predict the road networks from the RS images. Two popular road datasets, the DeepGlobe Road dataset and the Massachusetts Road dataset, are used to demonstrate the usefulness of the proposed SLIC-CNN architecture in satellite-based IoT platforms for problems such as RS image-based road extraction. The proposed architecture achieves better accuracy than the classical CNN while reducing the incurred overhead by a noticeable margin.
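
The superpixel step can be sketched as a simplified, grey-level version of SLIC in plain Python. This is only an illustration: real SLIC works on colour images in CIELAB space, library implementations add connectivity enforcement, and nothing here reflects the authors' code. The function name `slic_gray`, the `step` and `compactness` defaults, and the toy 8x8 image are assumptions made for the example, and the image is assumed to be at least one `step` wide and tall.

```python
import math


def slic_gray(image, step, compactness=10.0, iterations=5):
    """Simplified SLIC on a 2-D list of grey levels; returns a label map."""
    h, w = len(image), len(image[0])
    # Seed one cluster centre per step x step cell: [intensity, row, col]
    centers = []
    for r in range(step // 2, h, step):
        for c in range(step // 2, w, step):
            centers.append([float(image[r][c]), float(r), float(c)])
    labels = [[-1] * w for _ in range(h)]
    for _ in range(iterations):
        dist = [[math.inf] * w for _ in range(h)]
        for k, (ci, cr, cc) in enumerate(centers):
            # Search only a window of +/- step around the centre, as SLIC does
            r0, r1 = max(0, int(cr) - step), min(h, int(cr) + step + 1)
            c0, c1 = max(0, int(cc) - step), min(w, int(cc) + step + 1)
            for r in range(r0, r1):
                for c in range(c0, c1):
                    d_color = image[r][c] - ci
                    d_space = math.hypot(r - cr, c - cc)
                    # Combined distance: intensity term plus weighted spatial term
                    d = math.hypot(d_color, compactness * d_space / step)
                    if d < dist[r][c]:
                        dist[r][c] = d
                        labels[r][c] = k
        # Move each centre to the mean of its assigned pixels
        sums = [[0.0, 0.0, 0.0, 0] for _ in centers]
        for r in range(h):
            for c in range(w):
                s = sums[labels[r][c]]
                s[0] += image[r][c]
                s[1] += r
                s[2] += c
                s[3] += 1
        for k, (si, sr, sc, n) in enumerate(sums):
            if n:
                centers[k] = [si / n, sr / n, sc / n]
    return labels


# Toy 8x8 image: dark left half, bright right half (made-up data)
image = [[0] * 4 + [200] * 4 for _ in range(8)]
labels = slic_gray(image, step=4)
for row in labels:
    print(row)
```

A larger `compactness` favours square, evenly sized superpixels; a smaller one lets superpixels follow intensity edges more closely.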

Citations: 2
Automatic Prediction of T2/T3 Staging of Rectal Cancer Based on Radiomics and Machine Learning
IF 3.3 Zone 3 Computer Science Q1 Business, Management and Accounting Pub Date: 2022-11-28 DOI: 10.1016/j.bdr.2022.100346
Xinhong Zhang, Boyan Zhang, Binjie Wang, Fan Zhang

The staging of rectal cancer is very important for determining the treatment plan. This study investigated the relationship between imaging features and rectal cancer staging, so that the stage can be predicted automatically from the imaging features. A total of 81 patients with T2 or T3 stage rectal cancer, seen from April 2018 to March 2019, were included. Firstly, the tumor was labeled by a radiologist to outline the ROI (region of interest) in the high-resolution MRI images, and the ROI was then segmented by an FCNN model and a MedicalNet model. Secondly, features of the ROI were extracted by a radiomics method. Thirdly, the key features were selected from the large number of extracted features. Finally, a machine learning model was trained to predict the rectal cancer stage. Two machine learning tools, a back-projected neural network (BPNN) and a support vector machine (SVM), were used for the T2/T3 staging prediction of rectal cancer. On the testing dataset, with a 95% confidence interval, the accuracy of our methods was 88.2% to 90.5%, the sensitivity was 90.8% to 91.2%, and the specificity was 85.9% to 87.6%, all better than the traditional method. The area under the curve (AUC) of the BPNN method was 0.81 ± 0.01, a better prediction performance than that of the SVM method (AUC = 0.75 ± 0.03). Some of the radiomics features have a significant relationship with the T2/T3 stage of rectal cancer, so it is possible to effectively predict the T2/T3 stage of rectal cancer using the selected radiomics features and machine learning methods.
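
For reference, accuracy, sensitivity, and specificity follow from the four confusion-matrix counts. The sketch below assumes T3 is treated as the positive class, which the abstract does not state, and uses made-up labels; `staging_metrics` is a hypothetical helper name.

```python
def staging_metrics(y_true, y_pred, positive="T3"):
    """Accuracy, sensitivity, and specificity for a two-class staging task.

    Assumption: `positive` marks the class counted as a positive finding.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of true T3 cases detected
        "specificity": tn / (tn + fp),   # share of true T2 cases detected
    }


# Made-up ground truth and predictions for eight patients
y_true = ["T3", "T3", "T3", "T2", "T2", "T3", "T2", "T2"]
y_pred = ["T3", "T3", "T2", "T2", "T3", "T3", "T2", "T2"]
print(staging_metrics(y_true, y_pred))
```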

Citations: 1