Pub Date: 2024-08-01 | Epub Date: 2023-03-22 | DOI: 10.1089/big.2022.0050
Vijay Srinivas Tida, Sonya Hsu, Xiali Hei
An efficient fake news detector becomes essential as the accessibility of social media platforms increases rapidly. Previous studies mainly focused on designing models based on individual data sets, and such models may suffer from degraded performance when applied more broadly. Developing a robust model over a combined data set with diverse knowledge therefore becomes crucial. However, training on a combined data set requires extensive training time and a sequential workload to reach optimal performance when no prior knowledge of the model's parameters is available. This study addresses these issues by introducing a unified training strategy that derives a base classifier structure and all hyperparameters from the individual models, using a pretrained transformer model. The performance of the proposed model is evaluated on three publicly available data sets: ISOT and two others from the Kaggle website. The results indicate that the proposed unified training strategy surpassed existing models such as random forests, convolutional neural networks, and long short-term memory networks, reaching 97% accuracy and an F1 score of 0.97. Furthermore, removing words shorter than three letters from the input samples reduced training time by a factor of roughly 1.5-1.8×. We also performed an extensive performance analysis by varying the number of encoder blocks to build compact models trained on the combined data set; the results show that reducing the number of encoder blocks lowers performance.
Title: "A Unified Training Process for Fake News Detection Based on Finetuned Bidirectional Encoder Representation from Transformers Model." Big Data, pp. 331-342.
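The main speed lever reported in this abstract — dropping words shorter than three letters before feeding samples to the model — is easy to reproduce. A minimal sketch in plain Python (the three-letter threshold comes from the abstract; splitting on word characters is our assumption, and the paper's exact preprocessing may differ):

```python
import re

def drop_short_words(text: str, min_len: int = 3) -> str:
    """Keep only word tokens with at least `min_len` characters.

    The abstract reports a ~1.5-1.8x training-time reduction from this
    kind of pruning; whitespace/word-character tokenization is assumed.
    """
    tokens = re.findall(r"\w+", text)
    return " ".join(t for t in tokens if len(t) >= min_len)

print(drop_short_words("It is a big lie that he won by a lot"))
# -> "big lie that won lot"
```

Shorter inputs mean fewer subword tokens per sample, which is where the training-time saving comes from.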
Pub Date: 2024-08-01 | Epub Date: 2023-09-04 | DOI: 10.1089/big.2022.0086
Derya Turfan, Bulent Altunkaynak, Özgür Yeniay
Over the years, many studies have been carried out to reduce and eliminate the effects of diseases on human health. Gene expression data sets play a critical role in diagnosing and treating diseases. These data sets consist of thousands of genes but only a small number of samples, a situation that creates the curse of dimensionality and makes such data sets difficult to analyze. One of the most effective strategies for this problem is feature selection. Feature selection is a preprocessing step that improves classification performance by selecting the most relevant and informative features, thereby increasing classification accuracy. In this article, we propose a new statistically based filter method for feature selection, named the Effective Range-based Feature Selection Algorithm (FSAER). As an extension of the earlier Effective Range based Gene Selection (ERGS) and Improved Feature Selection based on Effective Range (IFSER) algorithms, our method combines the advantages of both while also taking the disjoint area into account. To illustrate the efficacy of the proposed algorithm, experiments were conducted on six benchmark gene expression data sets. The results of FSAER and other filter methods were compared in terms of classification accuracy to demonstrate the effectiveness of the proposed method. Support vector machines, the naive Bayes classifier, and k-nearest neighbor algorithms were used as classifiers.
Title: "A New Filter Approach Based on Effective Ranges for Classification of Gene Expression Data." Big Data, pp. 312-330.
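FSAER's exact weighting (and its disjoint-area term) is in the paper, but what it shares with ERGS and IFSER is the notion of a per-class "effective range" and a score that rewards features whose class ranges barely overlap. A two-class sketch of that shared idea — the gamma coverage factor and the span normalization below are illustrative assumptions, not the published formulas:

```python
from statistics import mean, stdev

def effective_range(values, gamma=1.732):
    """Effective range of one feature within one class: mean +/- gamma*std.
    gamma controls how much of the class distribution the range covers;
    sqrt(3) here is an arbitrary illustrative choice."""
    m, s = mean(values), stdev(values)
    return (m - gamma * s, m + gamma * s)

def overlap_score(class_a_values, class_b_values, gamma=1.732):
    """Two-class filter score for one feature: 1.0 means the effective
    ranges are disjoint (highly discriminative); lower values mean the
    class ranges overlap and the feature separates the classes poorly."""
    lo_a, hi_a = effective_range(class_a_values, gamma)
    lo_b, hi_b = effective_range(class_b_values, gamma)
    overlap = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    return 1.0 - overlap / span
```

A filter method would compute this score for every gene and keep the top-ranked ones before handing the reduced matrix to SVM, naive Bayes, or k-NN.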
Pub Date: 2024-08-01 | Epub Date: 2023-03-07 | DOI: 10.1089/big.2022.0120
Enes Gul, Mir Jafar Sadegh Safari
Sediment transport modeling is important for minimizing sedimentation in open channels, which can lead to unexpected operational expenses. From an engineering perspective, accurate models based on the effective variables involved in flow velocity computation provide a reliable basis for channel design. Furthermore, the validity of sediment transport models is tied to the range of data used for model development, and existing design models were established on limited data ranges. The present study therefore utilizes all experimental data available in the literature, including recently published data sets covering an extensive range of hydraulic properties. The extreme learning machine (ELM) algorithm and the generalized regularized extreme learning machine (GRELM) were implemented for the modeling, and particle swarm optimization (PSO) and the gradient-based optimizer (GBO) were then used to hybridize ELM and GRELM. GRELM-PSO and GRELM-GBO results were compared with standalone ELM, GRELM, and existing regression models to assess their accuracy. The analysis demonstrated the robustness of the models that incorporate the channel parameter; the poor results of some existing regression models appear linked to disregarding it. Statistical analysis of the model outcomes showed that GRELM-GBO outperformed ELM, GRELM, GRELM-PSO, and the regression models, although it performed only slightly better than GRELM-PSO. The mean accuracy of GRELM-GBO was 18.5% better than that of the best regression model. These promising findings may encourage the use of the recommended algorithms for channel design in practice and may further the application of novel ELM-based methods to other environmental problems.
Title: "Hybrid Generalized Regularized Extreme Learning Machine Through Gradient-Based Optimizer Model for Self-Cleansing Nondeposition with Clean Bed Mode of Sediment Transport." Big Data, pp. 282-298.
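The ELM at the core of these hybrid models trains in closed form: the hidden-layer weights are random and fixed, and only the output weights are solved for. A self-contained sketch in pure Python for a single output (the ridge term stands in for regularization; GRELM's generalized regularizer and the PSO/GBO hyperparameter search are beyond this sketch):

```python
import math
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def train_elm(X, y, hidden=20, ridge=1e-3, seed=0):
    """Extreme learning machine: random tanh hidden layer, closed-form
    output weights from the ridge normal equations
    (H^T H + ridge*I) beta = H^T y."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]

    def features(x):
        return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + bi)
                for w, bi in zip(W, b)]

    H = [features(x) for x in X]
    HtH = [[sum(row[i] * row[j] for row in H) + (ridge if i == j else 0.0)
            for j in range(hidden)] for i in range(hidden)]
    Hty = [sum(row[i] * t for row, t in zip(H, y)) for i in range(hidden)]
    beta = solve(HtH, Hty)
    return lambda x: sum(bk * f for bk, f in zip(beta, features(x)))
```

Because training is a single linear solve rather than gradient descent, ELM variants are cheap enough to wrap inside an outer optimizer such as PSO or GBO, which is exactly what the hybridization above does.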
Pub Date: 2024-08-01 | Epub Date: 2023-03-14 | DOI: 10.1089/big.2022.0125
Babak Vaheddoost, Shervin Rahimzadeh Arashloo, Mir Jafar Sadegh Safari
A joint determination of the horizontal and vertical movement of water through a porous medium is addressed in this study through fast multi-output relevance vector regression (FMRVR). To do this, an experimental data set collected in a Plexiglas sand box of 300 × 300 × 150 mm is used. A random mixture of sand with grain sizes of 0.5-1 mm simulates the porous medium. Within the experiments, walls of 2, 3, 7, and 12 cm are used together with injection locations at 130.7, 91.3, and 51.8 mm, measured from the cutoff wall at the upstream end. The Cartesian coordinates of the tracer, the time interval, the wall length in each setup, and two dummy variables identifying the initial point are used as independent variables for the joint estimation of the horizontal and vertical velocities of water movement in the porous medium. Multi-linear regression, random forest, and support vector regression are used as alternatives to benchmark the results obtained by the FMRVR method. It is concluded that FMRVR outperforms the other models, while the uncertainty in estimating horizontal penetration is larger than for the vertical one.
Title: "Vertical and Horizontal Water Penetration Velocity Modeling in Nonhomogenous Soil Using Fast Multi-Output Relevance Vector Regression." Big Data, pp. 299-311.
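FMRVR models the two velocity components jointly through a shared sparse kernel expansion, which is more than a short sketch can carry. What can be sketched is the simplest multi-output baseline of the kind the authors compare against: fitting each output independently, here with single-predictor ordinary least squares (the multi-linear version used in the paper adds the remaining predictors):

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def fit_two_outputs(xs, horizontal, vertical):
    """Independent per-output fits: each velocity component gets its own
    regression. A joint model like FMRVR instead shares structure between
    the outputs, which is what the abstract credits for its edge."""
    return fit_line(xs, horizontal), fit_line(xs, vertical)
```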
Pub Date: 2024-08-01 | Epub Date: 2023-04-24 | DOI: 10.1089/big.2022.0124
Golsa Mahdavi, Mohammad Amin Hariri-Ardebili
In material science and engineering, the estimation of material properties and their failure modes relies on physical experiments followed by modeling and optimization. Proper optimization, however, is challenging and computationally expensive, mainly because of the highly nonlinear behavior of brittle materials such as concrete. In this study, the application of surrogate models to predict the mechanical characteristics of concrete is investigated. Specifically, meta-models such as polynomial chaos expansion, Kriging, and canonical low-rank approximation are used to predict the compressive strength of two different types of concrete (collected from experimental data in the literature). Various assumptions in the surrogate models are examined, the accuracy of each is evaluated for the problem at hand, and the optimal solution is provided. This study paves the way for other applications of surrogate models in material science and engineering.
Title: "Kriging, Polynomial Chaos Expansion, and Low-Rank Approximations in Material Science and Big Data Analytics." Big Data, pp. 270-281.
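Of the three surrogates named above, polynomial chaos expansion admits the shortest self-contained sketch. For a one-dimensional standard-normal input (a simplifying assumption; the paper's inputs are multivariate material parameters), the coefficients follow from Hermite orthogonality and can be estimated by Monte Carlo projection:

```python
import random

def hermite(k, x):
    """Probabilists' Hermite polynomial He_k(x) via the recurrence
    He_{n+1}(x) = x*He_n(x) - n*He_{n-1}(x)."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, x
    for n in range(1, k):
        h_prev, h = h, x * h - n * h_prev
    return h

def pce_coefficients(f, degree=2, samples=100_000, seed=1):
    """Non-intrusive PCE: estimate c_k = E[f(xi) * He_k(xi)] / k! for
    xi ~ N(0, 1), using the orthogonality E[He_j * He_k] = k! * delta_jk."""
    rng = random.Random(seed)
    acc = [0.0] * (degree + 1)
    for _ in range(samples):
        xi = rng.gauss(0.0, 1.0)
        fx = f(xi)
        for k in range(degree + 1):
            acc[k] += fx * hermite(k, xi)
    coeffs, fact = [], 1
    for k in range(degree + 1):
        if k > 0:
            fact *= k
        coeffs.append(acc[k] / samples / fact)
    return coeffs
```

For f(ξ) = 1 + 2ξ + ξ², the exact expansion is 2 + 2·He₁ + He₂, so the estimated coefficients should land near (2, 2, 1); the truncated series then serves as the cheap surrogate evaluated in place of the expensive model.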
Pub Date: 2024-04-01 | Epub Date: 2023-04-19 | DOI: 10.1089/big.2022.0278
Chandu Thota, Constandinos X Mavromoustakis, George Mastorakis
The reliability of medical data organization and transmission has improved with the adoption of information and communication technologies in recent years. The growth of digital communication and sharing media makes it necessary to optimize the accessibility and transmission of sensitive medical data to end-users. In this article, the Preemptive Information Transmission Model (PITM) is introduced to improve the promptness of medical data delivery. The model is designed to require the least communication in an epidemic region while keeping information seamlessly available. It uses a noncyclic connection procedure and preemptive forwarding inside and outside the epidemic region. The former is responsible for replication-less connection maximization, ensuring better availability of the edge nodes; connection replications are reduced using pruning tree classifiers based on communication time and a delivery balancing factor. The latter is responsible for reliably forwarding the acquired data through conditional selection of the infrastructure units. Together, the two processes of PITM improve the delivery of observed medical data, with better transmissions, shorter communication time, and fewer delays.
Title: "Preemptive Epidemic Information Transmission Model Using Nonreplication Edge Node Connectivity in Health Care Networks." Big Data, pp. 141-154.
Pub Date: 2024-04-01 | Epub Date: 2023-02-27 | DOI: 10.1089/big.2022.0029
Jie Huang, Cheng Xu, Zhaohua Ji, Shan Xiao, Teng Liu, Nan Ma, Qinghui Zhou
Car networking systems based on 5G-V2X (vehicle-to-everything) impose high requirements on reliability and low-latency communication. For the V2X scenario, this article establishes a basis expansion model suitable for high-speed mobile scenarios, exploiting the sparsity of the channel impulse response, and proposes a channel estimation algorithm based on deep learning. The method designs a multilayer convolutional neural network to perform frequency-domain interpolation and a bidirectional gated recurrent unit to predict the channel state in the time domain. Speed and multipath parameters are introduced so that channel data can be trained accurately under different moving-speed environments. System simulations show that the proposed algorithm learns the channel data accurately; compared with traditional car networking channel estimation algorithms, it improves the accuracy of channel estimation and effectively reduces the bit error rate.
Title: "An Intelligent Channel Estimation Algorithm Based on Extended Model for 5G-V2X." Big Data, pp. 127-140.
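The CNN in this design replaces classical frequency-domain interpolation of pilot-based channel estimates. For orientation, here is that classical baseline (least-squares estimates at pilot subcarriers, then linear interpolation across the rest); the pilot layout and function names are illustrative, not from the paper:

```python
def ls_pilot_estimates(rx, tx_pilots, pilot_idx):
    """Least-squares channel estimate at each pilot subcarrier: H = Y / X."""
    return [rx[i] / x for i, x in zip(pilot_idx, tx_pilots)]

def interpolate_channel(pilot_idx, pilot_est, n_subcarriers):
    """Linear interpolation of pilot estimates across all subcarriers,
    holding the edges flat outside the pilot span. A learned interpolator
    (e.g., the paper's CNN) replaces exactly this step."""
    est = [0j] * n_subcarriers
    for (i0, h0), (i1, h1) in zip(zip(pilot_idx, pilot_est),
                                  zip(pilot_idx[1:], pilot_est[1:])):
        for k in range(i0, i1 + 1):
            t = (k - i0) / (i1 - i0)
            est[k] = (1 - t) * h0 + t * h1
    for k in range(pilot_idx[0]):
        est[k] = pilot_est[0]
    for k in range(pilot_idx[-1] + 1, n_subcarriers):
        est[k] = pilot_est[-1]
    return est
```

Linear interpolation is exact only for channels that vary linearly across frequency; rapidly varying, sparse multipath channels are where a learned interpolator can do better, which is the gap the article targets.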
Pub Date: 2024-04-01 | Epub Date: 2023-06-08 | DOI: 10.1089/big.2022.0283
Chandu Thota, Dinesh Jackson Samuel, Mustafa Musa Jaber, M M Kamruzzaman, Renjith V Ravi, Lydia J Gnanasigamani, R Premalatha
Diabetic foot ulcer (DFU) is a worldwide problem, and prevention is crucial; image segmentation analysis plays a significant role in DFU identification. Segmentation can produce different partitions of the same region, along with incomplete, imprecise, and other problematic results. To address these issues, a method of DFU image segmentation analysis through the internet of things with virtual sensing for semantically similar objects is implemented, analyzing four levels of range segmentation (region-based, edge-based, image-based, and computer-aided design-based) for deeper segmentation of images. In this study, the multimodal data are compressed with object co-segmentation for semantic segmentation, yielding a better validity and reliability assessment. The experimental results demonstrate that the proposed model performs segmentation analysis efficiently, with a lower error rate than existing methodologies. The findings on the multiple-image dataset show that DFU obtains average segmentation scores of 90.85% and 89.03%, respectively, for the two labeled ratios (25% and 30%), before DFU with virtual sensing and after DFU without virtual sensing, an increase of 10.91% and 12.22% over the previous best results. In live DFU studies, the proposed system improved by 59.1% compared with existing deep segmentation-based techniques, and its average image smart segmentation improvements over its contemporaries are 15.06%, 23.94%, and 45.41%, respectively. The proposed range-based segmentation achieves 73.9% interobserver reliability on the positive likelihood ratio test set with only 0.25 million parameters at the pace of labeled data.
Title: "Image Smart Segmentation Analysis Against Diabetic Foot Ulcer Using Internet of Things with Virtual Sensing." Big Data, pp. 155-172.
Pub Date: 2024-04-01 | Epub Date: 2023-02-24 | DOI: 10.1089/big.2022.0155
Suyel Namasudra, S Dhamodharavadhani, R Rathipriya, Ruben Gonzalez Crespo, Nageswara Rao Moparthi
Big data is a combination of large structured, semistructured, and unstructured data collected from various sources that must be processed before use in many analytical applications. Anomalies or inconsistencies in big data are occurrences of data that are in some way unusual and do not fit the general patterns; they are considered one of the major problems of big data. The data trust method (DTM) is a technique for identifying anomalous or untrustworthy data and replacing them using interpolation. This article discusses the DTM as a preprocessing approach for univariate time series (UTS) forecasting of big data with a neural network (NN) model. In this work, the DTM combines a statistics-based untrustworthy-data detection method with a statistics-based replacement method, and it is used to improve the forecast quality of UTS. An enhanced NN model is proposed for big data that incorporates DTMs into the NN-based UTS forecasting model. The coefficient-of-variance root mean squared error is used as the main characteristic indicator for choosing the best UTS data for model development. The results show the effectiveness of the proposed method, which improves the prediction process by detecting and replacing untrustworthy big data.
{"title":"Enhanced Neural Network-Based Univariate Time-Series Forecasting Model for Big Data.","authors":"Suyel Namasudra, S Dhamodharavadhani, R Rathipriya, Ruben Gonzalez Crespo, Nageswara Rao Moparthi","doi":"10.1089/big.2022.0155","DOIUrl":"10.1089/big.2022.0155","url":null,"abstract":"<p><p>Big data is a combination of large structured, semistructured, and unstructured data collected from various sources that must be processed before being used in many analytical applications. Anomalies or inconsistencies in big data are occurrences of data that are in some way unusual and do not fit the general patterns; they are considered one of the major problems of big data. The data trust method (DTM) is a technique that identifies anomalous or untrustworthy data and replaces it by interpolation. This article discusses DTM as a preprocessing approach for univariate time series (UTS) forecasting of big data with a neural network (NN) model. In this work, DTM combines a statistics-based untrustworthy-data detection method with a statistics-based untrustworthy-data replacement method and is used to improve the forecast quality of UTS. An enhanced NN model is proposed for big data that incorporates DTMs into the NN-based UTS forecasting model. The coefficient-of-variation root mean squared error (CV-RMSE) is used as the main indicator to choose the best UTS data for model development. The results show the effectiveness of the proposed method, as it can improve the prediction process by detecting and replacing untrustworthy big data.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"83-99"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9320511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-01DOI: 10.1089/big.2022.0271
Jing Wei, Yuguang Jia, Wanyi Tie, Hengmin Zhu, Weidong Huang
Public persons are nodes that attract high attention in public events, and their opinions can directly affect how events develop. However, because followers are rational, their acceptance of a public person's opinion depends on the informational characteristics of that opinion and on their own comprehension. To study how different opinions of public persons guide different followers, we build an opinion dynamics model, which provides a theoretical method for public opinion management. Based on the classical bounded confidence model, we extract information quality variables and an individual trust threshold and introduce them to construct a two-stage opinion evolution model. In simulation experiments, we analyze the effects of opinion information quality, opinion release time, and release frequency on public opinion by adjusting the corresponding parameters. Finally, we use a real case to compare actual data with simulations of the classical model and the improved model, verifying the effectiveness of our model. The research finds that the more sufficient the argument and the more moderate the attitude, the more likely an opinion is to guide public opinion. A public person holding different opinions with different information quality should choose different times to express them to achieve the ideal guiding effect. When a public person holds a neutral opinion and the information quality is relatively average, he or she can intervene in public opinion as soon as possible to control the final outcome; when a public person holds an extreme opinion and the information quality is relatively high, he or she can express the opinion after public opinion has evolved for some time, which improves the guidance effect. The frequency with which a public person releases opinions has a consistently positive impact on the final public opinion.
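The classical bounded confidence dynamics that the abstract builds on can be sketched in a few lines; the quality-weighted guidance rule at the end is our illustrative simplification of the two-stage idea, not the authors' exact update equations:

```python
def bounded_confidence_step(opinions, eps):
    """One synchronous update of the classical bounded-confidence model:
    each agent moves to the average of all opinions within distance eps."""
    new = []
    for x in opinions:
        peers = [y for y in opinions if abs(y - x) <= eps]
        new.append(sum(peers) / len(peers))
    return new

def guided_step(opinions, eps, leader_opinion, quality):
    """Our simplified two-stage flavour: after the ordinary update, followers
    close enough to the public person's opinion mix it in, weighted by the
    information quality in [0, 1] (higher quality, stronger pull)."""
    stepped = bounded_confidence_step(opinions, eps)
    return [
        (1 - quality) * x + quality * leader_opinion
        if abs(x - leader_opinion) <= eps * (1 + quality)
        else x
        for x in stepped
    ]

ops = [0.1, 0.15, 0.5, 0.85, 0.9]
for _ in range(10):
    ops = guided_step(ops, eps=0.2, leader_opinion=0.5, quality=0.6)
print(ops)
```

Varying when the leader's opinion enters the loop, how extreme it is, and the `quality` weight reproduces the kind of release-time and information-quality experiments the abstract describes.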
{"title":"Opinion Evolution with Information Quality of Public Person and Mass Acceptance Threshold.","authors":"Jing Wei, Yuguang Jia, Wanyi Tie, Hengmin Zhu, Weidong Huang","doi":"10.1089/big.2022.0271","DOIUrl":"10.1089/big.2022.0271","url":null,"abstract":"<p><p>Public persons are nodes that attract high attention in public events, and their opinions can directly affect how events develop. However, because followers are rational, their acceptance of a public person's opinion depends on the informational characteristics of that opinion and on their own comprehension. To study how different opinions of public persons guide different followers, we build an opinion dynamics model, which provides a theoretical method for public opinion management. Based on the classical bounded confidence model, we extract information quality variables and an individual trust threshold and introduce them to construct a two-stage opinion evolution model. In simulation experiments, we analyze the effects of opinion information quality, opinion release time, and release frequency on public opinion by adjusting the corresponding parameters. Finally, we use a real case to compare actual data with simulations of the classical model and the improved model, verifying the effectiveness of our model. The research finds that the more sufficient the argument and the more moderate the attitude, the more likely an opinion is to guide public opinion. A public person holding different opinions with different information quality should choose different times to express them to achieve the ideal guiding effect. When a public person holds a neutral opinion and the information quality is relatively average, he or she can intervene in public opinion as soon as possible to control the final outcome; when a public person holds an extreme opinion and the information quality is relatively high, he or she can express the opinion after public opinion has evolved for some time, which improves the guidance effect. The frequency with which a public person releases opinions has a consistently positive impact on the final public opinion.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"100-109"},"PeriodicalIF":2.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9547763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}