Pub Date: 2024-08-01 | Epub Date: 2023-04-24 | DOI: 10.1089/big.2022.0124
Golsa Mahdavi, Mohammad Amin Hariri-Ardebili
In material science and engineering, the estimation of material properties and their failure modes is associated with physical experiments followed by modeling and optimization. However, proper optimization is challenging and computationally expensive. The main reason is the highly nonlinear behavior of brittle materials such as concrete. In this study, the application of surrogate models to predict the mechanical characteristics of concrete is investigated. Specifically, meta-models such as polynomial chaos expansion, Kriging, and canonical low-rank approximation are used for predicting the compressive strength of two different types of concrete (collected from experimental data in the literature). Various assumptions in surrogate models are examined, and the accuracy of each one is evaluated for the problem at hand. Finally, the optimal solution is provided. This study paves the way for other applications of surrogate models in material science and engineering.
"Kriging, Polynomial Chaos Expansion, and Low-Rank Approximations in Material Science and Big Data Analytics." Golsa Mahdavi, Mohammad Amin Hariri-Ardebili. Big Data, 2024-08-01, pp. 270-281. DOI: 10.1089/big.2022.0124.
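Kriging is one of the meta-models named above. As a minimal sketch of the idea, not the authors' implementation, a zero-mean Kriging (Gaussian-process regression) predictor with a squared-exponential kernel fits in a few lines of NumPy; the water/cement-ratio inputs and strength values below are invented for illustration:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between the row vectors of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def kriging_predict(X_train, y_train, X_new, length_scale=1.0, nugget=1e-8):
    # Zero-mean Kriging: posterior mean is k(X_new, X_train) @ K^{-1} y.
    K = rbf_kernel(X_train, X_train, length_scale) + nugget * np.eye(len(X_train))
    k_star = rbf_kernel(X_new, X_train, length_scale)
    alpha = np.linalg.solve(K, y_train)
    return k_star @ alpha

# Hypothetical data: compressive strength (MPa) vs. water/cement ratio.
X = np.array([[0.4], [0.5], [0.6]])
y = np.array([50.0, 40.0, 30.0])
pred = kriging_predict(X, y, np.array([[0.45]]), length_scale=0.1)
```

With a small nugget, the predictor interpolates the training data almost exactly, which is the defining property of Kriging surrogates.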
Yinuo Qian, Fuzhong Nian, Zheming Wang, Yabing Yao
Dynamic propagation affects the evolution of network structure, and different networks are affected by the iterative propagation of information to different degrees. The iterative propagation of information changes the connection strength of the chain edges between nodes. Most studies of temporal networks build networks from temporal features; the iterative propagation of information in a network also reflects the temporal character of network evolution. Changes in network structure are a macro-level manifestation of these temporal features, whereas the dynamics within the network are a micro-level manifestation. How to concretely visualize the changes in network structure driven by propagation dynamics is the focus of this article. The appearance of a chain edge is a micro-level change in network structure, and the division of communities is a macro-level change. On this basis, node participation is proposed to quantify the influence of different users on information propagation in the network, and it is simulated in different types of networks. By analyzing the iterative propagation of information, weighted versions of different networks are constructed. Finally, the chain edges and community divisions in the network are analyzed, quantifying the influence of network propagation on complex network structure.
"Research on the Influence of Information Iterative Propagation on Complex Network Structure." Yinuo Qian, Fuzhong Nian, Zheming Wang, Yabing Yao. Big Data, 2024-07-27. DOI: 10.1089/big.2023.0016.
Survival models have found increasingly wide application in credit scoring recently because of their ability to estimate the dynamics of risk over time. In this research, we propose a Buckley-James safe sample screening support vector regression (BJS4VR) algorithm that models large-scale survival data by combining the Buckley-James transformation with support vector regression. Unlike previous support vector regression survival models, censored samples are imputed here using a censoring-unbiased Buckley-James estimator. Safe sample screening is then applied to discard, from the original data, samples that are guaranteed to be non-active at the final optimal solution, improving efficiency. Experimental results on the large-scale real-world Lending Club loan data show that the proposed BJS4VR model outperforms existing popular survival models such as RSFM, CoxRidge, and CoxBoost in both prediction accuracy and time efficiency. Important variables highly correlated with credit risk are also identified by the proposed method.
"A Fast Survival Support Vector Regression Approach to Large Scale Credit Scoring via Safe Screening." Hong Wang, Ling Hong. Big Data, 2024-07-23. DOI: 10.1089/big.2023.0033.
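The Buckley-James step replaces each censored response with a conditional expectation of the true survival time. As a crude stand-in for that idea (the authors' estimator is the censoring-unbiased Buckley-James form, not this simplification), a censored time can be replaced by the mean of the uncensored times that exceed it:

```python
def impute_censored(times, events):
    # events[i] == 1: failure observed at times[i]; 0: right-censored at times[i].
    # Toy Buckley-James-style imputation: a censored time becomes the mean of
    # all uncensored times strictly greater than it (left as-is if none exist).
    out = []
    for t, e in zip(times, events):
        if e == 1:
            out.append(t)
        else:
            later = [s for s, d in zip(times, events) if d == 1 and s > t]
            out.append(sum(later) / len(later) if later else t)
    return out
```

After imputation, the data look fully observed and any regression method, such as support vector regression, can be trained on them.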
Extracting meaningful patterns of human mobility from accumulating trajectories is essential for understanding human behavior. However, previous works identify human mobility patterns based on the spatial co-occurrence of trajectories, ignoring the effect of activity content and leaving challenges in effectively extracting and understanding patterns. To bridge this gap, this study incorporates the activity content of trajectories to extract human mobility patterns and proposes a content-aware mobility pattern model. The model first embeds the activity content in a distributed continuous vector space, taking points of interest as agents, and then extracts representative and interpretable mobility patterns from sets of human trajectories using a derived topic model. To investigate the performance of the proposed model, several evaluation metrics are developed, including pattern coherence, pattern similarity, and manual scoring. A real-world case study is conducted, and its experimental results show that the proposed model improves interpretability and helps in understanding mobility patterns. This study provides not only a novel solution and several evaluation metrics for human mobility patterns but also a methodological reference for fusing the content semantics of human activities into trajectory analysis and mining.
"Content-Aware Human Mobility Pattern Extraction." Shengwen Li, Chaofan Fan, Tianci Li, Renyao Chen, Qingyuan Liu, Junfang Gong. Big Data, 2024-07-10. DOI: 10.1089/big.2022.0281.
Pub Date: 2024-04-01 | Epub Date: 2023-02-27 | DOI: 10.1089/big.2022.0029
Jie Huang, Cheng Xu, Zhaohua Ji, Shan Xiao, Teng Liu, Nan Ma, Qinghui Zhou
Car networking systems based on 5G-V2X (vehicle-to-everything) have strict requirements for reliable, low-latency communication. For the V2X scenario, this article establishes an extended model (basis expansion model) suited to high-speed mobile scenarios, exploiting the sparsity of the channel impulse response, and proposes a deep-learning-based channel estimation algorithm in which a multilayer convolutional neural network performs frequency-domain interpolation. A bidirectional gated recurrent unit is designed to predict the channel state in the time domain, and speed and multipath parameters are introduced so that channel data can be trained accurately under different moving-speed environments. System simulations show that the proposed algorithm estimates the channel accurately; compared with traditional car-networking channel estimation algorithms, it improves estimation accuracy and effectively reduces the bit error rate.
"An Intelligent Channel Estimation Algorithm Based on Extended Model for 5G-V2X." Jie Huang, Cheng Xu, Zhaohua Ji, Shan Xiao, Teng Liu, Nan Ma, Qinghui Zhou. Big Data, 2024-04-01, pp. 127-140. DOI: 10.1089/big.2022.0029.
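The frequency-domain interpolation that the article's convolutional network learns can be illustrated with the classical least-squares baseline it improves on: estimate the channel at pilot subcarriers, then interpolate across the remaining subcarriers. The pilot positions and toy channel below are invented, not the paper's system model:

```python
import numpy as np

def ls_pilot_estimate(rx_pilots, tx_pilots):
    # Least-squares channel estimate at pilot subcarriers: H = Y / X.
    return rx_pilots / tx_pilots

def interpolate_channel(pilot_idx, H_pilots, n_subcarriers):
    # Linearly interpolate real and imaginary parts across all subcarriers.
    idx = np.arange(n_subcarriers)
    H_re = np.interp(idx, pilot_idx, H_pilots.real)
    H_im = np.interp(idx, pilot_idx, H_pilots.imag)
    return H_re + 1j * H_im

# Toy channel, linear across 10 subcarriers, with pilots at 0, 3, 6, 9.
n = 10
pilot_idx = np.array([0, 3, 6, 9])
k = np.arange(n)
H_true = (1 + 0.05 * k) + 1j * (0.5 - 0.02 * k)
tx = np.ones(len(pilot_idx), dtype=complex)
rx = H_true[pilot_idx] * tx
H_hat = interpolate_channel(pilot_idx, ls_pilot_estimate(rx, tx), n)
```

A learned interpolator replaces the linear rule with one trained on realistic channel realizations, which is where the accuracy gains reported in the abstract come from.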
Pub Date: 2024-04-01 | Epub Date: 2023-04-19 | DOI: 10.1089/big.2022.0278
Chandu Thota, Constandinos X Mavromoustakis, George Mastorakis
The reliable organization and transmission of medical data have been eased by the adoption of information and communication technologies in recent years. The growth of digital communication and sharing media makes it necessary to optimize the accessibility and transmission of sensitive medical data to end users. In this article, the Preemptive Information Transmission Model (PITM) is introduced to improve the promptness of medical data delivery. The model is designed to require the least communication within an epidemic region while keeping information seamlessly available. It relies on a noncyclic connection procedure and on preemptive forwarding inside and outside the epidemic region. The former is responsible for replication-less connection maximization, ensuring better availability of the edge nodes; connection replications are reduced using pruning-tree classifiers based on communication time and a delivery balancing factor. The latter process reliably forwards the acquired data using a conditional selection of infrastructure units. Together, the two processes of PITM improve the delivery of observed medical data, achieving better transmissions, shorter communication times, and fewer delays.
"Preemptive Epidemic Information Transmission Model Using Nonreplication Edge Node Connectivity in Health Care Networks." Chandu Thota, Constandinos X Mavromoustakis, George Mastorakis. Big Data, 2024-04-01, pp. 141-154. DOI: 10.1089/big.2022.0278.
Pub Date: 2024-04-01 | Epub Date: 2023-06-08 | DOI: 10.1089/big.2022.0283
Chandu Thota, Dinesh Jackson Samuel, Mustafa Musa Jaber, M M Kamruzzaman, Renjith V Ravi, Lydia J Gnanasigamani, R Premalatha
Diabetic foot ulcer (DFU) is a worldwide problem, and prevention is crucial. Image segmentation analysis plays a significant role in DFU identification, but segmentation of the same image can yield differing, incomplete, or imprecise results. To address these issues, a method of DFU image segmentation analysis over the internet of things, using virtual sensing of semantically similar objects, is implemented; it analyzes four levels of range segmentation (region-based, edge-based, image-based, and computer-aided-design-based) for deeper segmentation of images. In this study, the multimodal input is compressed with object co-segmentation for semantic segmentation, yielding better validity and reliability. The experimental results demonstrate that the proposed model can perform segmentation analysis efficiently, with a lower error rate than existing methodologies. The findings on the multiple-image dataset show that DFU obtains average segmentation scores of 90.85% and 89.03%, respectively, under two labeled ratios, before DFU with virtual sensing and after DFU without virtual sensing (i.e., 25% and 30%), an increase of 10.91% and 12.22% over the previous best results. In live DFU studies, the proposed system improved by 59.1% over existing deep segmentation-based techniques, and its average image smart segmentation improvements over its contemporaries are 15.06%, 23.94%, and 45.41%, respectively.
The proposed range-based segmentation achieves 73.9% interobserver reliability on the positive likelihood ratio test set with only 0.25 million parameters at the same pace of labeled data.
"Image Smart Segmentation Analysis Against Diabetic Foot Ulcer Using Internet of Things with Virtual Sensing." Chandu Thota, Dinesh Jackson Samuel, Mustafa Musa Jaber, M M Kamruzzaman, Renjith V Ravi, Lydia J Gnanasigamani, R Premalatha. Big Data, 2024-04-01, pp. 155-172. DOI: 10.1089/big.2022.0283.
Public persons are nodes that attract high attention in public events, and their opinions can directly affect how events develop. However, because followers are rational, their acceptance of a public person's opinions depends on the informational traits of those opinions and on their own comprehension. To study how different opinions of public persons guide different followers, we build an opinion dynamics model that provides a theoretical method for public opinion management. Based on the classical bounded confidence model, we extract information quality variables and an individual trust threshold and introduce them to construct a two-stage opinion evolution model. In simulation experiments, we analyze the different effects of opinion information quality, opinion release time, and release frequency on public opinion by adjusting the corresponding parameters. Finally, we add a case comparing real data against simulations of the classical model and the improved model to verify the effectiveness of our model. The research finds that the more sufficient the argument and the more moderate the attitude, the more likely an opinion is to guide public opinion. A public person holding different opinions with different information quality should choose different times to present an opinion to achieve the ideal guiding effect. When a public person holds a neutral opinion and the information quality is relatively ordinary, he or she should intervene in public opinion as soon as possible to control the final outcome; when a public person holds an extreme opinion and the information quality is relatively high, he or she can express the opinion after public opinion has evolved for a certain period, which improves the guidance effect. The frequency with which a public person releases opinions consistently has a positive impact on the final public opinion.
"Opinion Evolution with Information Quality of Public Person and Mass Acceptance Threshold." Jing Wei, Yuguang Jia, Wanyi Tie, Hengmin Zhu, Weidong Huang. Big Data, 2024-04-01, pp. 100-109. DOI: 10.1089/big.2022.0271.
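The classical bounded confidence model that the two-stage model extends updates each agent's opinion to the average of all opinions within its confidence threshold (the Hegselmann-Krause form is shown here; the paper's model adds information quality and trust variables on top of this):

```python
def bounded_confidence_step(opinions, eps):
    # One synchronous update: each agent averages the opinions that lie
    # within distance eps of its own (its "confidence set", itself included).
    new = []
    for x in opinions:
        neighbors = [y for y in opinions if abs(y - x) <= eps]
        new.append(sum(neighbors) / len(neighbors))
    return new

# With eps = 0.2 the four agents split into two stable opinion clusters.
ops = bounded_confidence_step([0.0, 0.1, 0.9, 1.0], eps=0.2)
```

Iterating the step until opinions stop moving shows how the threshold controls whether the population reaches consensus or fragments into camps, which is the mechanism the abstract's guidance strategies act on.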
Pub Date: 2024-04-01 | Epub Date: 2023-02-24 | DOI: 10.1089/big.2022.0155
Suyel Namasudra, S Dhamodharavadhani, R Rathipriya, Ruben Gonzalez Crespo, Nageswara Rao Moparthi
Big data is a combination of large structured, semistructured, and unstructured data collected from various sources that must be processed before being used in many analytical applications. Anomalies or inconsistencies in big data are occurrences of data that are in some way unusual and do not fit the general patterns; they are considered one of the major problems of big data. The data trust method (DTM) is a technique that identifies anomalous or untrustworthy data and replaces it using interpolation. This article discusses the DTM as a preprocessing approach for univariate time series (UTS) forecasting algorithms for big data using a neural network (NN) model. In this work, the DTM combines a statistical untrustworthy-data detection method with a statistical untrustworthy-data replacement method, and it is used to improve the forecast quality of UTS. An enhanced NN model is proposed for big data that incorporates DTMs into the NN-based UTS forecasting model. The coefficient variance root mean squared error is utilized as the main characteristic indicator for choosing the best UTS data for model development. The results show the effectiveness of the proposed method, which improves the prediction process by identifying and replacing untrustworthy big data.
"Enhanced Neural Network-Based Univariate Time-Series Forecasting Model for Big Data." Suyel Namasudra, S Dhamodharavadhani, R Rathipriya, Ruben Gonzalez Crespo, Nageswara Rao Moparthi. Big Data, 2024-04-01, pp. 83-99. DOI: 10.1089/big.2022.0155.
Pub Date: 2024-04-01 | Epub Date: 2023-03-03 | DOI: 10.1089/big.2022.0095
Dipesh Kumar, Nirupama Mandal, Yugal Kumar
In recent years, the world has seen rapid growth in online activity, and the volume of data held in cloud servers has grown exponentially with it. Although various cloud-based systems have been developed to improve the user experience, the increased online activity around the globe has also increased the data load on them, so task scheduling has become essential for maintaining the efficiency and performance of applications hosted in cloud servers. Task scheduling reduces the makespan time and average cost by assigning incoming tasks to virtual machines (VMs) according to a scheduling algorithm, and many researchers have proposed such algorithms for the cloud computing environment. In this article, an advanced form of the shuffled frog leaping optimization algorithm, modeled on the behavior of frogs searching for food, is proposed. The authors introduce a new way of shuffling the positions of frogs within a memeplex to obtain the best result. Using this optimization technique, the central processing unit cost function, the makespan, and the fitness function are calculated, where the fitness function is the sum of the budget cost function and the makespan time. The proposed method reduces both the makespan time and the average cost by scheduling tasks to VMs effectively.
Finally, the performance of the proposed advanced shuffled frog leaping method is compared with existing task scheduling methods such as the whale optimization-based scheduler (W-Scheduler), sliced particle swarm optimization (SPSO-SA), the inverted ant colony optimization algorithm, and static learning particle swarm optimization (SLPSO-SA) in terms of average cost and the makespan metric. Experimentally, it was concluded that the proposed advanced frog optimization algorithm schedules tasks to VMs more effectively than the other scheduling methods, with a makespan of 6, an average cost of 4, and a fitness of 10.
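As a much simplified illustration of the scheduling idea, the sketch below assigns tasks to VMs with a shuffled-frog-leaping-style search. The paper's exact cost model, leap rule, and memeplex construction are not given in the abstract, so a fitness of makespan plus total budget cost, a uniform gene-mixing leap, and a random reset on failed leaps are assumptions made here for illustration:

```python
import random

def makespan(assignment, task_len, vm_speed):
    """Completion time of the busiest VM under this task-to-VM assignment."""
    loads = [0.0] * len(vm_speed)
    for task, vm in enumerate(assignment):
        loads[vm] += task_len[task] / vm_speed[vm]
    return max(loads)

def total_cost(assignment, task_len, vm_cost):
    """Total budget cost of running each task on its assigned VM."""
    return sum(task_len[task] * vm_cost[vm] for task, vm in enumerate(assignment))

def fitness(assignment, task_len, vm_speed, vm_cost):
    """Assumed fitness: makespan plus budget cost (lower is better)."""
    return (makespan(assignment, task_len, vm_speed)
            + total_cost(assignment, task_len, vm_cost))

def sfla_schedule(task_len, vm_speed, vm_cost,
                  frogs=20, memeplexes=4, iters=50, seed=1):
    rng = random.Random(seed)
    n_vm = len(vm_speed)
    fit = lambda frog: fitness(frog, task_len, vm_speed, vm_cost)
    # Each frog is a candidate schedule: frog[i] = VM index for task i.
    pop = [[rng.randrange(n_vm) for _ in task_len] for _ in range(frogs)]
    for _ in range(iters):
        pop.sort(key=fit)              # "shuffle": rank all frogs globally
        for m in range(memeplexes):
            plex = pop[m::memeplexes]  # deal ranked frogs into memeplexes
            best, worst = plex[0], plex[-1]
            # Worst frog leaps toward the memeplex best, gene by gene.
            child = [b if rng.random() < 0.5 else w
                     for b, w in zip(best, worst)]
            if fit(child) < fit(worst):
                worst[:] = child       # accepted leap (mutates frog in pop)
            else:                      # failed leap: replace with a random frog
                worst[:] = [rng.randrange(n_vm) for _ in task_len]
    return min(pop, key=fit)

task_len = [4.0, 2.0, 8.0, 6.0, 3.0]   # hypothetical task lengths
vm_speed = [1.0, 2.0]                  # hypothetical VM speeds
vm_cost = [0.5, 1.0]                   # hypothetical per-unit VM costs
best = sfla_schedule(task_len, vm_speed, vm_cost)
```

The global re-sort followed by dealing frogs into memeplexes is the "shuffling" step that lets information found in one memeplex spread to the others on the next iteration.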
{"title":"Cloud-Based Advanced Shuffled Frog Leaping Algorithm for Tasks Scheduling.","authors":"Dipesh Kumar, Nirupama Mandal, Yugal Kumar","doi":"10.1089/big.2022.0095","DOIUrl":"10.1089/big.2022.0095","url":null,"abstract":"<p><p>In recent years, the world has seen incremental growth in online activities owing to which the volume of data in cloud servers has also been increasing exponentially. With rapidly increasing data, load on cloud servers has increased in the cloud computing environment. With rapidly evolving technology, various cloud-based systems were developed to enhance the user experience. But, the increased online activities around the globe have also increased data load on the cloud-based systems. To maintain the efficiency and performance of the applications hosted in cloud servers, task scheduling has become very important. The task scheduling process helps in reducing the makespan time and average cost by scheduling the tasks to virtual machines (VMs). The task scheduling depends on assigning tasks to VMs to process the incoming tasks. The task scheduling should follow some algorithm for assigning tasks to VMs. Many researchers have proposed different scheduling algorithms for task scheduling in the cloud computing environment. In this article, an advanced form of the shuffled frog optimization algorithm, which works on the nature and behavior of frogs searching for food, has been proposed. The authors have introduced a new algorithm to shuffle the position of frogs in memeplex to obtain the best result. By using this optimization technique, the cost function of the central processing unit, makespan, and fitness function were calculated. The fitness function is the sum of the budget cost function and the makespan time. The proposed method helps in reducing the makespan time as well as the average cost by scheduling the tasks to VMs effectively. Finally, the performance of the proposed advanced shuffled frog optimization method is compared with existing task scheduling methods such as whale optimization-based scheduler (W-Scheduler), sliced particle swarm optimization (SPSO-SA), inverted ant colony optimization algorithm, and static learning particle swarm optimization (SLPSO-SA) in terms of average cost and metric makespan. Experimentally, it was concluded that the proposed advanced frog optimization algorithm can schedule tasks to the VMs more effectively as compared with other scheduling methods with a makespan of 6, average cost of 4, and fitness of 10.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"110-126"},"PeriodicalIF":4.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10821344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}