Tropical cyclone trajectory based on satellite remote sensing prediction and time attention mechanism ConvLSTM model
Pub Date: 2024-02-03 | DOI: 10.1016/j.bdr.2024.100439 | Journal: Big Data Research
Tongfei Li, Mingzheng Lai, Shixian Nie, Haifeng Liu, Zhiyao Liang, Wei Lv
Accurate and timely prediction of tropical cyclones is essential for mitigating the impact of these catastrophic meteorological events. Current methods that predict tropical cyclones from satellite remote sensing images face notable challenges, including inadequate extraction of three-dimensional spatial features and limited long-term forecasting ability. To address these challenges, this study introduces the Temporal Attention Mechanism ConvLSTM (TAM-CL) model, which performs thorough spatiotemporal feature extraction on three-dimensional atmospheric reanalysis data of tropical cyclones. By using ConvLSTM with three-dimensional convolution kernels, the model improves the extraction of three-dimensional spatiotemporal features. An attention mechanism is also integrated to improve long-term prediction accuracy by emphasizing crucial time steps. In evaluations of tropical cyclone track and intensity forecasts at 24, 48, and 72 h, TAM-CL notably reduces prediction errors, demonstrating its effectiveness for forecasting both cyclone tracks and intensities. The work thereby explores the application of deep networks to atmospheric reanalysis data.
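The abstract does not give TAM-CL's attention equations. The sketch below illustrates what weighting "crucial time steps" can look like, using generic scaled dot-product attention over time:

- Each step's hidden vector is scored against a query vector.
- The scores are softmax-normalized.
- The output is the weighted sum of the hidden vectors.

The function names are assumptions for illustration, and so is flattening the ConvLSTM feature maps into plain vectors. Neither is the authors' design.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(
    hidden_states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Generic scaled dot-product attention over time steps (illustrative only).

    hidden_states: one flattened hidden vector per time step (assumed layout).
    query: vector the time steps are scored against (assumed to be learned).
    Returns the attention-weighted context vector and the per-step weights.
    """
    d = len(query)
    # Score each time step by its scaled dot product with the query.
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) / math.sqrt(d)
              for h in hidden_states]
    weights = softmax(scores)
    # Context vector: weighted sum of the hidden states across time.
    context = [sum(w * h[j] for w, h in zip(weights, hidden_states))
               for j in range(d)]
    return context, weights
```

For example, `temporal_attention([[0.1, 0.2], [0.9, 0.8], [0.3, 0.1]], [1.0, 1.0])` puts the largest weight on the second time step, whose hidden vector aligns best with the query.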
A Big Data Driven Vegetation Disease and Pest Region Identification Method Based on Self supervised Convolutional Neural Networks and Parallel Extreme Learning Machines
Pub Date: 2024-02-01 | DOI: 10.1016/j.bdr.2024.100444
Bo Jiang, Hao Wang, Hanxu Ma
Graph Spatial-Temporal Transformer Network for Traffic Prediction
Pub Date: 2024-01-26 | DOI: 10.1016/j.bdr.2024.100427
Zhenzhen Zhao, Guojiang Shen, Lei Wang, Xiangjie Kong
Traffic information reflects the operating status of a city, and accurate traffic forecasting is critical for intelligent transportation systems (ITS) and urban planning. However, because of human mobility, traffic information exhibits complex nonlinearity and dynamic spatial-temporal dependencies, which pose new forecasting challenges. To address these problems, this paper proposes a graph spatial-temporal transformer network for traffic prediction (GSTTN). Specifically, the framework uses a multi-view graph convolutional network (GCN) to capture spatial characteristics of traffic across the road network that are hidden in human behavior patterns. In addition, a transformer network with a multi-head attention mechanism captures random disturbances in the temporal characteristics of traffic information. Together, these two components model spatial relations and temporal trends. Finally, experiments on real-world datasets show that the proposed framework outperforms current state-of-the-art baselines.
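The abstract names a multi-view GCN without giving its formulas. Below is a minimal sketch of a single standard graph-convolution step for one adjacency matrix (one "view"). It uses the widely used symmetric normalization ReLU(D^-1/2 (A + I) D^-1/2 X W).

- The abstract does not say how GSTTN defines or fuses its views.
- `gcn_layer` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One generic GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W) (not GSTTN's exact layer)."""
    n = len(adj)
    # Add self-loops so each node keeps its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the self-looped adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, features), weight)
    # Element-wise ReLU.
    return [[max(0.0, v) for v in row] for row in out]
```

Under this assumption, a multi-view variant would run one such step per adjacency matrix (for example, hypothetical distance-based and similarity-based road graphs) and combine the outputs.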
Airspace situation analysis of terminal area traffic flow prediction based on big data and machine learning methods
Pub Date: 2024-01-18 | DOI: 10.1016/j.bdr.2024.100425
Yandong Li, Bo Jiang, Weilong Liu, Chenglong Li, Yunfan Zhou
Real-time, accurate prediction of terminal area arrival traffic flow is a key issue for terminal area traffic management. In this paper, we first study the advantages and disadvantages of traditional dynamics-based and time-series-based prediction methods. Combining the advantages of both types, we propose a terminal area arrival flow prediction framework based on the airspace situation, which serves as the machine learning feature for estimating the number of arriving aircraft. A correction stage, also based on machine learning, is added to improve prediction accuracy. ADS-B data collected from the Chengdu terminal area are used to study the prediction accuracy of different machine learning algorithms within the proposed framework. Experimental results show that the proposed method predicts air traffic flow accurately: the mean absolute error (MAE) is only 0.35 aircraft/15 min, the root mean square error is 0.67 aircraft/15 min, and the maximum absolute error (MAXAE) is 2 aircraft/15 min. Compared with the AOL method, the proposed method improves prediction accuracy by 90% and 60% in terms of MAE and MAXAE, respectively.
PDF: https://www.sciencedirect.com/science/article/pii/S2214579624000017/pdfft?md5=399453e55e15e7b2fc74c8ad5fce66dc&pid=1-s2.0-S2214579624000017-main.pdf
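The three error figures above follow standard definitions. The sketch below computes them for a series of per-interval arrival counts.

The abstract does not define the "90% and 60%" figures. Reading them as relative reductions of MAE and MAXAE against the AOL baseline is an assumption. The counts in the usage line are made-up toy values, not the paper's data.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, e.g. in aircraft per 15-min interval."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def max_ae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Maximum absolute error over all intervals."""
    _check(y_true, y_pred)
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Percentage reduction of an error metric relative to a baseline (assumed reading)."""
    return (baseline_error - new_error) / baseline_error * 100.0
```

For instance, `mae([3, 5, 2, 4], [3, 4, 2, 6])` gives 0.75 aircraft per interval, and `relative_improvement(1.0, 0.1)` gives a 90% reduction.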
The Predictability of Stock Price: Empirical Study on Tick Data in Chinese Stock Market
Pub Date: 2023-11-17 | DOI: 10.1016/j.bdr.2023.100414
Yueshan Chen, Xingyu Xu, Tian Lan, Sihai Zhang
Whether stocks are predictable has been debated for decades. The efficient market hypothesis (EMH) holds that investors can hardly earn excess profits by predicting stock prices, but this may not be true, especially for the Chinese stock market. We therefore explore the predictability of the Chinese stock market using tick data, a widely studied form of high-frequency data. We obtain the predictability of 3,834 Chinese stocks using the concept of true entropy, estimated with the Lempel-Ziv data compression method. A Markov chain model and a diffusion kernel model are compared against the upper bounds on predictability, and we conclude that a significant performance gap remains between these forecasting models and the theoretical upper bounds. Under different quantization intervals and models, more than 73% of stocks achieve prediction accuracy above 70% and an RMSE below 2 CNY. We further use Spearman's correlation to show that the average stock price and price volatility may have a negative impact on prediction accuracy, which may be helpful for stock investors.
PDF: https://www.sciencedirect.com/science/article/pii/S2214579623000473/pdfft?md5=df49b0edd2f0330b446f4870f4a82ce5&pid=1-s2.0-S2214579623000473-main.pdf
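The abstract does not spell out how "true entropy" becomes a predictability bound. In the predictability literature this is commonly done with a Lempel-Ziv entropy estimator followed by Fano's inequality, and the sketch below assumes that pipeline:

- `lz_entropy` estimates entropy in bits per symbol from a sequence of discretized price states.
- `max_predictability` solves S = H(Π) + (1 - Π) log2(N - 1) for the bound Π_max by bisection.

Both function names are hypothetical. Discretizing tick prices into N states (the quantization intervals) is left to the caller.

```python
import math
from typing import Hashable, Sequence


def _occurs_in(needle: Sequence[Hashable], haystack: Sequence[Hashable]) -> bool:
    # True if needle appears as a contiguous run inside haystack.
    m = len(needle)
    target = tuple(needle)
    return any(tuple(haystack[j:j + m]) == target for j in range(len(haystack) - m + 1))


def lz_entropy(seq: Sequence[Hashable]) -> float:
    """Lempel-Ziv entropy estimate (bits/symbol): n log2(n) / sum of Lambda_i."""
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two symbols")
    total = 0
    for i in range(n):
        # Lambda_i: length of the shortest substring starting at i that
        # never appeared in seq[:i]; n - i + 1 if we run off the end.
        k = 1
        while i + k <= n and _occurs_in(seq[i:i + k], seq[:i]):
            k += 1
        total += k
    return n * math.log2(n) / total


def max_predictability(entropy: float, n_states: int, tol: float = 1e-10) -> float:
    """Solve Fano's equation S = H(p) + (1 - p) log2(N - 1) for p in [1/N, 1]."""
    if n_states < 2:
        raise ValueError("need at least two states")

    def fano(p: float) -> float:
        h = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)
        return h + (1.0 - p) * math.log2(n_states - 1)

    lo, hi = 1.0 / n_states, 1.0
    if entropy >= fano(lo):  # fano(1/N) = log2 N, the maximum entropy
        return lo
    if entropy <= 0.0:
        return 1.0
    # fano is decreasing on [1/N, 1], so bisect for the crossing point.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

A strictly alternating sequence such as `"ab" * 30` has low estimated entropy, so its bound with N = 2 states comes out above 0.9.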
Cost optimization model design of fresh food cold chain system in the context of big data
Pub Date: 2023-11-11 | DOI: 10.1016/j.bdr.2023.100417
Lei Wang, Guangjun Liu, Ibrar Ahmad
High-dimensional information data allow a more precise assessment of cold chain logistics for fresh products, providing valuable insights for optimizing the associated costs. However, traditional data processing techniques cannot deliver the processing efficiency that such high-dimensional cold chain logistics data require. This paper therefore proposes a spectral clustering algorithm based on the local standard deviation and an optimized initial center, and uses it to comprehensively analyze the fixed, transportation, refrigeration, and cargo damage costs of cold chain logistics. The algorithm also includes a clustering-based variation operator and introduces a large neighborhood search mechanism that optimizes the individual connectivity gene layer after the gene layer site for variation is selected. Simulation results show that the proposed algorithm converges better within 15 iterations, reduces error rates, and substantially shortens clustering time, ultimately reducing the calculated total cost of the cold chain.
PDF: https://www.sciencedirect.com/science/article/pii/S2214579623000503/pdfft?md5=0db9cf3ef6ea7d1e1fd34d6a3e87e1ee&pid=1-s2.0-S2214579623000503-main.pdf
A methodology to assess and evaluate sites with high potential for stormwater harvesting in Dehradun, India
Pub Date: 2023-11-10 | DOI: 10.1016/j.bdr.2023.100415
Shray Pathak, Shreya Sharma, Abhishek Banerjee, Sanjeev Kumar
The urgency of protecting natural water resources sustainably has grown as water scarcity and global climate change worsen. Among water collection methods, stormwater harvesting (SWH) is regarded as the most environmentally friendly way to ease the strain on freshwater resources. This study introduces a robust approach for evaluating SWH potential that considers both technical and socioeconomic aspects and identifies and assesses suitable areas, referred to as hotspots, for implementing SWH. Multiple criteria are established to evaluate site suitability quickly, and input from water experts is incorporated into the decision-making process. First, potential locations are chosen and hotspots are identified based on the concept of accumulated catchments. The shortlisted sites are then analyzed in more detail using screening criteria such as demand, inverse weighted distance, and the runoff-to-demand ratio, and a standardized method ranks the sites to determine the most suitable one for SWH. The study identifies eight locations suitable for SWH, two of which are particularly suitable. A radius of influence is then drawn around these sites to pinpoint the areas where water requirements can be matched with availability. This approach helps water planners make well-informed decisions more efficiently, and the methodology highlights the benefits of these tools for water experts seeking sustainable solutions to reduce pressure on freshwater resources.
PDF: https://www.sciencedirect.com/science/article/pii/S2214579623000485/pdfft?md5=1736971c2f1584138324cb67603cb69a&pid=1-s2.0-S2214579623000485-main.pdf
Wetland identification through remote sensing: Insights into wetness, greenness, turbidity, temperature, and changing landscapes
Pub Date: 2023-11-09 | DOI: 10.1016/j.bdr.2023.100416
Rana Waqar Aslam, Hong Shu, Kanwal Javid, Shazia Pervaiz, Farhan Mustafa, Danish Raza, Bilal Ahmed, Abdul Quddoos, Saad Al-Ahmadi, Wesam Atef Hatamleh
Wetlands matter in many ways, including hydrological cycles, ecosystem diversity, climate change, and economic activity. Despite the Ramsar Convention's awareness programmes, the importance of wetlands is frequently disregarded in underdeveloped countries. The Ramsar Convention recognises 2491 wetlands worldwide, 19 of which are in Pakistan. This study uses satellite sensor technology to identify neglected wetlands in Pakistan, with the key goals of analysing water quality, monitoring ecological changes, and understanding the impact of climate change on these wetlands. Wetlands were identified using supervised classification and tasselled cap wetness (TCW). To detect climate-induced changes, a change detection index was applied to QuickBird imagery. Tasselled cap greenness (TCG) and the Normalized Difference Turbidity Index (NDTI) were used to examine water quality and ecological changes in these wetlands. Sentinel-2 data from 2016 to 2019 were used in the analysis, and watershed analysis was carried out using ASTER DEM data. MODIS data were used to calculate the land surface temperature (LST, °C) of the selected wetlands, while rainfall (mm) data were collected from ANN databases. According to the findings, in 2016 Borith, Phander, Upper Kachura, Satpara, and Rama Lake held 22.73%, 20.79%, 23.01%, 24.63%, and 23.03% water, respectively; in 2019 the corresponding ratios were 23.40%, 22.10%, 22.43%, 25.01%, and 24.56%. These findings emphasise the need to take preventive action to protect these wetlands and improve ecosystem dynamics in the future. It is therefore critical that the relevant authorities implement the necessary conservation measures.
PDF: https://www.sciencedirect.com/science/article/pii/S2214579623000497/pdfft?md5=6c2fd850b51a67adc45a9dc630b4afe6&pid=1-s2.0-S2214579623000497-main.pdf
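Of the indices named above, NDTI has a simple closed form. The commonly used definition is (Red - Green) / (Red + Green), where higher values indicate more turbid water. A per-pixel sketch follows. The abstract does not state which bands the authors used, so the mapping to Sentinel-2 band B4 (red) and B3 (green) in the comment is an assumption.

```python
import math
from typing import List, Sequence


def ndti(red: Sequence[float], green: Sequence[float]) -> List[float]:
    """Normalized Difference Turbidity Index per pixel: (Red - Green) / (Red + Green).

    For Sentinel-2, red would be band B4 and green band B3 (assumed band choice).
    Pixels with Red + Green == 0 (e.g. no-data) are returned as NaN.
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same number of pixels")
    out: List[float] = []
    for r, g in zip(red, green):
        s = r + g
        out.append((r - g) / s if s != 0 else math.nan)
    return out
```

For example, `ndti([0.2], [0.1])` gives about 0.33, whereas equal red and green reflectance gives 0.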
ML-aVAT: A Novel 2-Stage Machine-Learning Approach for Automatic Clustering Tendency Assessment
Pub Date: 2023-10-31 | DOI: 10.1016/j.bdr.2023.100413
Harshal Mittal, Jagarlamudi Sai Laxman, Dheeraj Kumar
Clustering tendency assessment, which aims to determine whether a dataset contains any cluster structure and, if so, how many clusters it has, is a critical problem in exploratory data analysis. The VAT family of algorithms provides a “visual” means of assessing the clustering tendency of various datasets. The VAT algorithm reorders the pairwise distance matrix of the input data. When viewed as a monochrome image, this reordered dissimilarity matrix is called a reordered dissimilarity image (RDI), in which possible clusters appear as dark blocks along the diagonal. Interpreting an RDI, however, requires human intervention. Moreover, for datasets with complex cluster structure or noise, the dark blocks along the diagonal are not easily distinguishable, making them difficult to count accurately, and different individuals may report different numbers of blocks. Only a handful of approaches in the literature infer the cluster structure from a VAT-type RDI automatically (algorithmically), without human input, and these perform poorly on several data types and have impractically long run times. This paper proposes ML-aVAT, a novel two-stage machine-learning-based approach for automatic clustering tendency assessment from VAT-type RDIs. Besides estimating the number of clusters, ML-aVAT can also infer the clustering hierarchy, i.e., the sub-clusters within each group, which none of the previously proposed algorithms could do. Numerical experiments on various synthetic and real-life, labeled and unlabeled datasets demonstrate the effectiveness of ML-aVAT in estimating clustering tendency and cluster hierarchy.
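As background for the RDI that ML-aVAT reads, the sketch below implements the classic VAT reordering as it is commonly described:

1. Start from an object involved in the largest dissimilarity.
2. Repeatedly append the unordered object closest to any already-ordered object (a Prim's minimum-spanning-tree traversal).
3. Permute the distance matrix into that order.

The sketch covers only the input image, not ML-aVAT's two learned stages. `vat` is an illustrative name.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def vat(dist: Matrix) -> Tuple[List[int], Matrix]:
    """Classic VAT ordering of a symmetric dissimilarity matrix (illustrative sketch)."""
    n = len(dist)
    if n == 0:
        return [], []
    # Start from a row that contains the largest dissimilarity.
    start = max(range(n), key=lambda i: max(dist[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # nearest[k]: smallest distance from k to any already-ordered object.
    nearest = {k: dist[start][k] for k in remaining}
    while remaining:
        # Prim-style step: take the unordered object closest to the ordered set.
        j = min(remaining, key=lambda k: nearest[k])
        order.append(j)
        remaining.remove(j)
        for k in remaining:
            nearest[k] = min(nearest[k], dist[j][k])
    # Reordered dissimilarity matrix; viewed as an image, this is the RDI.
    reordered = [[dist[p][q] for q in order] for p in order]
    return order, reordered
```

Consider five points on a line at 0.0, 10.0, 0.5, 10.4 and 0.2. The ordering places the three points near 0 before the two near 10, so the reordered matrix shows two dark (low-distance) diagonal blocks.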
Pub Date : 2023-09-17 DOI: 10.1016/j.bdr.2023.100412
Mohammad Khalid Imam Rahmani , Hayder M.A. Ghanimi , Syeda Fizzah Jilani , Muhammad Aslam , Meshal Alharbi , Roobaea Alroobaea , Sudhakar Sengan
Viral and bacterial infections are the most prevalent microbe-caused problems reducing agricultural output worldwide, and identifying pathogens under current conditions remains challenging. Biosensors have become the standard for monitoring microbial and viral macromolecules, and tracking the nanoparticles released by infections improves disease diagnosis. Because sensor data contain diverse patterns, Machine Learning (ML) methods are used to analyze and interpret them. This paper investigates whether Near-infrared (nIR) and Red, Green, and Blue (RGB) imaging can be used to characterize and detect Plant Disease (PD) using Convolutional Neural Network (CNN)-based Feature Extraction (FE) and Feature Classification (FC). A home-built sensing device (HeApt + DNA + SWCNT), consisting of Single-Walled Carbon NanoTubes (SWCNTs) functionalized with a Hemi-binding Deoxyribonucleic Acid (DNA) aptamer, was used to analyze nIR and RGB images of tea plant leaf samples. Three labels are extracted from the nIR + RGB images using a Wasserstein Distance (WD)-based Feature Extraction Model (FEM) and then fed into the proposed CNN model for precise classification. Compared with other CNN architectures on the same dataset, the proposed Wasserstein Distance-to-Convolutional Neural Network (WD2CNN) model achieved the highest accuracy (98.72%) and was also the most computationally efficient, with the shortest average time per epoch. The model classifies biosensor images accurately and efficiently, which could aid in the early detection and prevention of Crop Diseases (CD).
{"title":"Early Pathogen Prediction in Crops Using Nano Biosensors and Neural Network-Based Feature Extraction and Classification","authors":"Mohammad Khalid Imam Rahmani , Hayder M.A. Ghanimi , Syeda Fizzah Jilani , Muhammad Aslam , Meshal Alharbi , Roobaea Alroobaea , Sudhakar Sengan","doi":"10.1016/j.bdr.2023.100412","DOIUrl":"https://doi.org/10.1016/j.bdr.2023.100412","url":null,"abstract":"<div><p>The most prevalent microbe-caused issues that reduce agricultural output globally are viral and bacterial infections. It is currently quite challenging to identify pathogens due to the current living situation. Biosensors have become the standard for monitoring microbial and viral macromolecules. Disease diagnosis is improved by following the nanoparticles released by infections. Since the sensors' data includes different learning patterns, Machine Learning<span> (ML) methods are used to analyze and interpret it. This research paper aimed to study whether Near-infrared (nIR) and Red, Green, and Blue (RGB) imaging might be used to define and detect Plant Disease (PD) using Convolutional Neural Network (CNN)-based Feature Extraction (FE) and Feature Classification (FC). A home-built Single-Walled Carbon NanoTube (SWCNTs) implemented with a Deoxyribonucleic Acid (DNA) aptamer that binds to a Hemi (HeApt + DNA + SWCNT) sensing device was used to analyze near-infrared (nIR) and RGB images of tea plant leaf samples. Three labels are extracted from the nIR + RGB using a Wasserstein Distance (WD)-based Feature Extraction Model (FEM), and then all those labels are loaded into the proposed CNN model to ensure precise classification. The proposed Wasserstein Distance-to-Convolutional Neural Network (WD2CNN) model was compared to different CNN architectures on the same dataset, achieving the highest accuracy of 98.72%. It is also the most computationally efficient, with the shortest average time per epoch. 
The model demonstrates high performance and efficiency in classifying biosensor images, which could aid in the early detection and prevention of Crop Diseases (CD).</span></p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2023-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49711619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
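The WD2CNN abstract names a Wasserstein Distance (WD)-based feature extraction model but does not give its formulation. As a hypothetical illustration of the underlying measure only (not the authors' FEM), the 1-Wasserstein distance between two equal-size one-dimensional samples, such as pixel intensities drawn from two images, reduces to the mean absolute difference of their sorted values. The helpers `wasserstein_1d` and `wd_features`, and the idea of scoring a sample against named reference distributions, are assumptions made for this sketch.

```python
from typing import Dict, List, Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """1-Wasserstein distance between two equal-size 1-D empirical samples."""
    if not a or len(a) != len(b):
        raise ValueError("samples must be non-empty and of equal size")
    # In 1-D with equal weights, optimal transport pairs the i-th smallest
    # value of a with the i-th smallest value of b.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def wd_features(sample: Sequence[float], references: Dict[str, Sequence[float]]) -> List[float]:
    """Hypothetical feature vector: WD from a sample to each reference distribution."""
    # Iterating over sorted keys keeps the feature order stable.
    return [wasserstein_1d(sample, references[name]) for name in sorted(references)]


if __name__ == "__main__":
    # Made-up intensity samples for illustration, not data from the paper.
    healthy = [0.1, 0.2, 0.2, 0.3]
    infected = [0.6, 0.7, 0.8, 0.9]
    leaf = [0.2, 0.3, 0.1, 0.2]
    print(wd_features(leaf, {"healthy": healthy, "infected": infected}))
```

With equal sample weights this matches what `scipy.stats.wasserstein_distance` computes; samples of unequal size need the general CDF-based formula, which this sketch rejects instead of approximating.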