Pub Date: 2024-07-16 DOI: 10.1007/s40745-024-00562-z
A. Chakraborty, S. Rana, S. I. Maiti
{"title":"Transmuted Shifted Lindley Distribution: Characterizations, Classical and Bayesian Estimation with Applications","authors":"A. Chakraborty, S. Rana, S. I. Maiti","doi":"10.1007/s40745-024-00562-z","DOIUrl":"https://doi.org/10.1007/s40745-024-00562-z","url":null,"abstract":"","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"2 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141641587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-13 DOI: 10.1007/s40745-024-00555-y
Ozair Ahmad Wani, Umer Zahoor, Syed Zubair Ahmad Shah, Rijwan Khan
Automated detection of plant diseases is crucial because it simplifies the monitoring of large farms and identifies diseases at an early stage, limiting further plant degradation. Beyond the decline in plant health, reduced production severely affects a country’s economy. Traditional disease identification, which relies on human experts, is slow and impractical for large farms. Our proposed model combines pre-trained ResNet18, AlexNet, GoogLeNet, and VGG16 networks to classify images of apple tree leaves into categories such as healthy, black rot, cedar apple rust, and apple scab. Various image enhancement techniques were applied to improve the model’s accuracy. The model achieved an accuracy of 97.25% on the validation dataset and performed strongly across various metrics, which suggests its potential for efficient and accurate plant health monitoring in the agricultural sector.
{"title":"Apple Leaf Disease Detection Using Transfer Learning","authors":"Ozair Ahmad Wani, Umer Zahoor, Syed Zubair Ahmad Shah, Rijwan Khan","doi":"10.1007/s40745-024-00555-y","DOIUrl":"10.1007/s40745-024-00555-y","url":null,"abstract":"<div><p>Automated detection of plant diseases is crucial as it simplifies the task of monitoring large farms and identifies diseases at their early stages to mitigate further plant degradation. Besides the decline in plant health, reduced production severely impacts the country’s economy. Traditional disease identification methods, relying on human experts, are slow, time-consuming, and impractical for large farms. Our proposed model utilizes a combination of pre-trained Resnet18, Alexnet, GoogLeNet, and VGG16 networks to classify apple tree leaves into categories such as healthy, black rot, apple cedar rust, and apple scab based on images. Various image enhancement techniques were employed to enhance the model’s accuracy. Ultimately, our model achieved an accuracy of 97.25% on the validation dataset, demonstrating excellent performance across various metrics. This suggests its potential for efficient and accurate plant health monitoring in the agricultural sector.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"213 - 222"},"PeriodicalIF":0.0,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-13 DOI: 10.1007/s40745-024-00557-w
Elham Shamsinejad, Touraj Banirostam, Mir Mohsen Pedram, Amir Masoud Rahmani
In the era of big data, as data grow in volume and complexity, the main challenge is how to use big data while preserving the privacy of users. This study aims to find a solution to this challenge. We examined various data anonymization methods, including differential privacy, advanced encryption, and strong access controls. We also examined the operation, advantages, disadvantages, and uses of these methods, the challenges of adapting them to big data, and possible solutions to those challenges. Our results show that traditional data anonymization methods lack scalability, leading to privacy breaches and data loss. When faced with large volumes of data, these methods may not be able to process the data fully, and they may be ineffective against re-identification, linkage, and inference attacks. We introduce emerging methods that can provide improved privacy with minimal data loss and that scale to big data. Finally, we discuss future research directions and raise important questions that can help improve existing algorithms or develop new methods that better manage the complexity and scale of unstructured data.
{"title":"A Review of Anonymization Algorithms and Methods in Big Data","authors":"Elham Shamsinejad, Touraj Banirostam, Mir Mohsen Pedram, Amir Masoud Rahmani","doi":"10.1007/s40745-024-00557-w","DOIUrl":"10.1007/s40745-024-00557-w","url":null,"abstract":"<div><p>In the era of big data, with the increase in volume and complexity of data, the main challenge is how to use big data while preserving the privacy of users. This study was conducted with the aim of finding a solution to this challenge. In this study, we examined various data anonymization methods, including differential privacy, advanced encryption, and strong access controls. In addition, the operation, advantages, disadvantages, and use of these methods, the challenges of adapting these methods to big data, and possible solutions for them were also examined. Our results show that traditional data anonymization methods lack scalability, leading to privacy breaches and data loss. When faced with large volumes of data, these methods may not be able to fully process the data. Also, these methods may be ineffective against re-identification attacks, linkage attacks, and inference attacks. We introduced emerging methods that are capable of providing improved privacy with minimal data loss. These methods have scalability for big data. 
Finally, we examined future research works and raised important questions that can help improve existing algorithms or develop new methods, better manage the complexity and scale of unstructured data.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"253 - 279"},"PeriodicalIF":0.0,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141650932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-13 DOI: 10.1007/s40745-024-00556-x
Elham Shamsinejad, Touraj Banirostam, Mir Mohsen Pedram, Amir Masoud Rahmani
In light of the escalating privacy risks in the big data era, this paper introduces a model for the anonymization of big data streams that leverages in-memory processing within the Spark framework. The approach is founded on the principle of K-anonymity. The paper critically evaluates various anonymization methods and algorithms and benchmarks their performance with respect to time and space complexity. It presents a distinctive formula for determining the optimal number of clusters in the K-means algorithm, along with a novel tuple expiration time strategy for the efficient purging of clusters. Integrating these components into Spark’s RDD and MLlib modules results in a significant decrease in execution time and data loss rates, even as data volumes increase. The paper’s main contributions are methodological advancements that offer a robust, scalable solution for data anonymization, safeguarding user privacy without sacrificing data utility or processing efficiency.
{"title":"Representing a Model for the Anonymization of Big Data Stream Using In-Memory Processing","authors":"Elham Shamsinejad, Touraj Banirostam, Mir Mohsen Pedram, Amir Masoud Rahmani","doi":"10.1007/s40745-024-00556-x","DOIUrl":"10.1007/s40745-024-00556-x","url":null,"abstract":"<div><p>In light of the escalating privacy risks in the big data era, this paper introduces an innovative model for the anonymization of big data streams, leveraging in-memory processing within the Spark framework. The approach is founded on the principle of K-anonymity and propels the field forward by critically evaluating various anonymization methods and algorithms, benchmarking their performance with respect to time and space complexities. A distinctive formula for optimized cluster determination in the K-means algorithm is presented, along with a novel tuple expiration time strategy for the efficient purging of clusters. The integration of these components into Spark’s RDD and MLlib modules results in a significant decrease in execution time and data loss rates, even with increasing data volumes. The paper’s notable contributions are its methodological advancements that offer a robust, scalable solution for data anonymization, safeguarding user privacy without sacrificing data utility or processing efficiency.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"223 - 252"},"PeriodicalIF":0.0,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141651856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-10 DOI: 10.1007/s40745-024-00554-z
M. Meraou, M. Z. Raqab, Fatmah B. Almathkour
{"title":"Analyzing Insurance Data with an Alpha Power Transformed Exponential Poisson Model","authors":"M. Meraou, M. Z. Raqab, Fatmah B. Almathkour","doi":"10.1007/s40745-024-00554-z","DOIUrl":"https://doi.org/10.1007/s40745-024-00554-z","url":null,"abstract":"","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"52 9","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141659955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-08 DOI: 10.1007/s40745-024-00551-2
Muhammad Tahir, Sufyan Ali, Ayesha Sohail, Ying Zhang, Xiaohua Jin
Machine learning algorithms can improve time series analysis compared with traditional methods such as moving averages or auto-regressive approaches. This advancement has helped to unlock several challenging problems, since machine learning not only forecasts the overall trend of the data but also tracks historical changes in the factors that influence this trend. Such predictions play a pivotal role in almost all areas of research where observations are time dependent, from finance to public health, the environment, and climate change. A key challenge in these domains is the large number of attributes and predictors, since managing and manipulating data with many attributes is itself a significant obstacle to forecasting. These challenges can be addressed with Recursive Long Short-Term Memory models. The application of such models is crucial, and their efficacy is further amplified when transfer learning is considered. This research provides a detailed and comprehensive description of such models. Practical application is illustrated through an example, which emphasizes that these models hold great promise when transferred to complex and large datasets using transfer learning.
{"title":"Unlocking Online Insights: LSTM Exploration and Transfer Learning Prospects","authors":"Muhammad Tahir, Sufyan Ali, Ayesha Sohail, Ying Zhang, Xiaohua Jin","doi":"10.1007/s40745-024-00551-2","DOIUrl":"10.1007/s40745-024-00551-2","url":null,"abstract":"<div><p>Machine learning algorithms can improve the time series data analysis as compared to the traditional methods such as moving averages or auto-regressive approaches. This advancement has helped to unlock several challenging problems since machine learning not only helps to forecast the overall trend of the data, but it also helps to keep the historical track of changes in factors, influencing this trend. These predictions play a pivotal role in almost all areas of research where the observations are time dependent, such as problems ranging from challenges of finance to public health, environmental and climate change challenges. A key challenge of these domains is the higher number of attributes and predictors since managing and manipulating data from many attributes is itself a significant challenge for future forecasting. Addressing these challenges is possible with Recursive Long Short-Term Memory models. The application of such models is crucial, and their efficacy is further amplified when considering transfer learning. During this research, a detailed and comprehensive description of such models is addressed. 
Practical application is illustrated through an example, emphasizing that these models, when transferred to complex and large datasets using transfer learning, hold great promise.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1421 - 1434"},"PeriodicalIF":0.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-024-00551-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141667526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-06 DOI: 10.1007/s40745-024-00548-x
Alaor Cervati Neto, A. Levada, Michel Ferreira Cardia Haddad
{"title":"A New Kernel Density Estimation-Based Entropic Isometric Feature Mapping for Unsupervised Metric Learning","authors":"Alaor Cervati Neto, A. Levada, Michel Ferreira Cardia Haddad","doi":"10.1007/s40745-024-00548-x","DOIUrl":"https://doi.org/10.1007/s40745-024-00548-x","url":null,"abstract":"","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":" 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141672260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-05 DOI: 10.1007/s40745-024-00536-1
Vahideh Ahrari, P. Hasanalipour
{"title":"Power Evaluation of Some Tests for Inverse Rayleigh Distribution","authors":"Vahideh Ahrari, P. Hasanalipour","doi":"10.1007/s40745-024-00536-1","DOIUrl":"https://doi.org/10.1007/s40745-024-00536-1","url":null,"abstract":"","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":" 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141675008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-20 DOI: 10.1007/s40745-024-00544-1
Jun Li, Chenyang Zhang, Wei Zhu, Yawei Ren
In recent years, generative artificial intelligence has developed rapidly. In the image domain, image generation models based on deep learning have made remarkable achievements. Early frameworks for image generation were dominated by generative adversarial networks (GANs) and variational autoencoders (VAEs). Today, large-scale generative models based on diffusion models have become mainstream, and the quality of their generated images has improved significantly. We review the research and development of image generation models and delve into the significant progress made in the field in recent years. First, we revisit the development of traditional image generation models such as GANs and VAEs, emphasizing their contributions and challenges. We also introduce diffusion models, which have received much attention in image generation because of their unique generative process and excellent generative performance. Next, we discuss large vision models, with SAM as the focal point. We also pay special attention to large-scale generative models such as Stable Diffusion, which have demonstrated unprecedented capabilities in high-quality image generation tasks. Finally, we explore target models and their respective fine-tuning methods for domain-oriented image generation tasks, predict future directions in image generation, and propose potential research focuses and challenges.
{"title":"A Comprehensive Survey of Image Generation Models Based on Deep Learning","authors":"Jun Li, Chenyang Zhang, Wei Zhu, Yawei Ren","doi":"10.1007/s40745-024-00544-1","DOIUrl":"10.1007/s40745-024-00544-1","url":null,"abstract":"<div><p>In recent years, generative artificial intelligence has been developing rapidly. In the image domain, image generation models based on deep learning have made remarkable achievements. Early frameworks for image generation models were dominated by generative adversarial networks (GANs) and variational autoencoders (VAEs). Nowadays, large-scale generative models based on diffusion models have become mainstream, and the quality of their generated images is significantly improved. We will review the research and development of image generation models and delve into the significant progress made in the field in recent years. Initially, we revisit the development of traditional image generation models like GANs and VAEs, emphasizing their contributions and challenges. We also introduce diffusion models, which have received much attention in the field of image generation due to their unique generative process and excellent generative performance. Subsequently, we emphasized the large vision models with SAM as the focal point. We also pay special attention to large-scale generative models like Stable Diffusion, which have demonstrated unprecedented capabilities in high-quality image generation tasks. 
Additionally, we explore target models and respective fine-tuning methods for domain-oriented image generation tasks, predicts future directions in image generation, and proposes potential research focuses and challenges.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 1","pages":"141 - 170"},"PeriodicalIF":0.0,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}