Spatial topological indices such as Alpha, Beta and Gamma are widely used to measure network topology in spatial data analysis. However, classifying the network topology of a city from the Alpha, Beta and Gamma indices alone is not conclusive, because the individual indices give different results. To classify network topology efficiently, a Modified Synthetic Indicator (MSI) is proposed and critically compared with existing synthetic indicators based on the Composite Weighted Connectivity Index (CWCI), the linear combination of the Alpha, Beta and Gamma indices. The proposed MSI is applied to micro-level (ward-level) classification of network topology, i.e., road network connectivity, in Agartala City, and the efficiency of CWCI relative to the Alpha, Beta and Gamma indices is calibrated. The study reveals that the proposed CWCI is more robust than any individual graph-theoretic measure.
Title: Spatial Data Analysis for Robust Classification of Network Topology Through Synthetic Combinatorics | Authors: Samrat Hore, Stabak Roy, Malabika Boruah, Saptarshi Mitra | Journal: Annals of Data Science, vol. 11, no. 4, pp. 1341-1359 | Pub Date: 2024-05-20 | DOI: 10.1007/s40745-024-00523-6
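As a rough illustration of why the three indices can disagree, the sketch below computes them with the standard planar-graph formulas for a connected network. The abstract does not give the paper's exact definitions or the CWCI weights, so the function names, the sample ward, and the equal weights are all assumptions made for illustration only.

```python
def connectivity_indices(nodes: int, edges: int) -> dict:
    """Alpha, Beta and Gamma indices of a connected planar network."""
    if nodes < 3:
        raise ValueError("the planar formulas need at least 3 nodes")
    return {
        # Observed circuits divided by the maximum possible circuits.
        "alpha": (edges - nodes + 1) / (2 * nodes - 5),
        # Average number of edges per node.
        "beta": edges / nodes,
        # Observed edges divided by the maximum possible edges.
        "gamma": edges / (3 * (nodes - 2)),
    }


def linear_composite(indices: dict, weights: dict) -> float:
    """Weighted sum of the indices; weights are caller-supplied placeholders."""
    return sum(weights[name] * indices[name] for name in indices)


# Hypothetical ward with 10 junctions and 12 road segments.
ward = connectivity_indices(nodes=10, edges=12)
# Equal weights are an assumption, not the weights used in the paper.
score = linear_composite(ward, {"alpha": 1 / 3, "beta": 1 / 3, "gamma": 1 / 3})
print(ward, round(score, 3))
```

Alpha and Gamma are bounded by 1 for planar networks while Beta is not, which is one reason a ranking of wards can change depending on which index is used.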
Safer sexual practice is essential for improving women’s reproductive and sexual health outcomes. The goal of this study is to identify the factors influencing safer sexual negotiation (SSN) by applying machine learning algorithms: logistic regression (LR), random forest, Naïve Bayes, linear discriminant analysis, classification and regression trees, support vector machines (SVM), and K-nearest neighbors. The study used data from the 2017–18 Bangladesh Demographic and Health Survey, covering 19,457 married women aged 15–49 years. The analysis reveals that the SVM algorithm achieved the highest classification accuracy (99.66%), along with high sensitivity (99.98%) and the lowest specificity. Conversely, the LR model produced the highest area under the curve statistic (0.6699), indicating good performance in distinguishing SSN among married women. The results show that women’s autonomy, engagement with financial institutions, educational attainment, and their partner’s education play a significant role in SSN with their partners. The findings highlight the significance of empowering women, enhancing reproductive health awareness, and improving socio-economic conditions and education to encourage SSN. The government needs to consider all these risk factors to promote greater SSN for preventing sexually transmitted diseases among women in Bangladesh.
Title: Evaluating the Performance of Machine Learning Algorithm for Classification of Safer Sexual Negotiation among Married Women in Bangladesh | Authors: Md. Mizanur Rahman, Deluar J. Moloy, Mashfiqul Huq Chowdhury, Arzo Ahmed, Taksina Kabir | Journal: Annals of Data Science, vol. 12, no. 2, pp. 721-737 | Pub Date: 2024-05-20 | DOI: 10.1007/s40745-024-00535-2
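The abstract reports very high accuracy and sensitivity together with the lowest specificity. A minimal sketch of how that combination can arise when one class dominates is given below; the confusion-matrix counts are invented for illustration and are not taken from the survey or the paper.

```python
def classification_rates(tp: int, fn: int, tn: int, fp: int):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # share of actual positives that are detected
    specificity = tn / (tn + fp)  # share of actual negatives that are detected
    return accuracy, sensitivity, specificity


# Hypothetical, heavily imbalanced sample: 990 positives and 10 negatives.
acc, sens, spec = classification_rates(tp=989, fn=1, tn=1, fp=9)
print(f"accuracy={acc:.3f} sensitivity={sens:.4f} specificity={spec:.2f}")
```

With these made-up counts the classifier is right 99% of the time while recognising only one negative case in ten, which is why accuracy is usually read alongside specificity and the area under the curve.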
Pub Date: 2024-05-11 | DOI: 10.1007/s40745-024-00531-6
Junjie Hou, Yuqi Zhang, Duo Su
The image harmonization task endeavors to adjust foreground information within an image synthesis process to achieve visual consistency by leveraging background information. In academic research, this task conventionally involves the utilization of simple synthesized images and matching masks as inputs. However, obtaining precise masks for image harmonization in practical applications poses a significant challenge, thereby creating a notable disparity between research findings and real-world applicability. To mitigate this disparity, we propose a redefinition of the image harmonization task as “Unified Image Harmonization,” where the input comprises only a single image, thereby enhancing its applicability in real-world scenarios. To address this challenge, we have developed a novel framework. Within this framework, we initially employ inharmonious region localization to detect the mask, which is subsequently utilized for harmonization tasks. The pivotal aspect of the harmonization process lies in normalization, which is accountable for information transfer. Nonetheless, the current background-to-foreground information transfer and guidance mechanisms are limited by single-layer guidance, thereby constraining their effectiveness. To overcome this limitation, we introduce Region Augmented Attention Normalization (RA2N), which enhances the attention mechanism for foreground feature alignment, consequently leading to improved alignment and transfer capabilities. Through qualitative and quantitative comparisons on the iHarmony4 dataset, our model exhibits exceptional performance not only in unified image harmonization but also in conventional image harmonization tasks.
Title: Unified Image Harmonization with Region Augmented Attention Normalization | Authors: Junjie Hou, Yuqi Zhang, Duo Su | Journal: Annals of Data Science, vol. 11, no. 5, pp. 1865-1886 | Pub Date: 2024-05-11 | DOI: 10.1007/s40745-024-00531-6
Pub Date: 2024-04-25 | DOI: 10.1007/s40745-024-00530-7
Zixuan Fan, Yan Xu
In the field of bioinformatics, changes in protein functionality are mainly influenced by protein mutations. Accurately predicting these functional changes can enhance our understanding of evolutionary mechanisms, promote developments in protein engineering-related fields, and accelerate progress in medical research. In this study, we introduced two different models: one based on bidirectional long short-term memory (BiLSTM), and the other based on self-attention. These models were integrated using a weighted fusion method to predict protein functional changes associated with mutation sites. The findings indicate that the model matches the current model in predictive precision and in capacity for generalization. Furthermore, the ensemble model surpasses the performance of the single models, highlighting the value of utilizing their synergistic capabilities. This finding may improve the accuracy of predicting protein functional changes associated with mutations and has potential applications in protein engineering and drug research. We evaluated the efficacy of our models under different scenarios by comparing the predicted results of protein functional changes across various numbers of mutation sites. As the number of mutation sites increases, the prediction accuracy decreases significantly, highlighting the inherent limitations of these models in handling cases involving more mutation sites.
Title: Predicting the Functional Changes in Protein Mutations Through the Application of BiLSTM and the Self-Attention Mechanism | Authors: Zixuan Fan, Yan Xu | Journal: Annals of Data Science, vol. 11, no. 3, pp. 1077-1094 | Pub Date: 2024-04-25 | DOI: 10.1007/s40745-024-00530-7
Pub Date: 2024-04-25 | DOI: 10.1007/s40745-024-00528-1
Huimin Yao, Haiyan Wang
Accurately predicting students’ performance plays a crucial role in achieving the intellectualization of courses. This paper studied intelligent courses in English education based on neural networks and designed a firefly algorithm-back propagation neural network (FA-BPNN) method. The correlation between various features and final grades was calculated using the students’ online learning data. Features with higher correlation were selected as the input for the FA-BPNN algorithm to estimate the final score that students achieved in the “College English” course. The training time of the FA-BPNN algorithm was 3.42 s, and its root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) values were 0.986, 0.622, and 0.205, respectively. These values were lower than those of the BPNN, genetic algorithm (GA)-BPNN, and particle swarm optimization (PSO)-BPNN algorithms, as well as the adaptive neuro-fuzzy inference system approach. The results indicated the efficacy of the FA for optimizing the parameters of the BPNN algorithm. The comparison between the predicted results and actual values suggested that the average error of the FA-BPNN algorithm was only 0.5, which was the smallest. The experimental results demonstrate the reliability of the FA-BPNN algorithm for performance prediction and its practical application feasibility.
Title: Research on Intelligent Courses in English Education based on Neural Networks | Authors: Huimin Yao, Haiyan Wang | Journal: Annals of Data Science, vol. 11, no. 3, pp. 1095-1107 | Pub Date: 2024-04-25 | DOI: 10.1007/s40745-024-00528-1
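For readers unfamiliar with the three error measures quoted in the abstract, the following sketch computes them from paired actual and predicted scores using their textbook definitions. The scores are invented, and MAPE is returned as a fraction rather than a percentage because the abstract does not state which unit it uses.

```python
import math


def regression_errors(actual, predicted):
    """Return (RMSE, MAE, MAPE) for two equally long sequences of numbers."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return rmse, mae, mape


# Hypothetical final scores and model predictions.
rmse, mae, mape = regression_errors([80, 90, 100], [82, 89, 97])
print(round(rmse, 3), round(mae, 3), round(mape, 4))
```

RMSE penalises large individual errors more heavily than MAE, so the two can rank competing models differently.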
Pub Date: 2024-04-18 | DOI: 10.1007/s40745-024-00527-2
Adebisi A. Ogunde, Subhankar Dutta, Ehab M. Almetawally
This article introduced a three-parameter extension of the Generalized Rayleigh distribution called the half-logistic Generalized Rayleigh distribution, which has the Generalized Rayleigh and Rayleigh distributions as submodels. The proposed model is quite flexible and adaptable to model any kind of lifetime data. Its probability density function may sometimes be unimodal and its corresponding hazard rate may be of monotone or non-monotone shape. Standard statistical properties such as its ordinary and incomplete moments, quantile function, moment generating function, reliability function, stochastic ordering, order statistics, Rényi entropy, and δ-entropy are obtained. The maximum likelihood method is used to obtain the estimates of the model parameters. Two practical examples of hydrological data sets are presented.
Title: Half Logistic Generalized Rayleigh Distribution for Modeling Hydrological Data | Authors: Adebisi A. Ogunde, Subhankar Dutta, Ehab M. Almetawally | Journal: Annals of Data Science, vol. 12, no. 2, pp. 667-694 | Pub Date: 2024-04-18 | DOI: 10.1007/s40745-024-00527-2
Pub Date: 2024-04-17 | DOI: 10.1007/s40745-024-00526-3
Mohammad Kafeel Wani, Peer Bilal Ahmad
Agriculture, engineering, public health, sociology, psychology, and epidemiology are just a few of the numerous disciplines that find analysis and modeling of zero-truncated count data to be of paramount importance. Very recently, researchers have been paying careful attention to the one-inflation implications of these zero-truncated count statistics. In this regard, we have studied the one-inflated variant of the zero-truncated Poisson distribution. The proposed distribution, which itself represents a two-part process, contains a few models. We have calculated crucial statistical characteristics of the suggested model, including but not confined to generating functions, moments and associated measures. The parametric estimation has been carried out using maximum likelihood estimation. Two different simulation studies have been carried out, one to test the performance of the maximum likelihood estimates and the other to test the compatibility of our devised model when data are simulated from different competing models with considerably higher mass at point one. To test the compatibility of our proposed model, we have used three real-life data sets and considered theoretical as well as graphical performance measures. The fitting results have been compared with some other models of interest. Moreover, we have used three different test statistics, viz. the likelihood ratio test, Wald’s test, and Rao’s efficient score test, to test the significance of the one-inflation parameter.
Title: One-Inflated Zero-Truncated Poisson Distribution: Statistical Properties and Real Life Applications | Authors: Mohammad Kafeel Wani, Peer Bilal Ahmad | Journal: Annals of Data Science, vol. 12, no. 2, pp. 639-666 | Pub Date: 2024-04-17 | DOI: 10.1007/s40745-024-00526-3
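The two-part structure mentioned in the abstract can be sketched as a mixture: with probability `omega` the count is exactly one, and otherwise it follows a zero-truncated Poisson with rate `lam`. This is the usual way one-inflation is written and is assumed here; the paper's own parametrization is not given in the abstract and may differ. The names `ztp_pmf`, `oiztp_pmf`, `lam`, and `omega` are illustrative.

```python
import math


def ztp_pmf(x: int, lam: float) -> float:
    """Zero-truncated Poisson probability P(X = x) for x = 1, 2, ..."""
    if x < 1:
        return 0.0
    return math.exp(-lam) * lam**x / (math.factorial(x) * (1 - math.exp(-lam)))


def oiztp_pmf(x: int, lam: float, omega: float) -> float:
    """One-inflated zero-truncated Poisson under an assumed mixture form."""
    base = (1 - omega) * ztp_pmf(x, lam)
    # The extra mass omega is placed at the point one.
    return omega + base if x == 1 else base


# Hypothetical parameter values; the probabilities should sum to one.
total = sum(oiztp_pmf(x, lam=2.0, omega=0.3) for x in range(1, 60))
print(round(total, 10))
```

Setting `omega` to zero recovers the plain zero-truncated Poisson, which is the null model that a test of the one-inflation parameter compares against.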
Pub Date: 2024-04-17 | DOI: 10.1007/s40745-024-00525-4
Farhad Soleimanian Gharehchopogh
Data clustering is one of the main issues in optimization. It is the process of partitioning a set of items into several groups so that items within each group are most similar to one another and least similar to items in other groups. It is employed in various domains and applications, including biology, business and consumer analysis, document clustering, web, banking, and image processing, to name a few. In this paper, two new methods are proposed for data clustering: a hybridization of the Bald Eagle Search (BES) algorithm with the African Vultures Optimization Algorithm (AVOA), called BESAVOA, and BESAVOA with Opposition Based Learning (BESAVOA-OBL). AVOA is used to find the centers of the clusters and improve the centrality of the groups obtained by the BES algorithm. Primary vectors are created based on the population of eagles, and then each vector is used by BESAVOA to search for the centers of the clusters. The proposed methods (BESAVOA and BESAVOA-OBL) are evaluated on 16 UCI datasets, based on the number of generations, number of iterations, execution time, and convergence. The results show that BESAVOA-OBL fits better than the other algorithms and is more effective by a ratio of 12.42 percent.
Title: An Improved Boosting Bald Eagle Search Algorithm with Improved African Vultures Optimization Algorithm for Data Clustering | Authors: Farhad Soleimanian Gharehchopogh | Journal: Annals of Data Science, vol. 12, no. 2, pp. 605-637 | Pub Date: 2024-04-17 | DOI: 10.1007/s40745-024-00525-4
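The opposition-based learning step named in the abstract can be illustrated in isolation. In the common formulation assumed here, a candidate vector of cluster centers is mirrored inside its bounds and the better of the two is kept, judged by the within-cluster sum of squared distances. This is a generic sketch, not the BESAVOA-OBL procedure itself; the function names, the flat encoding of centers, and the data are hypothetical.

```python
def opposite(vector, lower, upper):
    """Opposite point of a candidate: lower + upper - value, per coordinate."""
    return [lo + hi - x for x, lo, hi in zip(vector, lower, upper)]


def decode(vector, dim):
    """Split a flat candidate vector into cluster centers of length `dim`."""
    return [vector[i:i + dim] for i in range(0, len(vector), dim)]


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers)
        for point in points
    )


def obl_step(vector, lower, upper, points, dim):
    """Keep whichever of the candidate and its opposite clusters the data better."""
    candidates = (vector, opposite(vector, lower, upper))
    return min(candidates, key=lambda v: sse(points, decode(v, dim)))


# Hypothetical 1-D data with two obvious groups, and two centers to place.
points = [[1.0], [2.0], [9.0], [10.0]]
best = obl_step([8.5, 0.5], [0.0, 0.0], [10.0, 10.0], points, dim=1)
print(best)
```

In a full metaheuristic this step would typically be applied to the initial population or to selected individuals in each generation, which doubles the fitness evaluations for those individuals.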
In this paper, we propose an exponential ratio-type estimator for the elevated estimation of the population mean, employing one auxiliary variable in stratified random sampling and using the conventional ratio estimator and the Bahl and Tuteja exponential ratio-type estimator. The bias and the Mean Squared Error (MSE) of the proposed estimator are derived up to a first-order approximation and compared with existing estimators. Theoretically, we also compare the MSE of the proposed estimator with those of the competing estimators under the linear cost function. The optimal values of the characterizing scalars are obtained, and the minimum MSE is obtained for these optimal values. We find theoretically that the proposed estimator is more efficient than other estimators under restricted conditions by formulating the proposed problem as an optimization problem under a linear cost function. A numerical illustration is also included to verify the theoretical findings for their practical utility. The estimator with the least MSE is recommended for practical utility in different areas of application of stratified random sampling.
Title: Optimal Strategy for Elevated Estimation of Population Mean in Stratified Random Sampling under Linear Cost Function | Authors: Subhash Kumar Yadav, Mukesh Kumar Verma, Rahul Varshney | Journal: Annals of Data Science, vol. 12, no. 2, pp. 517-538 | Pub Date: 2024-03-30 | DOI: 10.1007/s40745-024-00520-9 | Open-access PDF: https://link.springer.com/content/pdf/10.1007/s40745-024-00520-9.pdf
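The two building blocks named in the abstract can be sketched in their combined stratified form: the conventional ratio estimator and the Bahl and Tuteja exponential ratio-type estimator. The paper's proposed estimator and its characterizing scalars are not given in the abstract and are not reproduced here; the function names, the stratum weights, and the sample values are illustrative assumptions.

```python
import math


def stratified_mean(weights, stratum_means):
    """Stratified sample mean: the weighted sum of the stratum sample means."""
    return sum(w * m for w, m in zip(weights, stratum_means))


def ratio_estimator(y_st, x_st, x_pop_mean):
    """Conventional combined ratio estimator of the population mean of y."""
    return y_st * x_pop_mean / x_st


def exp_ratio_estimator(y_st, x_st, x_pop_mean):
    """Bahl and Tuteja exponential ratio-type estimator, combined form."""
    return y_st * math.exp((x_pop_mean - x_st) / (x_pop_mean + x_st))


# Hypothetical two-stratum sample with a known auxiliary population mean of 13.
weights = [0.4, 0.6]
y_st = stratified_mean(weights, [20.0, 30.0])
x_st = stratified_mean(weights, [10.0, 14.0])
print(ratio_estimator(y_st, x_st, 13.0), exp_ratio_estimator(y_st, x_st, 13.0))
```

Both estimators adjust the stratified mean of the study variable upward when the auxiliary sample mean falls below its known population mean, and the exponential form applies a gentler adjustment.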
Pub Date: 2024-03-20 | DOI: 10.1007/s40745-024-00521-8
Poonam Samir Jadhav, Gautam M. Borkar
In the era of big data, preserving data privacy has become paramount due to the sheer volume and sensitivity of the information being processed. This research is dedicated to safeguarding data privacy through a novel data sanitization approach centered on optimal key generation. Due to the size and complexity of big data applications, managing big data with reduced risk and high privacy poses challenges. Many standard privacy-preserving mechanisms have been introduced to maintain the volume and velocity of big data, since it consists of massive and complex data. To solve this issue, this research develops a data sanitization technique based on optimal key generation to preserve the privacy of sensitive data. The sensitive data is first identified through the quasi-identifiers and is then preserved by generating an optimal key using the proposed marine predator whale optimization (MPWO) algorithm. The proposed algorithm hybridizes the foraging behaviors of marine predators and whales to determine the optimal key. The optimal key generated using the MPWO algorithm effectively preserves the privacy of the data. The efficiency of the approach is demonstrated by equivalent class size metric values of 0.03, 185.07, and 0.04 for the attribute disclosure attack, identity disclosure attack, and identity disclosure attack, respectively. Similarly, the discernibility metric is measured as 0.08, 123.38, and 0.09, and the normalized certainty penalty as 0.002, 61.69, and 0.001, for the attribute disclosure attack, identity disclosure attack, and identity disclosure attack, respectively.
{"title":"Optimal Key Generation for Privacy Preservation in Big Data Applications Based on the Marine Predator Whale Optimization Algorithm","authors":"Poonam Samir Jadhav, Gautam M. Borkar","doi":"10.1007/s40745-024-00521-8","DOIUrl":"10.1007/s40745-024-00521-8","url":null,"abstract":"<div><p>In the era of big data, preserving data privacy has become paramount due to the sheer volume and sensitivity of the information being processed. This research is dedicated to safeguarding data privacy through a novel data sanitization approach centered on optimal key generation. Due to the size and complexity of big data applications, managing big data with reduced risk and high privacy poses challenges. Many standard privacy-preserving mechanisms have been introduced to maintain the volume and velocity of big data, since it consists of massive and complex data. To solve this issue, this research develops a data sanitization technique based on optimal key generation to preserve the privacy of sensitive data. The sensitive data is first identified through the quasi-identifiers and is then preserved by generating an optimal key using the proposed marine predator whale optimization (MPWO) algorithm. The proposed algorithm hybridizes the foraging behaviors of marine predators and whales to determine the optimal key. The optimal key generated using the MPWO algorithm effectively preserves the privacy of the data. The efficiency of the approach is demonstrated by equivalent class size metric values of 0.03, 185.07, and 0.04 for the attribute disclosure attack, identity disclosure attack, and identity disclosure attack, respectively. 
Similarly, the discernibility metric is measured as 0.08, 123.38, and 0.09, and the normalized certainty penalty as 0.002, 61.69, and 0.001, for the attribute disclosure attack, identity disclosure attack, and identity disclosure attack, respectively.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 2","pages":"539 - 569"},"PeriodicalIF":0.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140225219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}