Annals of Data Science: Latest Articles (Page 8)

Spatial Data Analysis for Robust Classification of Network Topology Through Synthetic Combinatorics

Q1 Decision Sciences

Annals of Data Science

Pub Date : 2024-05-20 DOI: 10.1007/s40745-024-00523-6

Samrat Hore, Stabak Roy, Malabika Boruah, Saptarshi Mitra

Spatial topological indices such as Alpha, Beta and Gamma are widely used to measure network topology in spatial data analysis. However, classifying the network topology of a city on the basis of the Alpha, Beta and Gamma indices is inconclusive, because the individual indices give different results. To classify network topology efficiently, a Modified Synthetic Indicator (MSI) is proposed and critically compared with existing synthetic indicators, based on the Composite Weighted Connectivity Index (CWCI), a linear combination of the Alpha, Beta and Gamma indices. The proposed MSI is applied to the micro-level (ward-level) classification of network topology, i.e., road network connectivity, in Agartala City, and the efficiency of CWCI is calibrated against the Alpha, Beta and Gamma indices. The study reveals that the proposed CWCI is more robust than any individual graph-theoretic measure.
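
To make the three indices concrete, the sketch below computes them with the textbook formulas for a connected planar network (e edges, v nodes): Alpha = (e - v + 1) / (2v - 5), Beta = e / v, Gamma = e / (3(v - 2)). The function names and the equal weights in the linear combination are assumptions made for illustration; the abstract does not state the weights used by CWCI or the definition of the MSI.

```python
def connectivity_indices(num_edges, num_nodes):
    """Return (alpha, beta, gamma) for a connected planar network.

    Textbook graph-theoretic formulas; not taken from the paper itself.
    """
    if num_nodes < 3:
        raise ValueError("the planar formulas need at least 3 nodes")
    alpha = (num_edges - num_nodes + 1) / (2 * num_nodes - 5)  # observed vs. maximum circuits
    beta = num_edges / num_nodes                               # edges per node
    gamma = num_edges / (3 * (num_nodes - 2))                  # observed vs. maximum edges
    return alpha, beta, gamma


def weighted_connectivity(alpha, beta, gamma, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Linear combination of the three indices.

    The equal weights are a placeholder assumption, not the CWCI weights.
    """
    w_alpha, w_beta, w_gamma = weights
    return w_alpha * alpha + w_beta * beta + w_gamma * gamma


# Hypothetical ward with 9 junctions and 12 road segments.
alpha, beta, gamma = connectivity_indices(num_edges=12, num_nodes=9)
print(alpha, beta, gamma, weighted_connectivity(alpha, beta, gamma))
```

Because Beta is unbounded above while Alpha and Gamma lie between 0 and 1 for planar networks, the three indices can rank the same wards differently, which is the disagreement a composite index is meant to resolve.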

Citations: 0

Evaluating the Performance of Machine Learning Algorithm for Classification of Safer Sexual Negotiation among Married Women in Bangladesh

Q1 Decision Sciences

Annals of Data Science

Pub Date : 2024-05-20 DOI: 10.1007/s40745-024-00535-2

Md. Mizanur Rahman, Deluar J. Moloy, Mashfiqul Huq Chowdhury, Arzo Ahmed, Taksina Kabir

Safer sexual practice is essential for improving women’s reproductive and sexual health outcomes. The goal of this study is to identify the contributing factors influencing safer sexual negotiations (SSN) through the application of machine learning algorithms. The algorithms include logistic regression (LR), random forest, Naïve Bayes, linear discriminant analysis, classification and regression trees, support vector machines (SVM), and K-nearest neighbors. This study utilized data from the 2017-18 Bangladesh Demographic and Health Survey, encompassing 19,457 married women within the ages of 15–49 years. The analysis reveals that the SVM algorithm achieved the highest classification accuracy (99.66%), along with high sensitivity (99.98%) and the lowest specificity. Conversely, the LR model produced the highest area under the curve statistics (0.6699), indicating good performance in distinguishing SSN among married women. The outcome illustrated that women’s autonomy, engagement with financial institutions, educational attainment, and their partner’s education play a significant role in SSN with their partners. The findings highlight the significance of empowering women, enhancing reproductive health awareness, and improving socio-economic conditions and education to encourage SSN. The government needs to consider all these risk factors to promote greater SSN for preventing sexually transmitted diseases among women in Bangladesh.
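
The abstract reports very high accuracy and sensitivity together with the lowest specificity. The sketch below shows how the three metrics are computed from a confusion matrix and how that pattern can arise when one class dominates. The counts are invented for illustration and are not the study's data; whether class imbalance explains the reported figures is an assumption, not something the abstract states.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Made-up counts: a classifier that labels every case as positive
# on a data set with 990 positives and 10 negatives.
accuracy, sensitivity, specificity = classification_metrics(tp=990, fn=0, fp=10, tn=0)
print(accuracy, sensitivity, specificity)
```

In this toy case accuracy is 0.99 and sensitivity is 1.0 while specificity is 0.0, which is why accuracy alone is usually read alongside specificity and the area under the curve.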

Annals of Data Science, Volume 12, Issue 2, pp. 721-737

Citations: 0

Unified Image Harmonization with Region Augmented Attention Normalization

Q1 Decision Sciences

Annals of Data Science

Pub Date : 2024-05-11 DOI: 10.1007/s40745-024-00531-6

Junjie Hou, Yuqi Zhang, Duo Su

The image harmonization task endeavors to adjust foreground information within an image synthesis process to achieve visual consistency by leveraging background information. In academic research, this task conventionally involves the utilization of simple synthesized images and matching masks as inputs. However, obtaining precise masks for image harmonization in practical applications poses a significant challenge, thereby creating a notable disparity between research findings and real-world applicability. To mitigate this disparity, we propose a redefinition of the image harmonization task as “Unified Image Harmonization,” where the input comprises only a single image, thereby enhancing its applicability in real-world scenarios. To address this challenge, we have developed a novel framework. Within this framework, we initially employ inharmonious region localization to detect the mask, which is subsequently utilized for harmonization tasks. The pivotal aspect of the harmonization process lies in normalization, which is accountable for information transfer. Nonetheless, the current background-to-foreground information transfer and guidance mechanisms are limited by single-layer guidance, thereby constraining their effectiveness. To overcome this limitation, we introduce Region Augmented Attention Normalization (RA2N), which enhances the attention mechanism for foreground feature alignment, consequently leading to improved alignment and transfer capabilities. Through qualitative and quantitative comparisons on the iHarmony4 dataset, our model exhibits exceptional performance not only in unified image harmonization but also in conventional image harmonization tasks.

Annals of Data Science, Volume 11, Issue 5, pp. 1865-1886

Citations: 0

Predicting the Functional Changes in Protein Mutations Through the Application of BiLSTM and the Self-Attention Mechanism

Q1 Decision Sciences

Annals of Data Science

Pub Date : 2024-04-25 DOI: 10.1007/s40745-024-00530-7

Zixuan Fan, Yan Xu

In the field of bioinformatics, changes in protein functionality are mainly influenced by protein mutations. Accurately predicting these functional changes can enhance our understanding of evolutionary mechanisms, promote developments in protein engineering-related fields, and accelerate progress in medical research. In this study, we introduced two different models: one based on bidirectional long short-term memory (BiLSTM), and the other based on self-attention. These models were integrated using a weighted fusion method to predict protein functional changes associated with mutation sites. The findings indicate that the model's predictive precision matches that of the current model, along with its capacity for generalization. Furthermore, the ensemble model surpasses the performance of the single models, highlighting the value of utilizing their synergistic capabilities. This finding may improve the accuracy of predicting protein functional changes associated with mutations and has potential applications in protein engineering and drug research. We evaluated the efficacy of our models under different scenarios by comparing the predicted results of protein functional changes across various numbers of mutation sites. As the number of mutation sites increases, the prediction accuracy decreases significantly, highlighting the inherent limitations of these models in handling cases involving more mutation sites.
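
As a rough illustration of weighted fusion, the sketch below combines the outputs of two models as a convex combination. The function name, the single weight parameter, and the example scores are assumptions; the abstract does not say how the fusion weights were chosen or what the two models output.

```python
def weighted_fusion(pred_a, pred_b, weight_a=0.5):
    """Combine two prediction lists as weight_a * a + (1 - weight_a) * b."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(pred_a) != len(pred_b):
        raise ValueError("both models must score the same samples")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical scores from a BiLSTM-style model and a self-attention-style model.
fused = weighted_fusion([0.2, 0.8], [0.4, 0.6], weight_a=0.75)
print(fused)
```

In practice the weight would typically be tuned on a validation set, so that the fused prediction leans toward whichever model is more reliable.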

Annals of Data Science, Volume 11, Issue 3, pp. 1077-1094

Citations: 0

Research on Intelligent Courses in English Education based on Neural Networks

Q1 Decision Sciences

Annals of Data Science

Pub Date : 2024-04-25 DOI: 10.1007/s40745-024-00528-1

Huimin Yao, Haiyan Wang

Accurately predicting students’ performance plays a crucial role in achieving the intellectualization of courses. This paper studied intelligent courses in English education based on neural networks and designed a firefly algorithm-back propagation neural network (FA-BPNN) method. The correlation between various features and final grades was calculated using the students’ online learning data. Features with higher correlation were selected as the input for the FA-BPNN algorithm to estimate the final score that students achieved in the “College English” course. The training time of the FA-BPNN algorithm was 3.42 s, and its root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were 0.986, 0.622, and 0.205, respectively. These values were lower than those of the BPNN, genetic algorithm (GA)-BPNN, and particle swarm optimization (PSO)-BPNN algorithms, as well as the adaptive neuro-fuzzy inference system approach. The results indicated the efficacy of the FA for optimizing the parameters of the BPNN algorithm. The comparison between the predicted results and actual values suggested that the average error of the FA-BPNN algorithm was only 0.5, the smallest among the compared methods. The experimental results demonstrate the reliability of the FA-BPNN algorithm for performance prediction and the feasibility of its practical application.
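
For reference, the three error measures named in the abstract can be computed as in the sketch below. The grades are invented, and MAPE is returned as a fraction rather than a percentage; the abstract does not say which convention the reported value of 0.205 follows.

```python
import math


def regression_errors(actual, predicted):
    """Return (RMSE, MAE, MAPE) for paired lists; MAPE is a fraction, not a percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual value, so actual grades must be non-zero.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return rmse, mae, mape


# Made-up final grades and predictions.
rmse, mae, mape = regression_errors([80, 90, 100], [82, 87, 100])
print(rmse, mae, mape)
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a sense of whether a model's error is spread evenly or concentrated in a few students.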

Annals of Data Science, Volume 11, Issue 3, pp. 1095-1107

Citations: 0

Half Logistic Generalized Rayleigh Distribution for Modeling Hydrological Data

Q1 Decision Sciences

Annals of Data Science

Pub Date : 2024-04-18 DOI: 10.1007/s40745-024-00527-2

Adebisi A. Ogunde, Subhankar Dutta, Ehab M. Almetawally

This article introduces a three-parameter extension of the Generalized Rayleigh distribution, called the half-logistic Generalized Rayleigh distribution, which has the Generalized Rayleigh and Rayleigh distributions as submodels. The proposed model is quite flexible and adaptable to modeling any kind of lifetime data. Its probability density function may sometimes be unimodal, and its corresponding hazard rate may be of monotone or non-monotone shape. Standard statistical properties such as its ordinary and incomplete moments, quantile function, moment generating function, reliability function, stochastic ordering, order statistics, and Rényi and δ-entropy are obtained. The maximum likelihood method is used to estimate the model parameters. Two practical examples of hydrological data sets are presented.

Citations: 0

One-Inflated Zero-Truncated Poisson Distribution: Statistical Properties and Real Life Applications

Q1 Decision Sciences

Annals of Data Science

Pub Date : 2024-04-17 DOI: 10.1007/s40745-024-00526-3

Mohammad Kafeel Wani, Peer Bilal Ahmad

Agriculture, engineering, public health, sociology, psychology, and epidemiology are just a few of the numerous disciplines that find analysis and modeling of zero-truncated count data to be of paramount importance. Very recently, researchers have been paying careful attention to the one-inflation implications of these zero-truncated count statistics. In this regard, we have studied the one-inflated variant of the zero-truncated Poisson distribution. A few models are nested within the proposed distribution, which itself represents a two-part process. We have calculated crucial statistical characteristics of the suggested model, including but not confined to generating functions, moments and associated measures. The parameters have been estimated by maximum likelihood. Two different simulation studies have been carried out, one to test the performance of the maximum likelihood estimates and the other to test the compatibility of our devised model when data are simulated from different competing models with considerably higher mass at point one. To test the compatibility of our proposed model, we have used three real-life data sets and considered theoretical as well as graphical performance measures. The fitting results have been compared with some other models of interest. Moreover, we have used three different test statistics, viz. the likelihood ratio test, Wald’s test, and Rao’s efficient score test, to test the significance of the one-inflation parameter.
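
A common way to write a one-inflated zero-truncated Poisson model is as a two-part mixture: with probability omega the count is exactly one, and otherwise it follows a zero-truncated Poisson with rate lam. The sketch below implements that probability mass function. The parameter names and this particular parameterization are assumptions; the abstract does not give the paper's exact form.

```python
import math


def zero_truncated_poisson_pmf(x, lam):
    """P(X = x) for a Poisson(lam) variable conditioned on X >= 1."""
    if x < 1:
        return 0.0
    return lam ** x * math.exp(-lam) / (math.factorial(x) * (1.0 - math.exp(-lam)))


def one_inflated_ztp_pmf(x, lam, omega):
    """Mixture of a point mass at one (weight omega) and a zero-truncated Poisson."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    base = (1.0 - omega) * zero_truncated_poisson_pmf(x, lam)
    return omega + base if x == 1 else base


# Illustrative parameter values, not estimates from the paper.
probabilities = [one_inflated_ztp_pmf(x, lam=2.0, omega=0.3) for x in range(1, 6)]
print(probabilities)
```

Setting omega to zero recovers the plain zero-truncated Poisson, which is the nested hypothesis that tests on the inflation parameter, such as the likelihood ratio test, compare against.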

Annals of Data Science, Volume 12, Issue 2, pp. 639-666

Citations: 0

An Improved Boosting Bald Eagle Search Algorithm with Improved African Vultures Optimization Algorithm for Data Clustering

Q1 Decision Sciences

Annals of Data Science

Pub Date : 2024-04-17 DOI: 10.1007/s40745-024-00525-4

Farhad Soleimanian Gharehchopogh

Data clustering is one of the main issues in optimization. It is the process of partitioning a set of items into several groups so that items within a group are as similar as possible to each other and as dissimilar as possible to items in other groups. It is employed in various domains and applications, including biology, business and consumer analysis, document clustering, the web, banking, and image processing, to name a few. In this paper, two new data clustering methods are proposed: a hybridization of the Bald Eagle Search (BES) algorithm with the African Vultures Optimization Algorithm (AVOA), called BESAVOA, and BESAVOA with Opposition-Based Learning (BESAVOA-OBL). AVOA is used to find the centers of the clusters and improve the centrality of the groups obtained by the BES algorithm. Primary vectors are created based on the population of eagles, and BESAVOA is then applied to each vector to search for the cluster centers. The proposed methods (BESAVOA and BESAVOA-OBL) are evaluated on 16 UCI datasets in terms of the number of generations, number of iterations, execution time, and convergence. The results show that BESAVOA-OBL fits better than the other algorithms and is more effective than them by 12.42 percent.
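
Two building blocks mentioned above can be sketched without reproducing BES or AVOA themselves: the standard opposition-based learning step, which mirrors a candidate inside its search bounds (opposite = lower + upper - x), and a sum-of-squared-errors fitness for a candidate set of cluster centers. The one-dimensional data, the bounds, and the choice of SSE as the fitness function are assumptions for illustration; the abstract does not specify the objective used in the paper.

```python
def opposite_solution(solution, lower, upper):
    """Opposition-based learning: reflect each coordinate within its bounds."""
    return [lo + hi - x for x, lo, hi in zip(solution, lower, upper)]


def clustering_sse(points, centers):
    """Sum of squared distances from each 1-D point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


# Toy 1-D data; a candidate encodes two cluster centers.
points = [0.0, 1.0, 2.0, 10.0]
candidate = [0.5, 3.0]
opposite = opposite_solution(candidate, lower=[0.0, 0.0], upper=[10.0, 10.0])

# Keep whichever of the pair has the lower clustering error.
best = min(candidate, opposite, key=lambda centers: clustering_sse(points, centers))
print(opposite, clustering_sse(points, candidate), clustering_sse(points, opposite), best)
```

Evaluating a candidate together with its opposite doubles the fitness evaluations per step but widens coverage of the search space, which is the usual motivation for adding opposition-based learning to a metaheuristic.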

Annals of Data Science, Volume 12, Issue 2, pp. 605-637

Citations: 0

Optimal Strategy for Elevated Estimation of Population Mean in Stratified Random Sampling under Linear Cost Function

Q1 Decision Sciences

Annals of Data Science

Pub Date : 2024-03-30 DOI: 10.1007/s40745-024-00520-9

Subhash Kumar Yadav, Mukesh Kumar Verma, Rahul Varshney

In this paper, we propose an exponential ratio-type estimator for the elevated estimation of the population mean, using one auxiliary variable in stratified random sampling and building on the conventional ratio estimator and the Bahl and Tuteja exponential ratio-type estimator. The bias and the Mean Squared Error (MSE) of the proposed estimator are derived up to a first-order approximation and compared with those of existing estimators. Theoretically, we also compare the MSE of the proposed estimator with that of the competing estimators under the linear cost function. The optimal values of the characterizing scalars are obtained, and the minimum MSE is obtained for these optimal values. By formulating the problem as an optimization problem under a linear cost function, we find theoretically that the proposed estimator is more efficient than other estimators under restricted conditions. A numerical illustration is also included to verify the practical utility of the theoretical findings. The estimator with the least MSE is recommended for practical use in different application areas of stratified random sampling.
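
The two classical estimators the proposal builds on can be written down directly. The sketch below gives the conventional ratio estimator, the Bahl and Tuteja exponential ratio-type estimator, and a separate (stratum-by-stratum) combination weighted by stratum shares. The stratum figures are invented, the separate form is an assumption, and the paper's proposed estimator with its characterizing scalars is not reproduced here.

```python
import math


def ratio_estimator(y_bar, x_bar, X_bar):
    """Conventional ratio estimator of the mean: y_bar * X_bar / x_bar."""
    return y_bar * X_bar / x_bar


def exp_ratio_estimator(y_bar, x_bar, X_bar):
    """Bahl and Tuteja exponential ratio-type estimator."""
    return y_bar * math.exp((X_bar - x_bar) / (X_bar + x_bar))


def stratified_estimate(weights, stratum_estimates):
    """Combine per-stratum estimates using stratum weights W_h = N_h / N."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    return sum(w * est for w, est in zip(weights, stratum_estimates))


# Made-up strata: (sample mean of y, sample mean of x, known population mean of x).
strata = [(10.0, 4.8, 5.0), (20.0, 8.2, 8.0)]
weights = [0.4, 0.6]
estimate = stratified_estimate(weights, [exp_ratio_estimator(*s) for s in strata])
print(estimate)
```

Both estimators adjust the sample mean of the study variable upward when the auxiliary sample mean falls below its known population mean, and downward otherwise; the exponential form makes a gentler adjustment.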

本文利用传统的比率和Bahl和Tuteja指数比率估计量，对分层随机抽样中隐含一个辅助变量的总体均值的提高估计提出了指数比率估计量。提出的估计器的偏差和均方误差（MSE）被导出到一阶近似，并与现有估计器进行了比较。从理论上讲，我们还比较了使用线性成本函数的估计器与竞争估计器的MSE。得到了表征标量的最优值，并对这些最优值求出了最小均方差。通过将所提问题表述为线性代价函数下的优化问题，从理论上发现所提估计量在受限条件下比其他估计量更有效。数值说明也包括验证理论结果为实际应用。在分层随机抽样的不同应用领域中，推荐具有最小均方差的估计量。

Annals of Data Science, Volume 12, Issue 2, Pages 517 - 538. Publication date: 2024-03-30. DOI: 10.1007/s40745-024-00520-9

Citations: 0

Optimal Key Generation for Privacy Preservation in Big Data Applications Based on the Marine Predator Whale Optimization Algorithm

Q1 Decision Sciences

Annals of Data Science

Pub Date : 2024-03-20 DOI: 10.1007/s40745-024-00521-8

Poonam Samir Jadhav, Gautam M. Borkar

In the era of big data, preserving data privacy has become paramount because of the sheer volume and sensitivity of the information being processed. This research is dedicated to safeguarding data privacy through a novel data sanitization approach centered on optimal key generation. Because of the size and complexity of big data applications, managing big data with reduced risk and high privacy poses challenges. Many standard privacy-preserving mechanisms have been introduced to maintain the volume and velocity of big data, since it consists of massive and complex data. To address this issue, this research develops a data sanitization technique based on optimal key generation to preserve the privacy of sensitive data. The sensitive data is first identified through the quasi-identifiers and is then preserved by generating an optimal key using the proposed marine predator whale optimization (MPWO) algorithm. The proposed algorithm hybridizes the foraging behaviors of marine predators and whales to determine the optimal key. The optimal key generated by the MPWO algorithm effectively preserves the privacy of the data. The efficiency of the approach is demonstrated by equivalent class size metric values of 0.03, 185.07, and 0.04 for the attribute disclosure attack, identity disclosure attack, and identity disclosure attack, respectively. Similarly, the discernibility metric is measured as 0.08, 123.38, and 0.09, and the normalized certainty penalty as 0.002, 61.69, and 0.001, for the same three attacks.
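The abstract reports equivalence class size and discernibility values without defining them. The sketch below uses definitions common in the k-anonymity literature, which may differ from the normalization used in the paper: records are grouped into equivalence classes by their quasi-identifier values, the discernibility metric is the sum of squared class sizes, and the average class size metric is the number of records divided by (number of classes times k). The function names, the toy records, and the choice k = 2 are hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Group records by their quasi-identifier values and return the class sizes."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return list(groups.values())


def discernibility_metric(class_sizes):
    """Sum of squared equivalence class sizes (lower means less information loss)."""
    return sum(size * size for size in class_sizes)


def average_class_size_metric(class_sizes, k):
    """Total records divided by (number of classes * k); 1.0 is the ideal value."""
    return sum(class_sizes) / (len(class_sizes) * k)


# Hypothetical sanitized table; "age_range" and "zip_prefix" act as quasi-identifiers.
records = [
    {"age_range": "20-29", "zip_prefix": "411", "diagnosis": "flu"},
    {"age_range": "20-29", "zip_prefix": "411", "diagnosis": "cold"},
    {"age_range": "30-39", "zip_prefix": "400", "diagnosis": "asthma"},
    {"age_range": "30-39", "zip_prefix": "400", "diagnosis": "flu"},
    {"age_range": "30-39", "zip_prefix": "400", "diagnosis": "cold"},
]

sizes = equivalence_class_sizes(records, ["age_range", "zip_prefix"])
dm = discernibility_metric(sizes)
c_avg = average_class_size_metric(sizes, k=2)

print(sorted(sizes), dm, c_avg)
```

Both metrics depend only on how records cluster under the quasi-identifiers, which is why they are typically reported per attack scenario after sanitization.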


Annals of Data Science, Volume 12, Issue 2, Pages 539 - 569. Publication date: 2024-03-20. DOI: 10.1007/s40745-024-00521-8

Citations: 0