
Annals of Data Science: Latest Articles

Partial Label Learning with Noisy Labels
Q1 Decision Sciences | Pub Date: 2024-07-31 | DOI: 10.1007/s40745-024-00552-1
Pan Zhao, Long Tang, Zhigeng Pan

Partial label learning (PLL) is a particular problem setting within weakly supervised learning in which each sample is associated with a candidate label set containing only one true label. In some practical scenarios, however, label noise can cause a candidate set to lose its true label, which degrades model performance. In this work, we propose a robust training strategy for PLL derived from joint training with co-regularization (JoCoR). Specifically, the proposed approach constructs two separate PLL models and a joint loss that combines the two PLL losses with a co-regularization term measuring the disagreement between the two models. By automatically selecting samples with small joint loss and using only those samples to update both models, the approach progressively filters out more and more samples suspected of having noisy candidate label sets. As the disagreement between the two models decreases, the robustness of the PLL models to label noise gradually strengthens. Experiments apply the strategy to two state-of-the-art PLL models on benchmark datasets under various noise levels. The results show that the proposed method effectively stabilizes the training process and reduces the models' overfitting to noisy candidate label sets.
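
A minimal NumPy sketch of the small-loss selection step described above, under stated assumptions. The abstract does not specify the PLL loss, so the sketch assumes a common candidate-set loss, $-\log \sum_{j \in S} p_j$ (the negative log of the predicted probability mass on the candidate set $S$), and a joint loss modeled on JoCoR, $(1-\lambda)(\ell_1 + \ell_2) + \lambda\,\mathrm{KL}_{\mathrm{sym}}(p_1, p_2)$, with a symmetric KL divergence as the co-regularization term. The names `candidate_loss`, `joint_loss`, `select_small_loss`, `lam`, and `keep_ratio` are hypothetical.

```python
import numpy as np


def candidate_loss(probs, cand_mask, eps=1e-12):
    """Assumed PLL loss: -log of the total predicted probability on the candidate set."""
    return -np.log(np.sum(probs * cand_mask, axis=1) + eps)


def symmetric_kl(p, q, eps=1e-12):
    """Per-sample disagreement between two models: KL(p||q) + KL(q||p)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=1) + np.sum(q * np.log(q / p), axis=1)


def joint_loss(p1, p2, cand_mask, lam=0.5):
    """Two PLL losses plus a co-regularization term, weighted JoCoR-style by lam."""
    pll = candidate_loss(p1, cand_mask) + candidate_loss(p2, cand_mask)
    return (1.0 - lam) * pll + lam * symmetric_kl(p1, p2)


def select_small_loss(p1, p2, cand_mask, keep_ratio=0.7, lam=0.5):
    """Indices of the keep_ratio fraction of samples with the smallest joint loss."""
    losses = joint_loss(p1, p2, cand_mask, lam)
    n_keep = max(1, int(round(keep_ratio * len(losses))))
    return np.argsort(losses)[:n_keep]


# Three samples, three classes; both models make the same predictions here.
probs = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])
cand = np.array([[1, 1, 0],   # candidate set {0, 1}, contains the predicted label 0
                 [0, 1, 0],   # candidate set {1}
                 [1, 1, 0]])  # candidate set {0, 1}, likely missing the true label 2
selected = select_small_loss(probs, probs, cand, keep_ratio=0.67)
```

The third sample, whose candidate set excludes the label both models favor, has the largest joint loss and is left out of the update. In the original JoCoR method the fraction of retained samples shrinks over the early epochs; the schedule used in this paper is not given in the abstract.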

Annals of Data Science, 12(1), 199-212
Citations: 0
Kernel Method for Estimating Matusita Overlapping Coefficient Using Numerical Approximations
Q1 Decision Sciences | Pub Date: 2024-07-27 | DOI: 10.1007/s40745-024-00563-y
Omar M. Eidous, Enas A. Ananbeh

In this paper, a nonparametric kernel method is introduced to estimate the well-known Matusita overlapping coefficient $\rho(X,Y)$ between two random variables $X$ and $Y$. Because a closed-form expression for this coefficient is difficult to obtain when kernel estimators are used, we first approximate its integral by numerical integration. The kernel estimators are then combined with this approximation to formulate the proposed estimators. Two numerical integration rules, the trapezoidal rule and Simpson's rule, are used to approximate the integral of interest, yielding two new estimators of $\rho(X,Y)$. The resulting estimators are studied and compared, via Monte Carlo simulation, with the existing estimator developed by Eidous and Al-Talafheh (Commun Stat Simul Comput 51(9):5139–5156, 2022. https://doi.org/10.1080/03610918.2020.1757711). The simulation results demonstrate the usefulness and effectiveness of the new technique for estimating $\rho(X,Y)$.
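
The Matusita overlapping coefficient is $\rho(X,Y) = \int \sqrt{f_X(x)\,f_Y(x)}\,dx$, where $f_X$ and $f_Y$ are the densities of $X$ and $Y$. The sketch below plugs Gaussian kernel density estimates into this integral and evaluates it on a grid with the composite trapezoidal and Simpson rules, mirroring the two-step idea in the abstract. The Gaussian kernel, Silverman's rule-of-thumb bandwidth, the padded grid, and all function names are illustrative assumptions, not the paper's exact estimators.

```python
import numpy as np


def silverman_bandwidth(sample):
    """Normal-reference rule-of-thumb bandwidth: 1.06 * s * n^(-1/5)."""
    return 1.06 * np.std(sample, ddof=1) * len(sample) ** (-0.2)


def gaussian_kde(sample, h):
    """Return a function evaluating a Gaussian kernel density estimate at grid points."""
    def density(x):
        z = (x[:, None] - sample[None, :]) / h
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))
    return density


def trapezoid(y, dx):
    """Composite trapezoidal rule on equally spaced points."""
    return dx * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)


def simpson(y, dx):
    """Composite Simpson rule; needs an even number of intervals (odd number of points)."""
    if (len(y) - 1) % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    return dx / 3 * (y[0] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum() + y[-1])


def matusita_kernel(x_sample, y_sample, n_intervals=400, rule="simpson"):
    """Kernel estimate of rho(X, Y) = integral of sqrt(f_X * f_Y), via numerical integration."""
    x_sample = np.asarray(x_sample, dtype=float)
    y_sample = np.asarray(y_sample, dtype=float)
    hx, hy = silverman_bandwidth(x_sample), silverman_bandwidth(y_sample)
    fx, fy = gaussian_kde(x_sample, hx), gaussian_kde(y_sample, hy)
    pad = 4 * max(hx, hy)  # extend the grid so the kernel tails are covered
    lo = min(x_sample.min(), y_sample.min()) - pad
    hi = max(x_sample.max(), y_sample.max()) + pad
    grid = np.linspace(lo, hi, n_intervals + 1)
    integrand = np.sqrt(fx(grid) * fy(grid))
    dx = (hi - lo) / n_intervals
    return simpson(integrand, dx) if rule == "simpson" else trapezoid(integrand, dx)


rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 2000)
y = rng.normal(1.0, 1.0, 2000)
rho_simpson = matusita_kernel(x, y, rule="simpson")
rho_trap = matusita_kernel(x, y, rule="trapezoid")
```

For two unit-variance normal populations one unit apart, the true value is $e^{-1/8} \approx 0.88$, which gives a rough sanity check; identical samples should give a value close to 1.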

Annals of Data Science, 12(4), 1265-1283
Citations: 0
Maximum Likelihood Estimation for Generalized Inflated Power Series Distributions
Q1 Decision Sciences | Pub Date: 2024-07-23 | DOI: 10.1007/s40745-024-00560-1
Robert L. Paige

In this paper, we first define the class of Generalized Inflated Power Series Distributions (GIPSDs), which contains as special cases the inflated discrete distributions most often seen in practice. We describe the hitherto unknown exponential family structure of GIPSDs and use it to derive closed-form, easy-to-program conditional and unconditional maximum likelihood estimators for essentially any number of parameters. We also show how the GIPSD exponential family can be extended to model deflated mass points. Our results provide easy access to likelihood-based inference and automated model selection procedures for GIPSDs that involve only one-dimensional numerical root-finding problems, which are easily solved with simple routines. Four real-data examples illustrate the utility and scope of our results.
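
As an illustration of the kind of one-dimensional root-finding the abstract refers to, here is a sketch for the simplest special case, the zero-inflated Poisson (ZIP) distribution, rather than the general GIPSD estimator. For ZIP the log-likelihood separates: the fitted probability of a zero equals the observed zero frequency $n_0/n$, and $\hat\lambda$ solves the zero-truncated Poisson equation $\lambda / (1 - e^{-\lambda}) = \bar{x}_+$, where $\bar{x}_+$ is the mean of the positive counts. The inflation parameter then follows from $\hat p_0 = \hat\pi + (1 - \hat\pi)e^{-\hat\lambda}$; a negative $\hat\pi$ indicates a deflated zero. The function name `zip_mle` is hypothetical.

```python
import math


def zip_mle(counts, tol=1e-12, max_iter=200):
    """MLE (lambda, pi) for a zero-inflated Poisson sample via 1-D bisection."""
    n = len(counts)
    n0 = sum(1 for c in counts if c == 0)
    positives = [c for c in counts if c > 0]
    if not positives:
        raise ValueError("need at least one positive count")
    mean_pos = sum(positives) / len(positives)
    if mean_pos <= 1:
        raise ValueError("mean of positive counts must exceed 1 for an interior solution")

    def score(lam):
        # Zero-truncated Poisson equation: lam / (1 - exp(-lam)) - mean_pos = 0
        return lam / -math.expm1(-lam) - mean_pos

    lo, hi = 1e-10, mean_pos  # score(lo) < 0 and score(hi) > 0 bracket the root
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)

    p0_hat = n0 / n              # fitted P(X = 0) equals the observed zero frequency
    e = math.exp(-lam)
    pi = (p0_hat - e) / (1 - e)  # negative values correspond to zero deflation
    return lam, pi


lam_hat, pi_hat = zip_mle([0, 0, 0, 0, 1, 2, 3, 2, 1])
```

Because the problem reduces to a single monotone equation in $\lambda$, plain bisection is enough; no general-purpose optimizer is needed.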

Annals of Data Science, 12(4), 1189-1209
Citations: 0
A Human Word Association Based Model for Topic Detection in Social Networks
Q1 Decision Sciences | Pub Date: 2024-07-20 | DOI: 10.1007/s40745-024-00561-0
Mehrdad Ranjbar-Khadivi, Shahin Akbarpour, Mohammad-Reza Feizi-Derakhshi, Babak Anari

With the widespread use of social networks, detecting the topics discussed on these platforms has become a significant challenge. Current approaches rely primarily on frequent pattern mining or semantic relations and often neglect the structure of the language. Language-structure methods aim to discover the relationships between words and how humans understand them. This paper therefore introduces a topic detection framework for social networks that imitates the human mental ability of word association. The framework employs the Human Word Association method and includes a specially designed extraction algorithm. Its performance is evaluated on the FA-CUP dataset, a benchmark in the field of topic detection. The results indicate that the proposed method significantly improves topic detection compared with other methods, as measured by topic recall and keyword F1. Additionally, to assess the applicability and generalizability of the proposed method, a dataset of Persian-language Telegram posts is used; on this dataset as well, the method outperforms other topic detection methods.
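
Keyword precision, recall, and F1 compare the keywords a method reports for a topic against the ground-truth keywords. The sketch below computes them for a single topic by set overlap; how detected topics are matched to ground-truth topics and aggregated across time slots in the FA-CUP evaluation is not described in the abstract and is not modeled here. The function name and the example keywords are hypothetical.

```python
def keyword_prf(detected, reference):
    """Precision, recall, and F1 of detected keywords against reference keywords."""
    detected, reference = set(detected), set(reference)
    true_pos = len(detected & reference)
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


p, r, f1 = keyword_prf({"goal", "penalty", "striker"}, {"goal", "penalty", "referee", "var"})
# p = 2/3, r = 1/2, f1 = 4/7
```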

Annals of Data Science, 12(4), 1211-1235
Citations: 0
Farm-Level Smart Crop Recommendation Framework Using Machine Learning
Q1 Decision Sciences | Pub Date: 2024-07-20 | DOI: 10.1007/s40745-024-00534-3
Amit Bhola, Prabhat Kumar

Agriculture is the primary source of food, fuel, and raw materials and is vital to any country's economy. Farmers, the backbone of agriculture, rely primarily on instinct to decide which crops to plant in a given season. They tend to follow customary farming practices and standards and are often unaware that crop yield depends heavily on current environmental and soil conditions. Crop recommendation involves multifaceted factors such as weather, soil quality, crop production, market demand, and prices, which makes well-informed decisions crucial for farmers. An improper or imprudent crop recommendation can harm farmers, their families, and the entire agricultural sector. Modern technologies such as artificial intelligence, machine learning, and data science have emerged as efficient tools for combating problems like declining crop production and falling profits. This research proposes a Smart Crop Recommendation framework that uses machine learning to help farmers make informed decisions about optimal crop selection. The framework consists of two phases: crop filtration and yield prediction. In the first phase, crops are filtered by an artificial neural network based on local input parameters. In the second phase, the yield of the filtered crops is estimated from season, farm area, and location data. The final recommendation gives farmers the crops expected to maximize profit. In experiments, the artificial neural network achieves 99.10% accuracy and the random forest achieves an $R^2$ of 0.99. The framework is distinctive in its focus on the farm level and in its consideration of challenges and agricultural features that change over time. The experimental results confirm the framework's effectiveness, and its lightweight design makes it practical as an efficient real-time recommendation solution.
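
The two-phase flow described above (filter candidate crops, then rank the survivors by expected return) can be sketched as follows. The suitability and yield models are passed in as plain callables standing in for the paper's artificial neural network and random forest; the 0.5 threshold, the `FarmInput` fields, and the use of expected revenue (yield per hectare times farm area times price) as a stand-in for profit are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FarmInput:
    """Hypothetical farm-level inputs; field names are illustrative."""
    season: str
    area_ha: float
    location: str
    soil: Dict[str, float] = field(default_factory=dict)  # e.g. N, P, K, pH


def recommend(
    farm: FarmInput,
    crops: List[str],
    suitability: Callable[[FarmInput, str], float],   # phase 1 model (e.g. an ANN)
    yield_per_ha: Callable[[FarmInput, str], float],  # phase 2 model (e.g. a random forest)
    price_per_tonne: Dict[str, float],
    threshold: float = 0.5,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    # Phase 1: crop filtration, keep crops the classifier deems suitable.
    candidates = [c for c in crops if suitability(farm, c) >= threshold]
    # Phase 2: predict yield for the survivors and rank by expected revenue.
    scored = [(c, yield_per_ha(farm, c) * farm.area_ha * price_per_tonne[c]) for c in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


farm = FarmInput(season="summer", area_ha=2.0, location="district-A")
suit = {"rice": 0.9, "maize": 0.7, "cotton": 0.2}
yields = {"rice": 4.0, "maize": 6.0, "cotton": 1.5}
prices = {"rice": 300.0, "maize": 180.0, "cotton": 900.0}
ranking = recommend(farm, list(suit), lambda f, c: suit[c], lambda f, c: yields[c], prices)
```

Cotton is dropped in phase 1 even though its revenue would be highest, which is the point of filtering for suitability before ranking by yield.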

Annals of Data Science, 12(1), 117-140
Citations: 0
Reaction Function for Financial Market Reacting to Events or Information
Q1 Decision Sciences | Pub Date: 2024-07-17 | DOI: 10.1007/s40745-024-00565-w
Bo Li, Guangle Du

Observations indicate that the distributions of stock returns in financial markets usually do not follow a normal distribution but instead exhibit high peaks, fat tails, and skewness. In this work, we assume that the effects of events or information on prices follow a normal distribution, while financial markets often overreact or underreact to those events or information, producing non-normal distributions of stock returns. Based on these assumptions, we propose, for the first time, a reaction function describing how a financial market reacts to events or information, together with a model built on it that describes the distribution of real stock returns. Our analysis of the returns of the China Securities Index 300 (CSI 300), the Standard & Poor's 500 Index (SPX or S&P 500), and the Nikkei 225 Index (N225) at different time scales shows that financial markets often underreact to events or information with minor impacts, overreact to those with relatively significant impacts, and react slightly more strongly to positive events or information than to negative ones. In addition, differences between financial markets and between return time scales also affect the shapes of the reaction functions.
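
The mechanism in the abstract (normally distributed event impacts passed through a market reaction that damps small impacts, amplifies large ones, and slightly favors good news) can be illustrated with a toy simulation. The reaction function below, $r = \operatorname{sign}(x)\,|x|^{\gamma}$ with $\gamma > 1$ and an extra factor $1 + \delta$ on positive values, is a hypothetical stand-in chosen only to show the qualitative effect; it is not the reaction function proposed in the paper. With impacts measured in units where $|x| = 1$ is a moderate event, it shrinks impacts below 1 and enlarges those above 1.

```python
import numpy as np


def toy_reaction(x, gamma=1.5, delta=0.1):
    """Hypothetical reaction: |x|**gamma damps small impacts and amplifies large ones;
    positive reactions are scaled up by (1 + delta)."""
    r = np.sign(x) * np.abs(x) ** gamma
    return np.where(r > 0, (1.0 + delta) * r, r)


def skew_and_excess_kurtosis(values):
    """Sample skewness and excess kurtosis (0 for a normal distribution)."""
    z = (values - values.mean()) / values.std()
    return (z ** 3).mean(), (z ** 4).mean() - 3.0


rng = np.random.default_rng(0)
impacts = rng.standard_normal(200_000)  # normally distributed event impacts
returns = toy_reaction(impacts)         # simulated market reaction

impact_skew, impact_kurt = skew_and_excess_kurtosis(impacts)  # both close to 0
return_skew, return_kurt = skew_and_excess_kurtosis(returns)  # positive skew, fat tails
```

Even this crude nonlinearity turns normal inputs into a peaked, fat-tailed, skewed output, which is the qualitative link the authors draw between over/underreaction and non-normal returns.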

Annals of Data Science, 11(4), 1265-1290
Citations: 0
Transmuted Shifted Lindley Distribution: Characterizations, Classical and Bayesian Estimation with Applications
Q1 Decision Sciences | Pub Date: 2024-07-16 | DOI: 10.1007/s40745-024-00562-z
A. Chakraborty, S. Rana, S. I. Maiti

In this article, we apply the quadratic rank transmutation map to the shifted Lindley distribution to further improve on the existing distribution. An additional skewness parameter $\lambda$ is incorporated to transmute the distribution, and the resulting distribution is called the Transmuted Shifted Lindley distribution. We provide a comprehensive description of its statistical properties and reliability behavior, and present heat maps of the associated parameters. In the estimation section, both maximum likelihood and Bayesian estimation of the parameters are discussed, and a detailed simulation study is performed. Finally, a real-data application illustrates how well the proposed distribution fits.
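
For reference, the quadratic rank transmutation map takes a base CDF $G$ with density $g$ (here, the shifted Lindley distribution, whose exact parameterization is not given in the abstract) and produces the transmuted CDF and density below; setting $\lambda = 0$ recovers the base distribution.

```latex
\begin{aligned}
F(x) &= (1 + \lambda)\,G(x) - \lambda\,G(x)^{2}, \qquad |\lambda| \le 1,\\
f(x) &= \frac{dF(x)}{dx} = g(x)\,\bigl[1 + \lambda - 2\lambda\,G(x)\bigr].
\end{aligned}
```

The constraint $|\lambda| \le 1$ keeps $f(x) \ge 0$: the bracket is linear in $G$ and runs from $1 + \lambda$ at $G = 0$ to $1 - \lambda$ at $G = 1$.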

Annals of Data Science, 12(4), 1237-1264
Citations: 0
Apple Leaf Disease Detection Using Transfer Learning
Q1 Decision Sciences | Pub Date: 2024-07-13 | DOI: 10.1007/s40745-024-00555-y
Ozair Ahmad Wani, Umer Zahoor, Syed Zubair Ahmad Shah, Rijwan Khan

Automated detection of plant diseases is crucial because it simplifies monitoring of large farms and identifies diseases at an early stage, limiting further plant degradation. Beyond the decline in plant health, reduced production severely impacts a country's economy. Traditional disease identification methods rely on human experts and are slow, time-consuming, and impractical for large farms. Our proposed model combines pre-trained ResNet18, AlexNet, GoogLeNet, and VGG16 networks to classify images of apple tree leaves into categories such as healthy, black rot, cedar apple rust, and apple scab. Various image enhancement techniques were employed to improve the model's accuracy. Ultimately, the model achieved 97.25% accuracy on the validation dataset and performed well across various other metrics, suggesting its potential for efficient and accurate plant health monitoring in the agricultural sector.
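
The abstract says the four pre-trained networks are combined but not how. One common option is soft voting: average each network's softmax probabilities and take the most probable class. The NumPy sketch below assumes that scheme and takes per-model logits as given (in practice they would presumably come from the fine-tuned ResNet18, AlexNet, GoogLeNet, and VGG16 classifiers). `CLASSES` and `soft_vote` are hypothetical names.

```python
import numpy as np

CLASSES = ["healthy", "black rot", "cedar apple rust", "apple scab"]


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def soft_vote(logits_per_model):
    """Average class probabilities across models, then pick the most probable class.

    logits_per_model: list of arrays, each of shape (n_images, n_classes).
    """
    probs = np.mean([softmax(np.asarray(l, dtype=float)) for l in logits_per_model], axis=0)
    labels = [CLASSES[i] for i in probs.argmax(axis=1)]
    return labels, probs


# Two hypothetical models disagree on one image; the more confident one wins.
labels, probs = soft_vote([
    [[2.0, 0.0, 0.0, 0.0]],  # model A leans "healthy"
    [[0.0, 3.0, 0.0, 0.0]],  # model B is more confident in "black rot"
])
```

Soft voting lets a confident model outweigh a hesitant one, unlike majority (hard) voting, which counts each model's top label equally.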

Annals of Data Science, 12(1), 213-222
Citations: 0
A Review of Anonymization Algorithms and Methods in Big Data
Q1 Decision Sciences | Pub Date: 2024-07-13 | DOI: 10.1007/s40745-024-00557-w
Elham Shamsinejad, Touraj Banirostam, Mir Mohsen Pedram, Amir Masoud Rahmani

In the era of big data, as the volume and complexity of data grow, the main challenge is how to use big data while preserving users' privacy. This study was conducted to find a solution to this challenge. We examined various data anonymization methods, including differential privacy, advanced encryption, and strong access controls. We also examined how these methods operate, their advantages, disadvantages, and uses, the challenges of adapting them to big data, and possible solutions to those challenges. Our results show that traditional data anonymization methods lack scalability, leading to privacy breaches and data loss: when faced with large volumes of data, these methods may be unable to process the data fully, and they may be ineffective against re-identification, linkage, and inference attacks. We introduced emerging methods that can provide improved privacy with minimal data loss and that scale to big data. Finally, we examined directions for future research and raised important questions that can help improve existing algorithms or develop new methods that better manage the complexity and scale of unstructured data.
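
Differential privacy, one of the methods the review covers, is often introduced through the Laplace mechanism: a numeric query with sensitivity $\Delta$ is released with added noise drawn from $\mathrm{Laplace}(0, \Delta/\varepsilon)$, which satisfies $\varepsilon$-differential privacy. A minimal standard-library sketch for a counting query (sensitivity 1) follows; it is a textbook illustration, not one of the specific big-data methods surveyed in the paper, and the function names and records are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count under epsilon-differential privacy via the Laplace mechanism."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
ages = list(range(18, 118))  # 100 hypothetical records
noisy = private_count(ages, lambda a: a >= 65, epsilon=1.0, rng=rng)
```

Smaller $\varepsilon$ gives stronger privacy at the cost of noisier answers, and each additional release spends more of the privacy budget.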

Annals of Data Science, Vol. 12, No. 1, pp. 253–279
Citations: 0
Representing a Model for the Anonymization of Big Data Stream Using In-Memory Processing
Q1 Decision Sciences Pub Date : 2024-07-13 DOI: 10.1007/s40745-024-00556-x
Elham Shamsinejad, Touraj Banirostam, Mir Mohsen Pedram, Amir Masoud Rahmani

In light of the escalating privacy risks in the big data era, this paper introduces an innovative model for the anonymization of big data streams, leveraging in-memory processing within the Spark framework. The approach is founded on the principle of K-anonymity and propels the field forward by critically evaluating various anonymization methods and algorithms, benchmarking their performance with respect to time and space complexities. A distinctive formula for optimized cluster determination in the K-means algorithm is presented, along with a novel tuple expiration time strategy for the efficient purging of clusters. The integration of these components into Spark’s RDD and MLlib modules results in a significant decrease in execution time and data loss rates, even with increasing data volumes. The paper’s notable contributions are its methodological advancements that offer a robust, scalable solution for data anonymization, safeguarding user privacy without sacrificing data utility or processing efficiency.
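
The abstract does not state the paper's cluster-determination formula or its exact tuple expiration rule, so the following is only a minimal single-machine sketch of the general idea of delay-bounded k-anonymization of a stream. It swaps the paper's Spark-based K-means clustering for a simpler nearest-neighbour grouping on one numeric quasi-identifier. `Record`, `StreamKAnonymizer`, the `delay` parameter, and the suppression rule are all hypothetical illustrations, not the authors' design.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    arrival: int    # arrival time on a hypothetical integer clock
    age: int        # numeric quasi-identifier to be generalized
    sensitive: str  # sensitive attribute, published unchanged


Released = Tuple[int, int, str]  # (age_low, age_high, sensitive)


class StreamKAnonymizer:
    """Hypothetical delay-bounded k-anonymizer for one numeric quasi-identifier.

    Records must be pushed in non-decreasing arrival order. A buffered record
    expires `delay` ticks after arrival; on expiry it is published together
    with its k-1 nearest buffered neighbours, all generalized to the group's
    [min, max] age range. With fewer than k records available, the expiring
    record is suppressed (dropped) instead.
    """

    def __init__(self, k: int, delay: int) -> None:
        if k < 1 or delay < 0:
            raise ValueError("k must be >= 1 and delay >= 0")
        self.k = k
        self.delay = delay
        self.buffer: List[Record] = []  # kept in arrival order
        self.suppressed = 0

    def push(self, rec: Record) -> List[Released]:
        """Buffer a record, then publish any groups forced out by expiry."""
        self.buffer.append(rec)
        return self._expire(now=rec.arrival)

    def flush(self) -> List[Released]:
        """Expire everything still buffered (e.g. at end of stream)."""
        if not self.buffer:
            return []
        return self._expire(now=self.buffer[-1].arrival + self.delay)

    def _expire(self, now: int) -> List[Released]:
        out: List[Released] = []
        while self.buffer and now - self.buffer[0].arrival >= self.delay:
            oldest = self.buffer.pop(0)
            if len(self.buffer) + 1 < self.k:
                self.suppressed += 1  # a group of size k cannot be formed
                continue
            # Choose the k-1 buffered records closest to `oldest` on the QI.
            neighbours = sorted(self.buffer, key=lambda r: abs(r.age - oldest.age))
            neighbours = neighbours[: self.k - 1]
            chosen = {id(r) for r in neighbours}
            self.buffer = [r for r in self.buffer if id(r) not in chosen]
            group = [oldest] + neighbours
            lo = min(r.age for r in group)
            hi = max(r.age for r in group)
            out.extend((lo, hi, r.sensitive) for r in group)
        return out


if __name__ == "__main__":
    anon = StreamKAnonymizer(k=2, delay=3)
    stream = [Record(0, 30, "flu"), Record(1, 32, "cold"),
              Record(2, 60, "asthma"), Record(3, 61, "flu")]
    released = []
    for rec in stream:
        released += anon.push(rec)
    released += anon.flush()
    print(released)
    # [(30, 32, 'flu'), (30, 32, 'cold'), (60, 61, 'asthma'), (60, 61, 'flu')]
```

The `delay` bound trades latency for utility: a longer delay leaves more records in the buffer to pick neighbours from, which tends to give narrower generalized ranges (less data loss) but holds records back longer before release.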

Annals of Data Science, Vol. 12, No. 1, pp. 223–252
Citations: 0