
Annals of Data Science: Latest Articles

Semiparametric Regression Analysis of Panel Count Data with Multiple Modes of Recurrence
Q1 Decision Sciences Pub Date : 2024-03-19 DOI: 10.1007/s40745-024-00522-7
Mathew P. M. Ashlin, P. G. Sankaran, E. P. Sreedevi

Panel count data arise in studies of recurrent events in which subjects are observed only at specific time points. If the subjects are exposed to recurrent events of several types, the result is panel count data with multiple modes of recurrence. In this article, we present a novel method based on generalized estimating equations for the regression analysis of panel count data with multiple modes of recurrence. A cause-specific proportional mean model is developed to analyze the effect of covariates on the underlying counting processes associated with the multiple modes of recurrence. We investigate in detail the joint estimation of the baseline cumulative mean functions and the regression parameters. Simulation studies are carried out to evaluate the finite-sample performance of the proposed estimators. The procedures are applied to two real data sets to demonstrate their practical utility.
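
A minimal sketch of the cause-specific proportional mean model, assuming the common form E[N_k(t) | Z] = Lambda_0k(t) * exp(beta_k' Z) for recurrence mode k. To keep it short it assumes a single binary covariate and a baseline cumulative mean function that is already known; the paper instead estimates baselines and regression parameters jointly. Under a working-independence estimating equation (a simple special case in the GEE spirit) each mode then has a closed-form solution. The function names are hypothetical.

```python
import math
from collections import defaultdict


def proportional_mean(baseline_value, beta, z):
    """Cause-specific mean E[N_k(t) | Z = z] = Lambda_0k(t) * exp(beta_k' z)."""
    return baseline_value * math.exp(sum(b * x for b, x in zip(beta, z)))


def estimate_beta_binary(records, baselines):
    """
    Solve, separately for each recurrence mode k, the working-independence equation
        sum_i sum_j z_i * (N_ik(t_ij) - Lambda_0k(t_ij) * exp(beta_k * z_i)) = 0
    for one binary covariate z_i in {0, 1}, treating Lambda_0k as KNOWN (an assumption).

    records:   iterable of (mode, z, t, cumulative_count) panel observations
    baselines: dict mode -> callable t -> Lambda_0k(t)
    returns:   dict mode -> estimated beta_k
    """
    observed = defaultdict(float)
    expected = defaultdict(float)
    for mode, z, t, count in records:
        if z == 1:  # terms with z_i = 0 drop out of the equation
            observed[mode] += count
            expected[mode] += baselines[mode](t)
    # Setting the sum to zero gives exp(beta_k) = observed / expected
    return {mode: math.log(observed[mode] / expected[mode]) for mode in observed}
```

For example, with baselines {1: lambda t: t, 2: lambda t: 0.5 * t}, exposed subjects whose mode-1 counts are twice the baseline at every visit yield beta_1 = ln 2.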

Annals of Data Science, vol. 12, no. 2, pp. 571-590.
Citations: 0
Applying BERT-Based NLP for Automated Resume Screening and Candidate Ranking
Q1 Decision Sciences Pub Date : 2024-03-08 DOI: 10.1007/s40745-024-00524-5
Asmita Deshmukh, Anjali Raut

In this research, we introduce an innovative automated resume screening approach that leverages advanced natural language processing (NLP) technology, specifically Google's Bidirectional Encoder Representations from Transformers (BERT) language model. We collected 200 resumes from participants with their consent and obtained ten job descriptions from glassdoor.com for testing. We extracted keywords from the resumes, identified skill sets, and ranked them to focus on crucial attributes. After removing stop words and punctuation, we selected the top keywords for analysis. To improve data precision, we applied stemming and lemmatization to normalize tense and word forms. Using the preinstalled BERT model and tokenizer, we generated feature vectors for the job descriptions and the resume keywords. For each resume we calculated the highest similarity index, which enabled us to shortlist the most relevant candidates. The similarity index reached up to 0.3, and screening speed reached one resume per second. The application of BERT-based NLP techniques significantly improved screening efficiency and accuracy, streamlining talent acquisition and giving HR personnel valuable insights for informed decision-making. This study underscores the potential of BERT to transform recruitment through scalable automated resume screening, demonstrating its efficacy in improving the precision and speed of candidate selection.
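
The abstract does not name the similarity measure; cosine similarity between embedding vectors is a common choice and is assumed in the sketch below. The vectors stand in for BERT feature vectors (producing them is not shown), and `rank_resumes` is a hypothetical helper that keeps each resume's highest similarity over all job descriptions and returns the top candidates.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_resumes(resume_vectors, job_vectors, top_k):
    """
    resume_vectors, job_vectors: dict id -> embedding (assumed to come from a BERT encoder)
    Returns the top_k (resume_id, best_job_id, similarity) tuples, highest similarity first.
    """
    scored = []
    for rid, rvec in resume_vectors.items():
        # Highest similarity index of this resume over all job descriptions
        best_job, best_sim = max(
            ((jid, cosine_similarity(rvec, jvec)) for jid, jvec in job_vectors.items()),
            key=lambda pair: pair[1],
        )
        scored.append((rid, best_job, best_sim))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]
```

With toy vectors in place of real embeddings, `rank_resumes(resumes, jobs, top_k=2)` returns the two resumes whose best match is strongest, each paired with the job description it matches best.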

Annals of Data Science, vol. 12, no. 2, pp. 591-603.
Citations: 0
Bayesian Inference for the Entropy of the Rayleigh Model Based on Ordered Ranked Set Sampling
Q1 Decision Sciences Pub Date : 2024-02-27 DOI: 10.1007/s40745-024-00514-7
Mohammed S. Kotb, Haidy A. Newer, Marwa M. Mohie El-Din

Ranked set sampling schemes have recently become popular in reliability analysis and life-testing problems. Based on an ordered ranked set sample, Bayesian estimators and credible intervals for the entropy of the Rayleigh model are studied and compared with the corresponding estimators based on simple random sampling. The Bayes estimators of entropy are developed and computed under several loss functions: squared error, linear-exponential (LINEX), Al-Bayyati, and general entropy loss. The various entropy estimates are compared in terms of mean squared error. A real-life data set and a simulation study are used to illustrate the procedures.
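
For orientation, the sketch below covers only the simple-random-sampling comparator, not the ordered ranked set sampling estimators studied in the paper. It assumes the Rayleigh parametrization f(x) = 2*lam*x*exp(-lam*x^2), x > 0, whose differential entropy is H(lam) = 1 + gamma/2 - ln 2 - (1/2) ln lam (gamma is Euler's constant), and a conjugate Gamma(a, b) prior on lam (shape a, rate b), so the posterior is Gamma(a + n, b + sum x_i^2). Under squared error loss the Bayes estimator is the posterior mean of H, which needs E[ln lam] = digamma(a + n) - ln(b + sum x_i^2); since the standard library has no digamma, a short recurrence-plus-asymptotic-series version is included. The prior and parametrization are assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma(x):
    """Digamma function for x > 0: upward recurrence, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result


def rayleigh_entropy(lam):
    """Differential entropy of f(x) = 2*lam*x*exp(-lam*x^2)."""
    return 1.0 + EULER_GAMMA / 2.0 - math.log(2.0) - 0.5 * math.log(lam)


def bayes_entropy_squared_error(sample, a=1.0, b=1.0):
    """Posterior mean of H under a Gamma(a, b) prior on lam, from a simple random sample."""
    n = len(sample)
    t = sum(x * x for x in sample)
    # Posterior of lam is Gamma(a + n, b + t), so E[ln lam | data] = digamma(a + n) - ln(b + t)
    expected_log_lam = digamma(a + n) - math.log(b + t)
    return 1.0 + EULER_GAMMA / 2.0 - math.log(2.0) - 0.5 * expected_log_lam


def mle_plugin_entropy(sample):
    """Plug-in estimate H(lam_hat) with the MLE lam_hat = n / sum(x^2)."""
    return rayleigh_entropy(len(sample) / sum(x * x for x in sample))
```

Other loss functions (LINEX, general entropy) change only which posterior functional of H is reported, not the posterior itself.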

Annals of Data Science, vol. 11, no. 4, pp. 1435-1458.
Citations: 0
A Joint Cognitive Latent Variable Model for Binary Decision-making Tasks and Reaction Time Outcomes
Q1 Decision Sciences Pub Date : 2024-02-27 DOI: 10.1007/s40745-024-00519-2
Mahdi Mollakazemiha, Ehsan Bahrami Samani

Traditionally, cognitive modeling of binary decision-making tasks relies on stochastic differential equations, particularly the family of diffusion decision models. Because these differential equations have no analytical solutions, such models are difficult to use for parameter estimation and forecasting. In this paper, we introduce a joint latent variable model for binary decision-making tasks and reaction time outcomes. In addition, accelerated failure time (AFT) models can be used to analyze reaction times and to estimate the effects of covariates on the acceleration or deceleration of the survival time. A full likelihood-based approach is used to obtain maximum likelihood estimates of the model parameters. To illustrate the utility of the proposed models, a simulation study and real data are analyzed.
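
The abstract does not give the model's exact form. The sketch below is one hypothetical joint specification in the spirit it describes: a per-trial standard-normal latent trait b drives both a logistic model for the binary decision and a log-normal AFT model for reaction time, log(RT) = g0 + g1*x + b + sigma*eps. In this AFT form a one-unit increase in the covariate x multiplies reaction time by exp(g1). The likelihood integrates b out numerically with the trapezoid rule; the parameter names and the integration scheme are assumptions for illustration, and the optimizer that would maximize the likelihood is not shown.

```python
import math


def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def logistic(eta):
    # Numerically stable logistic function
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def trial_loglik(decision, rt, x, params, grid_size=401, bound=8.0):
    """
    Marginal log-likelihood of one (decision, reaction time) pair under the hypothetical model
        b ~ N(0, 1)
        P(decision = 1 | b) = logistic(a0 + a1 * x + lam * b)
        log(rt) | b ~ N(g0 + g1 * x + b, sigma^2)      (log-normal AFT)
    The latent b is integrated over [-bound, bound] with the trapezoid rule.
    """
    a0, a1, lam, g0, g1, sigma = params
    step = 2.0 * bound / (grid_size - 1)
    total = 0.0
    for i in range(grid_size):
        b = -bound + i * step
        p1 = logistic(a0 + a1 * x + lam * b)
        p_decision = p1 if decision == 1 else 1.0 - p1
        # Log-normal density of rt = normal density of log(rt), divided by rt
        log_rt_density = normal_logpdf(math.log(rt), g0 + g1 * x + b, sigma) - math.log(rt)
        weight = 0.5 if i in (0, grid_size - 1) else 1.0
        total += weight * p_decision * math.exp(log_rt_density + normal_logpdf(b, 0.0, 1.0))
    return math.log(total * step)


def total_loglik(trials, params):
    """Sum of per-trial log-likelihoods; trials are (decision, rt, x) tuples."""
    return sum(trial_loglik(d, rt, x, params) for d, rt, x in trials)
```

Setting lam = 0 decouples the decision from the latent trait, which gives a quick sanity check: the joint likelihood then factorizes into a logistic term times a log-normal density with variance sigma^2 + 1.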

Annals of Data Science, vol. 12, no. 2, pp. 499-516.
Citations: 0
Supervised Feature Selection via Quadratic Surface Regression with $l_{2,1}$-Norm Regularization
Q1 Decision Sciences Pub Date : 2024-02-15 DOI: 10.1007/s40745-024-00518-3
Changlin Wang, Zhixia Yang, Junyou Ye, Xue Yang, Manchen Ding

This paper proposes a supervised, kernel-free quadratic surface regression method for feature selection (QSR-FS). The method fits a quadratic function for each class and incorporates it into the least-squares loss function. An $l_{2,1}$-norm regularization term is introduced to obtain a sparse solution, and a feature weight vector, constructed from the coefficients of the quadratic functions of all classes, indicates the importance of each feature. An alternating iteration algorithm is designed to solve the resulting optimization problem. The computational complexity of the algorithm is analyzed, and the iterative formula is reformulated to further accelerate computation. In the experiments, feature selection and downstream classification tasks are performed on eight datasets from different domains, and the results are analyzed with relevant evaluation metrics. Feature selection interpretability and parameter sensitivity analyses are also provided. The experimental results demonstrate the feasibility and effectiveness of the method.
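
The $l_{2,1}$-norm of a coefficient matrix W is the sum of the Euclidean norms of its rows, ||W||_{2,1} = sum_i ||w^i||_2; penalizing it drives whole rows to zero, which is what makes it useful for feature selection. The sketch below computes the norm and a hypothetical per-feature score for per-class quadratic surfaces f_k(x) = sum_{i<=j} q_ijk x_i x_j + sum_j c_jk x_j + const: a feature's score is the Euclidean norm of all coefficients, across classes, of terms involving it. The weight construction actually used by QSR-FS may differ, and the alternating optimization is not shown.

```python
import math
from itertools import combinations_with_replacement


def l21_norm(W):
    """||W||_{2,1}: sum over rows of each row's Euclidean norm."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def quadratic_term_pairs(d):
    """Index pairs (i, j), i <= j, of the x_i * x_j terms of a d-dimensional quadratic surface."""
    return list(combinations_with_replacement(range(d), 2))


def feature_scores(quad_coefs, lin_coefs, d):
    """
    Hypothetical feature weights.
    quad_coefs[t][k]: coefficient of the t-th quadratic term (ordered as quadratic_term_pairs(d)) for class k
    lin_coefs[j][k]:  linear coefficient of feature j for class k
    """
    sq = [sum(w * w for w in row) for row in lin_coefs]
    for t, (i, j) in enumerate(quadratic_term_pairs(d)):
        s = sum(w * w for w in quad_coefs[t])
        sq[i] += s
        if j != i:  # a cross term x_i * x_j counts toward both features
            sq[j] += s
    return [math.sqrt(v) for v in sq]


def top_features(scores, m):
    """Indices of the m highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]
```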

Annals of Data Science, vol. 11, no. 2, pp. 647-675.
Citations: 0
A New Hyperbolic Tangent Family of Distributions: Properties and Applications
Q1 Decision Sciences Pub Date : 2024-02-15 DOI: 10.1007/s40745-024-00516-5
Shahid Mohammad, Isabel Mendoza

This paper introduces a new family of distributions called the hyperbolic tangent (HT) family. The cumulative distribution function of this model is defined using the standard hyperbolic tangent function. The fundamental properties of the distribution are examined in detail. Additionally, the inverse exponential distribution is employed as a sub-model within the HT family, and its properties are also derived. The parameters of the HT family are estimated by maximum likelihood, and the performance of these estimators is assessed through simulation. To demonstrate the significance and flexibility of the new family, two real data sets are analyzed; they serve as practical examples of the applicability and usefulness of the HT family in real-world scenarios. By introducing the HT family, exploring its properties, applying maximum likelihood estimation, and conducting simulations and real data analyses, this paper contributes to the advancement of statistical modeling and distribution theory.
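
The abstract says only that the CDF is built from the standard hyperbolic tangent. One construction consistent with that description, used here purely as an assumption, is F(x) = tanh(theta * G(x)) / tanh(theta) with theta > 0 and baseline CDF G; it is a valid CDF because tanh is increasing, F tends to 0 where G tends to 0, and F tends to 1 where G tends to 1. The sketch pairs it with the inverse exponential baseline G(x) = exp(-lam / x), x > 0, and gives the density, the quantile function (used for simulation by inversion), and the log-likelihood that maximum likelihood estimation would maximize.

```python
import math
import random


def inv_exp_cdf(x, lam):
    """Inverse exponential baseline CDF G(x) = exp(-lam / x), x > 0."""
    return math.exp(-lam / x)


def ht_cdf(x, theta, lam):
    return math.tanh(theta * inv_exp_cdf(x, lam)) / math.tanh(theta)


def ht_pdf(x, theta, lam):
    big_g = inv_exp_cdf(x, lam)
    small_g = lam / (x * x) * big_g  # baseline density g(x) = G'(x)
    # d/dx tanh(theta * G) = theta * g * sech^2(theta * G)
    return theta * small_g / (math.cosh(theta * big_g) ** 2 * math.tanh(theta))


def ht_quantile(u, theta, lam):
    """Inverse of ht_cdf for 0 < u < 1."""
    big_g = math.atanh(u * math.tanh(theta)) / theta
    return -lam / math.log(big_g)


def ht_sample(n, theta, lam, seed=0):
    """Draw n values by inverse-CDF sampling."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # ht_quantile needs 0 < u < 1; random() already excludes 1.0
            out.append(ht_quantile(u, theta, lam))
    return out


def ht_loglik(data, theta, lam):
    """Log-likelihood to be maximized over (theta, lam) for maximum likelihood estimation."""
    return sum(math.log(ht_pdf(x, theta, lam)) for x in data)
```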

Annals of Data Science, vol. 12, no. 2, pp. 457-480.
Citations: 0
Assessing the Risk of Bitcoin Futures Market: New Evidence
Q1 Decision Sciences Pub Date : 2024-02-14 DOI: 10.1007/s40745-024-00517-4
Anupam Dutta

The main objective of this paper is to forecast the realized volatility (RV) of the Bitcoin futures (BTCF) market. To this end, we propose an augmented heterogeneous autoregressive (HAR) model that incorporates information on the time-varying jumps observed in BTCF returns. Specifically, we estimate the jump-induced volatility using a GARCH-jump process and then include this information in the HAR model. Both in-sample and out-of-sample analyses show that jumps offer information not captured by existing HAR models. A further, novel finding is that jump-induced volatility offers incremental information relative to the Bitcoin implied volatility index. Overall, our results indicate that an HAR-RV process incorporating leverage effects and jump volatility predicts RV more precisely than standard HAR-type models. These findings have important implications for cryptocurrency investors.
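
A standard HAR-RV regression predicts next-day realized volatility from daily, weekly (5-day average) and monthly (22-day average) RV components; the augmented version described here adds a jump-volatility regressor. The sketch below builds that design matrix and fits it by ordinary least squares through the normal equations. Realized variance is computed as the sum of squared intraday returns; the jump series is taken as given (in the paper it comes from a GARCH-jump model, which is not reproduced), and the leverage terms mentioned in the abstract are omitted.

```python
import math


def realized_variance(intraday_returns):
    """Realized variance for one day: sum of squared intraday returns."""
    return math.fsum(r * r for r in intraday_returns)


def har_design(rv, jump=None):
    """
    Build HAR rows for t = 21 .. len(rv) - 2.
    Regressors: [1, RV_t, mean of RV_{t-4..t}, mean of RV_{t-21..t}, (J_t)]
    Target:     RV_{t+1}
    `jump` is an optional, externally estimated jump-volatility series aligned with `rv`.
    """
    X, y = [], []
    for t in range(21, len(rv) - 1):
        row = [1.0, rv[t], math.fsum(rv[t - 4:t + 1]) / 5.0, math.fsum(rv[t - 21:t + 1]) / 22.0]
        if jump is not None:
            row.append(jump[t])
        X.append(row)
        y.append(rv[t + 1])
    return X, y


def ols(X, y):
    """Least squares via the normal equations (X'X) beta = X'y and Gaussian elimination."""
    k = len(X[0])
    A = [[math.fsum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    c = [math.fsum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= factor * A[col][j]
            c[r] -= factor * c[col]
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (c[i] - math.fsum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta
```

Comparing out-of-sample errors of `ols(*har_design(rv))` and `ols(*har_design(rv, jump))` is the basic form of the "does the jump term add information" test.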

Annals of Data Science, vol. 12, no. 2, pp. 481-497. Open access PDF: https://link.springer.com/content/pdf/10.1007/s40745-024-00517-4.pdf
Citations: 0
An Innovative Technique for Generating Probability Distributions: A Study on Lomax Distribution with Applications in Medical and Engineering Fields
Q1 Decision Sciences Pub Date : 2024-02-13 DOI: 10.1007/s40745-024-00515-6
Shamshad Ur Rasool, M. A. Lone, S. P. Ahmad

In this paper, we propose and investigate a novel approach for generating probability distributions, called the SMP transformation technique. Using the SMP transformation, we develop a new model of the Lomax distribution, the SMP Lomax (SMPL) distribution. The SMPL distribution is compared with the Sine Power Lomax, Power Length-Biased Weighted Lomax, Exponentiated Lomax, and Lomax distributions, and it offers greater flexibility and better performance than these well-known existing models. The article also examines various aspects of the SMPL distribution, including its statistical properties and the maximum likelihood procedure for estimating its parameters. An extensive simulation study illustrates the behavior of the maximum likelihood estimators (MLEs) in terms of mean squared error. To evaluate the effectiveness and flexibility of the proposed distribution, two real-life data sets are analyzed; the SMPL distribution outperforms the baseline Lomax distribution and the other competing models according to the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc), the Hannan–Quinn information criterion (HQIC), and other goodness-of-fit measures.
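
The SMP transformation itself is not defined in the abstract, so it is not reproduced here. The sketch below covers two pieces the comparison relies on that have standard definitions: the baseline Lomax distribution, with CDF F(x) = 1 - (1 + x/lam)^(-alpha) and density (alpha/lam)(1 + x/lam)^(-(alpha+1)) for x >= 0, and the information criteria used to rank fitted models, where k is the number of parameters, n the sample size, and loglik the maximized log-likelihood (lower values are better).

```python
import math


def lomax_cdf(x, alpha, lam):
    return 1.0 - (1.0 + x / lam) ** (-alpha)


def lomax_logpdf(x, alpha, lam):
    return math.log(alpha) - math.log(lam) - (alpha + 1.0) * math.log1p(x / lam)


def lomax_loglik(data, alpha, lam):
    return sum(lomax_logpdf(x, alpha, lam) for x in data)


def information_criteria(loglik, k, n):
    """AIC, corrected AIC, BIC and Hannan-Quinn criteria (lower is better)."""
    aic = 2 * k - 2 * loglik
    return {
        "AIC": aic,
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),
        "BIC": k * math.log(n) - 2 * loglik,
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * loglik,
    }
```

A competing model such as SMPL would be compared by passing its own maximized log-likelihood and parameter count to `information_criteria` on the same data.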

Annals of Data Science, vol. 12, no. 2, pp. 439-455.
Citations: 0
Parameter Estimation for Geometric Lévy Processes with Constant Volatility
Q1 Decision Sciences Pub Date : 2024-01-30 DOI: 10.1007/s40745-024-00513-8
Sher Chhetri, Hongwei Long, Cory Ball

In finance, various stochastic models have been used to describe the price movements of financial instruments. Following the seminal work of Robert Merton, several jump-diffusion models have been proposed for option pricing and risk management. In this study, we augment the log-return dynamics of the Black–Scholes model by incorporating alpha-stable Lévy motion with constant volatility. We use the sample characteristic function approach to study parameter estimation for discretely observed stochastic differential equations driven by Lévy noise. We also discuss the consistency and asymptotic properties of the proposed estimators and establish a central limit theorem. To further demonstrate the validity of the estimators, we present simulation results for the model. The utility of the proposed model is demonstrated on Dow Jones Industrial Average data, for which all model parameters are estimated. In addition, we discuss the broader implications of this work, in particular the relevance of our methods to big-data-driven research in financial data modeling and climate modeling, and we highlight the importance of optimization and data mining in these contexts, referencing key works in the field. This study thus contributes both to finance and to the wider scientific community engaged in data science research and analysis.
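
The sample characteristic function idea can be illustrated in a simplified setting: symmetric alpha-stable increments with no Brownian component and unit time step, whose characteristic function is phi(u) = exp(i*mu*u - |gamma*u|^alpha). Because ln(-ln|phi(u)|) = alpha*ln|u| + alpha*ln(gamma), evaluating the empirical characteristic function at two points u1, u2 gives closed-form estimates of alpha and gamma (a two-point moment estimator in the style of Press). The paper's model also includes a constant-volatility diffusion term, which this sketch omits; the Chambers-Mallows-Stuck sampler is included only to generate test data.

```python
import math
import random


def empirical_cf_modulus(sample, u):
    """|phi_hat(u)| with phi_hat(u) = mean of exp(i * u * X_j)."""
    n = len(sample)
    re = sum(math.cos(u * x) for x in sample) / n
    im = sum(math.sin(u * x) for x in sample) / n
    return math.hypot(re, im)


def estimate_symmetric_stable(sample, u1=0.5, u2=1.0):
    """Two-point estimates of (alpha, gamma) from ln(-ln|phi(u)|) = alpha*ln(u) + alpha*ln(gamma)."""
    y1 = math.log(-math.log(empirical_cf_modulus(sample, u1)))
    y2 = math.log(-math.log(empirical_cf_modulus(sample, u2)))
    alpha = (y1 - y2) / (math.log(u1) - math.log(u2))
    gamma = math.exp((y1 - alpha * math.log(u1)) / alpha)
    return alpha, gamma


def sample_symmetric_stable(n, alpha, gamma=1.0, mu=0.0, seed=0):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable variables (alpha != 1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        v = rng.uniform(-math.pi / 2, math.pi / 2)
        w = rng.expovariate(1.0)
        x = (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)) * (
            math.cos(v - alpha * v) / w
        ) ** ((1.0 - alpha) / alpha)
        out.append(mu + gamma * x)
    return out
```

The drift mu does not affect |phi(u)|, so it drops out of both estimates; in practice it would be recovered from the argument of phi_hat.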

Annals of Data Science, vol. 12, no. 1, pp. 63-93.
Citations: 0
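The Lévy-process abstract above estimates parameters with the sample characteristic function. Here is a minimal sketch of that idea in a deliberately simplified setting: i.i.d. symmetric α-stable increments, with no drift and no Brownian component, so this is not the authors' estimator for the full geometric Lévy model. A symmetric α-stable law with scale σ has characteristic function φ(u) = exp(−σ^α |u|^α), so log(−log|φ(u)|) = α log σ + α log|u|. If φ is replaced by the empirical characteristic function and the result is regressed on log u, the slope estimates α and the intercept gives σ. The Chambers–Mallows–Stuck sampler, the function names, and the grid of u values are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sample_symmetric_stable(alpha, sigma, n, rng):
    """Draw n symmetric alpha-stable variates with scale sigma using the
    Chambers-Mallows-Stuck method (skewness beta = 0)."""
    out = []
    for _ in range(n):
        v = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
        w = rng.expovariate(1.0)                     # unit exponential
        x = (math.sin(alpha * v) / math.cos(v) ** (1 / alpha)
             * (math.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))
        out.append(sigma * x)
    return out


def ecf_modulus(data, u):
    """|phi_hat(u)|, where phi_hat(u) = (1/n) * sum_j exp(i * u * x_j)."""
    n = len(data)
    re = sum(math.cos(u * x) for x in data) / n
    im = sum(math.sin(u * x) for x in data) / n
    return math.hypot(re, im)


def fit_stable_ecf(data, grid=(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Estimate (alpha, sigma) by least squares of log(-log|phi_hat(u)|) on log(u)."""
    # Pilot scale (median absolute value) keeps |phi_hat(u)| away from 0 and 1.
    s0 = sorted(abs(x) for x in data)[len(data) // 2]
    xs = [math.log(c / s0) for c in grid]
    ys = [math.log(-math.log(ecf_modulus(data, c / s0))) for c in grid]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    intercept = my - slope * mx
    alpha_hat = slope                            # slope = alpha
    sigma_hat = math.exp(intercept / alpha_hat)  # intercept = alpha * log(sigma)
    return alpha_hat, sigma_hat


rng = random.Random(42)
increments = sample_symmetric_stable(alpha=1.5, sigma=2.0, n=20000, rng=rng)
alpha_hat, sigma_hat = fit_stable_ecf(increments)
print(f"alpha_hat={alpha_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```

Once drift and Brownian terms are added, −log|φ(u)| becomes a sum of a |u|² term and a |u|^α term rather than a pure power of |u|. The full model therefore needs a joint fit of all parameters; this sketch only illustrates the characteristic-function idea.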
Improving Road Traffic Speed Prediction Using Data Augmentation: A Deep Generative Models-based Approach
Q1 Decision Sciences Pub Date : 2024-01-30 DOI: 10.1007/s40745-023-00508-x
Redouane Benabdallah Benarmas, Kadda Beghdad Bey

Deep learning prediction models have become the most widely used models in the development of intelligent transportation systems (ITS), and their success depends strongly on the volume and quality of the training data. However, traffic datasets are often small because of the limited resources available to collect and store traffic flow data. Data augmentation (DA) is a key method for enlarging the training dataset before a prediction model is applied. In this paper, we demonstrate the effectiveness of data augmentation for traffic speed prediction using a deep generative model (DGM)-based approach. We empirically evaluate how well time-series-appropriate architectures improve traffic prediction under a Train on Synthetic, Test on Real (TSTR) procedure. A time-series-based generative adversarial network (GAN) model is used to transform an original road traffic dataset into a synthetic dataset that improves traffic prediction. Experiments on the 6th Beijing and PeMS datasets show that, with both parametric and non-parametric methods, the transformation improves the prediction model's accuracy. The original datasets are compared with the generated ones using statistical analysis methods to measure the fidelity and behavior of the produced data.
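The Train on Synthetic, Test on Real (TSTR) procedure in the abstract above can be illustrated with a minimal sketch. Here a hypothetical AR(1) resampler stands in for the time-series GAN: it is fitted to the real training split and then sampled. The "real" speed series is a toy simulation rather than the Beijing or PeMS data, and the predictor is a one-step least-squares AR(1) forecaster. TSTR fits the predictor on synthetic data only and scores it on held-out real data. Comparing that score with Train on Real, Test on Real (TRTR) indicates how much predictive signal the generator preserved.

```python
import random
import statistics


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1} + e_t; returns (c, phi, sd)."""
    xs, ys = series[:-1], series[1:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    phi = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
           / sum((a - mx) ** 2 for a in xs))
    c = my - phi * mx
    resid = [b - (c + phi * a) for a, b in zip(xs, ys)]
    return c, phi, statistics.pstdev(resid)


def simulate_ar1(c, phi, sd, n, rng, x0):
    """Sample an AR(1) path; used for toy 'real' data and as the stand-in generator."""
    out = [x0]
    for _ in range(n - 1):
        out.append(c + phi * out[-1] + rng.gauss(0.0, sd))
    return out


def one_step_mae(c, phi, series):
    """Mean absolute error of one-step-ahead forecasts c + phi * x_{t-1}."""
    errs = [abs(b - (c + phi * a)) for a, b in zip(series[:-1], series[1:])]
    return sum(errs) / len(errs)


def tstr_vs_trtr(real, split, n_synth, rng):
    """Return (TSTR MAE, TRTR MAE) on the real test split."""
    train, test = real[:split], real[split:]
    # TRTR baseline: predictor fitted on real training data.
    c, phi, sd = fit_ar1(train)
    # Stand-in generator: sample synthetic data from the model fitted to real train.
    synthetic = simulate_ar1(c, phi, sd, n_synth, rng, x0=train[-1])
    # TSTR: refit the predictor on synthetic data only, then score on real test data.
    c_s, phi_s, _ = fit_ar1(synthetic)
    return one_step_mae(c_s, phi_s, test), one_step_mae(c, phi, test)


rng = random.Random(0)
# Toy speed series in km/h around a mean of 60 (illustrative only).
real = simulate_ar1(12.0, 0.8, 2.0, 2000, rng, x0=60.0)
tstr, trtr = tstr_vs_trtr(real, split=1500, n_synth=5000, rng=rng)
print(f"TSTR MAE={tstr:.3f}  TRTR MAE={trtr:.3f}")
```

A TSTR error close to the TRTR error suggests the synthetic data kept the dynamics the predictor relies on. In the paper's setting, the generator would instead be the trained GAN and the predictor one of the parametric or non-parametric models.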

Annals of Data Science, 11(6), 2199–2216
Citations: 0
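The traffic-speed abstract above says the original and generated datasets are compared with statistical analysis methods, but it does not name those methods. The minimal sketch below shows two common checks: the two-sample Kolmogorov–Smirnov (KS) statistic for the marginal distribution, and the lag-1 autocorrelation for temporal structure. These choices, the toy AR(1) series, and the function names are assumptions for illustration. The shuffled copy shows why a marginal test alone is not enough for time series.

```python
import bisect
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the supremum is attained at some sample point
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def toy_ar1(n, rng, mean, phi=0.8, sd=2.0):
    """Toy AR(1) speed series (km/h) started at its stationary mean."""
    out = [mean]
    for _ in range(n - 1):
        out.append(mean + phi * (out[-1] - mean) + rng.gauss(0.0, sd))
    return out


rng = random.Random(1)
real = toy_ar1(3000, rng, mean=60.0)
biased = toy_ar1(3000, rng, mean=50.0)  # wrong level, same dynamics
shuffled = real[:]
rng.shuffle(shuffled)                   # same values, temporal order destroyed

print("KS real vs biased:  ", round(ks_statistic(real, biased), 3))
print("KS real vs shuffled:", ks_statistic(real, shuffled))
print("lag-1 ACF real / shuffled:",
      round(lag1_autocorr(real), 3), round(lag1_autocorr(shuffled), 3))
```

The KS statistic flags the level shift but is exactly zero for the shuffled copy. The autocorrelation, by contrast, exposes the lost temporal order.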
Journal
Annals of Data Science