
Applied Stochastic Models in Business and Industry: Latest Articles

Causal Forests for Discovering Diagnostic Language in Electronic Health Records
IF 1.5 | Zone 4, Mathematics | Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-08-25 | DOI: 10.1002/asmb.70038
Alessandro Albano, Chiara Di Maria, Mariangela Sciandra, Antonella Plaia

Textual analysis has gained significant interest in medical research, particularly for automated patient diagnosis based on clinical narratives. While traditional approaches often focus on associational methods, this paper explores the application of causal forests to analyze textual data from electronic health records (EHRs), aiming to identify causal relationships between specific words and the likelihood of receiving certain medical diagnoses. Utilizing the MIMIC-III dataset, we assess how linguistic factors influence diagnosis probabilities for three conditions: diabetes, hypothyroidism, and adrenal gland disorders. Our findings reveal significant causal links between certain clinical terms and diagnosis probabilities, emphasizing the potential of causal inference techniques to improve the analysis of language in clinical narratives. Additionally, we uncover heterogeneity in treatment effects, demonstrating that specific words can identify high-risk patient subgroups. This study highlights the importance of integrating causal inference in natural language processing within healthcare settings.
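
As a hedged illustration of the quantity a causal forest targets, the conditional average treatment effect can be written as below. The notation is an assumption made for this sketch and is not taken from the paper: $W_i$ indicates whether a given word appears in patient $i$'s clinical note, $Y_i(w)$ is the potential diagnosis outcome under $W_i = w$, and $X_i$ collects the remaining covariates.

```latex
\tau(x) = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \mid X_i = x \,\right],
\qquad
\mathrm{ATE} = \mathbb{E}\left[\, \tau(X_i) \,\right]
```

Under this reading, heterogeneity in treatment effects means that $\tau(x)$ varies with $x$, so subgroups with large $\tau(x)$ are those for which the word shifts the diagnosis probability the most.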

Volume 41, Issue 5 | Journal Article | https://doi.org/10.1002/asmb.70038 | PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/asmb.70038
Citations: 0
Reliability Inference in GLFP Models Based on EM Algorithm With Related Application
IF 1.5 | Zone 4, Mathematics | Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-08-03 | DOI: 10.1002/asmb.70030
Chih-Ying Tai, Tsai-Hung Fan

During the manufacturing of integrated circuit (IC) products, defective units may not be screened out by quality inspections. Defective units often suffer infant mortality failures in the early stages of operation, while non-defective units eventually fail from wear-out. The general limited failure population (GLFP) model describes this phenomenon: defective units are subject to both failure mechanisms, whereas non-defective units fail only from wear-out. Moreover, when a failure occurs, it is unknown whether the unit was defective or which failure mode caused the failure. This article proposes an EM algorithm, along with the missing information principle, for GLFP models under multiply censored Weibull distributions to simplify maximum likelihood (ML) inference. It resolves computational instability and provides more accurate reliability inference. With the embedded latent variables, failure mode detection and defect identification can also be carried out for masked data. Furthermore, the proposed method can be extended to GLFP models for interval data. A simulation study shows that the proposed method provides more accurate results. Two illustrative examples highlight the feasibility and advantages of the proposed approach.
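
The GLFP structure described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' EM procedure: it only evaluates the commonly used GLFP distribution function F(t) = 1 - (1 - p F1(t)) (1 - F2(t)) and simulates lifetimes from it, with no censoring. The function names and the parameter values in the demo are hypothetical.

```python
import math
import random


def weibull_cdf(t, shape, scale):
    """Weibull CDF: 1 - exp(-(t / scale) ** shape) for t > 0, else 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def glfp_cdf(t, p, shape1, scale1, shape2, scale2):
    """GLFP CDF: F(t) = 1 - (1 - p * F1(t)) * (1 - F2(t)).

    p is the proportion of defective units, F1 the infant-mortality
    Weibull and F2 the wear-out Weibull.
    """
    f1 = weibull_cdf(t, shape1, scale1)
    f2 = weibull_cdf(t, shape2, scale2)
    return 1.0 - (1.0 - p * f1) * (1.0 - f2)


def simulate_glfp(n, p, shape1, scale1, shape2, scale2, rng):
    """Draw n lifetimes: defective units fail at min(T1, T2), others at T2."""
    lifetimes = []
    for _ in range(n):
        # random.weibullvariate takes (scale, shape) in that order.
        t2 = rng.weibullvariate(scale2, shape2)  # wear-out time
        if rng.random() < p:  # this unit is defective
            t1 = rng.weibullvariate(scale1, shape1)  # infant-mortality time
            lifetimes.append(min(t1, t2))
        else:
            lifetimes.append(t2)
    return lifetimes


if __name__ == "__main__":
    # Hypothetical parameters: 10% defective, decreasing-hazard infant
    # mortality (shape < 1), increasing-hazard wear-out (shape > 1).
    params = (0.1, 0.5, 1.0, 3.0, 10.0)
    sample = simulate_glfp(20000, *params, random.Random(0))
    empirical = sum(1 for x in sample if x <= 1.0) / len(sample)
    print("empirical F(1):", empirical)
    print("model F(1):    ", glfp_cdf(1.0, *params))
```

In the simulation the defect indicator and the failure mode are known for every unit. In the masked-data setting of the abstract they are not observed, which is why they are natural latent variables for an EM algorithm.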

Volume 41, Issue 4 | Journal Article | https://doi.org/10.1002/asmb.70030
Citations: 0
The Modeling of Cyber Risk Insurance by Hawkes Processes With Loss Covariate
IF 1.5 | Zone 4, Mathematics | Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-08-03 | DOI: 10.1002/asmb.70026
Na Ren, Xin Zhang

The complexity and dynamic nature of cyber risks pose considerable challenges to risk management. From an actuarial perspective, we propose an advanced aggregate loss process using a variant of the Hawkes process as its frequency model. The refined Hawkes process first considers the impact of loss magnitude on the frequency of risk occurrences by integrating the loss covariate into the conditional intensity function. Second, we employ a more flexible kernel function in place of the classical exponential case. By incorporating the concept of age-dependent population structure, we calculate the probabilistic properties (mean, variance) for the proposed aggregate loss process. Furthermore, numerical simulations for cyber insurance pricing are conducted based on two pricing principles. Finally, we verify the feasibility of the proposed model based on a publicly available cyber breach data set. Considering the complex and dynamic nature of cyber risks, the efficiency of the proposed model is still limited by some factors, such as the authenticity and accuracy of the data. These are worthy of further consideration in future studies.
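
To make the idea of a loss-dependent intensity concrete, here is a minimal simulation sketch. It assumes the classical exponential kernel (the case the paper replaces with a more flexible kernel) and a hypothetical linear impact function of the loss. Neither the functional forms nor the parameter values come from the paper, and all names are illustrative.

```python
import math
import random


def simulate_marked_hawkes(mu, alpha, beta, impact, draw_loss, horizon, rng):
    """Simulate event times and losses on (0, horizon] by Ogata thinning.

    Assumed intensity (illustrative, exponential kernel):
        lambda(t) = mu + sum_i impact(y_i) * alpha * exp(-beta * (t - t_i)),
    so a larger loss y_i raises the future event rate more strongly.
    """
    times, losses = [], []
    t = 0.0
    excitation = 0.0  # value of the self-exciting sum at the current time t
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait > horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)  # decay to the candidate time
        if rng.random() * lam_bar <= mu + excitation:  # accept the candidate
            y = draw_loss(rng)
            times.append(t)
            losses.append(y)
            excitation += impact(y) * alpha  # jump scaled by the loss size
    return times, losses


if __name__ == "__main__":
    rng = random.Random(1)
    times, losses = simulate_marked_hawkes(
        mu=0.5,
        alpha=0.4,
        beta=1.0,
        impact=lambda y: 1.0 + 0.5 * y,      # hypothetical impact function
        draw_loss=lambda r: r.expovariate(1.0),  # hypothetical loss law
        horizon=50.0,
        rng=rng,
    )
    print("number of events:", len(times))
    print("aggregate loss:  ", sum(losses))
```

With these illustrative values the mean number of offspring per event is E[impact(Y)] * alpha / beta = 1.5 * 0.4 = 0.6, which is below 1, so the simulated process does not explode. The aggregate loss is simply the sum of the simulated losses up to the horizon.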

Volume 41, Issue 4 | Journal Article | https://doi.org/10.1002/asmb.70026
Citations: 0
Rejoinder to Next Generation Models for Subsequent Sports Injuries by Wu et al.
IF 1.5 | Zone 4, Mathematics | Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-07-30 | DOI: 10.1002/asmb.70035
Paul Pao-Yen Wu, Yu Yi Yu, Liam A. Toohey, Michael Drew, Scott A. Sisson, Clara Grazian, Kerrie Mengersen

We greatly appreciate the commentary and positive feedback of discussants Prof. Jialiang Li and Dr. Rhythm Grover, which enrich our paper and its context.

As noted by Prof. Li, survival models are highly applicable to the subsequent sports injury problem given the temporal dimension of injury data. In the sporting context, censoring can arise, for example, from finite surveillance windows associated with a sporting season, athletes joining and leaving a team, or even extended absence due to injury [1, 2]. However, given the complex systems nature of individual athletes and potentially changing dynamics and susceptibility to injury over time, it is also important to capture the changing state of the athlete explicitly [3]. For example, increasing strength with training over a season could reduce injury risk; however, a serious injury such as an ACL injury could lead to increased susceptibility to subsequent injuries.

Our paper presented a pragmatic approach, as noted by Dr. Grover, to tackle the challenges of modeling subsequent injury, reducing dimensionality through a time-varying Cox Proportional Hazards (PH) model, and using a discrete-time HMM to capture changes in susceptibility and covariate effects over time. Both Prof. Li and Dr. Grover note the potential computational challenge associated with Hidden Markov Models (HMMs), especially in the presence of large-scale and high-dimensional datasets. Hence the need for dimension reduction, which was undertaken using survival modeling to explicitly cater for the time-to-event nature of injury data and censoring. The appropriateness of using the survival model was supported by checks of the assumptions of the PH model (e.g., proportional hazards, Schoenfeld residuals) and validation results (concordance index) as reported in our paper.

Beyond computational complexity, however, there is the related challenge of model convergence. Greater model complexity, such as more HMM states or more model covariates, can lead to challenges with model identifiability, estimation, computation, and thus model convergence [4]. This is a current research challenge when faced with limited data as in our subsequent injury application, which is limited to 33 players and 2523 training and competition sessions over one season. Computationally, the proposed discrete-time HMM fitted with Expectation Maximization (EM) took approximately 155 s to converge for the entire team of players over one season, compared to less than a second for the Cox PH model. However, model convergence with more than two states could not be achieved with this limited dataset. Therefore, although the computational cost is feasible in this case study, the data available can limit the level of model complexity that can be achieved. Hence, it highlights the utility of the proposed combination of dimension reduction and state space modelling as a more generalizable approach, and the need for more research in this area.

Along these lines, Prof. Li and Dr. Grover discuss the challenges of computationally efficient inference and future research in this area, including bootstrapping, additive hazards models, and model-free dimension reduction for censored data. In addition to, or in combination with, bootstrapping, Bayesian inference for HMMs [5, 6] could be another avenue for investigating inference with limited data, and could help overcome the potential challenge of local maxima in EM. Furthermore, a combined approach to variable selection and inference might capture variables that are influential in the HMM context but have no marginal effect in the survival model, as noted by Dr. Grover. This is challenging because of the computational and inferential complexities noted above; however, methods for learning dynamic Bayesian networks (a generalized form of HMMs) from high-dimensional data could potentially be adapted to subsequent injury [7, 8].

Another avenue of research could be continuous-time HMMs (CTHMMs) as a potentially better way to capture non-uniform time intervals between observations. However, CTHMMs introduce additional model complexity compared with discrete-time HMMs, since both the state transition times and the number of state transitions between observations need to be estimated.

In this study, data were limited to one Australian Football League (AFL) club over the course of one season. As noted by Dr. Grover, with additional data from more seasons and/or clubs, the transferability and generalizability of the model could be better assessed and investigated. Nevertheless, the proposed HMM was able to assess injury risk for individual players with different positions, exposures, and loads, which suggests some degree of generalizability.

Larger datasets could also enable the application of modern machine learning methods, including recurrent neural networks and temporal convolutional networks, which to date have been under-studied in the subsequent injury domain. In general, neural networks require large datasets and are difficult to interpret, but can yield very high predictive performance. One potential approach for future research that could help address the practical challenge of limited data in elite sport is to train a model on a larger injury dataset and then retrain it for a specific sports club. This is important because obtaining complete long-term data on individual athletes is challenging, given the competitive nature of sport and the movement of athletes between clubs.

The authors declare no conflicts of interest.
Volume 41, Issue 4 | Journal Article | https://doi.org/10.1002/asmb.70035 | PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/asmb.70035
Citations: 0
Joint Tail Probability of Renewal Models of Dependent Heavy-Tailed Random Variables With Applications to Systemic Risk Measures
IF 1.5 | Zone 4, Mathematics | Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-07-29 | DOI: 10.1002/asmb.70028
Lei Zou, Jiangyan Peng, Chenghao Xu

Consider a non-standard renewal risk model in which claims arrive in pairs $\{(X_{1i}, X_{2i});\ i \in \mathbb{N}\}$ and the stochastic discounting process is given by $\{e^{-\xi(t)};\ t \ge 0\}$, where $\xi(\cdot)$ is a Lévy process. We are interested in the joint tail probability of $L_1(t)$ and $L_2(t)$. We derive asymptotic formulas for the joint tail probability of $L_1(t)$ and $L_2(t)$. These results are then applied to evaluate two systemic risk measures. Finally, we conduct numerical studies to illustrate the theoretical results.

Volume 41, Issue 4 | Journal Article | https://doi.org/10.1002/asmb.70028
Citations: 0
Enhancing Credit Risk Management Through Integration of Multiple Imputation Methodology and Long-Term Survival Modelling
IF 1.3 | Zone 4, Mathematics | Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-07-21 | DOI: 10.1002/asmb.70027
Jacob Majakwara, Patrick L. Mthisi, Honest W. Chipoyera

Credit risk management plays a crucial role in financial institutions by identifying, assessing and controlling the credit risks arising from lending activities. However, missing data pose a common problem in credit risk modelling, leading to biased estimates and a loss of statistical power. To address this issue and improve predictive accuracy, multiple imputation methods are increasingly employed. This study evaluates the performance of the Multivariate Imputation by Chained Equations (MICE) method in identifying factors associated with time to default, using the publicly available Prosper personal loan data. The analysis is conducted within the framework of mixture cure rate models based on the generalised gamma family of distributions. This research is the first of its kind to integrate the MICE approach into mixture cure rate modelling. The flexibility of the generalised gamma distribution was utilised to select the optimal mixture cure rate model. The estimated cure rate using complete cases (CC) was higher than that obtained using MICE imputation. This highlights the potential pitfalls of solely relying on CC analysis in survival analysis.
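
The two ingredients named in this abstract can be illustrated with a short sketch. It makes two assumptions that are not stated in the abstract: the latency distribution is Weibull (a special case of the generalised gamma family), and estimates from the imputed datasets are pooled with Rubin's rules, the usual companion to MICE. Function names and numbers are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """Survival function of the uncured (eventually defaulting) group."""
    if t <= 0:
        return 1.0
    return math.exp(-((t / scale) ** shape))


def mixture_cure_survival(t, cure_rate, shape, scale):
    """Population survival: S_pop(t) = pi + (1 - pi) * S_u(t).

    cure_rate (pi) is the fraction of borrowers who never default,
    so S_pop(t) levels off at pi instead of decaying to zero.
    """
    return cure_rate + (1.0 - cure_rate) * weibull_survival(t, shape, scale)


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B (within- plus inflated between-imputation).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    return q_bar, within + (1.0 + 1.0 / m) * between


if __name__ == "__main__":
    # Hypothetical cure-rate estimates from three imputed datasets.
    pooled, total_var = pool_rubin([0.30, 0.34, 0.32], [0.01, 0.01, 0.01])
    print("pooled cure rate:", pooled, "total variance:", total_var)
    for months in (0, 12, 36, 120):
        print(months, mixture_cure_survival(months, pooled, 1.5, 24.0))
```

The plateau of the population survival curve is what distinguishes a cure rate model from a standard survival model, and it is the quantity the abstract reports as differing between complete-case analysis and MICE.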

Volume 41, Issue 4 | Journal Article | https://doi.org/10.1002/asmb.70027 | PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/asmb.70027
Citations: 0
Forecasting Inflation From Disaggregated Data
IF 1.3 | Zone 4, Mathematics | Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-07-06 | DOI: 10.1002/asmb.70023
Wilmer Martínez-Rivera, Eliana González-Molano, Edgar Caicedo-Garcia

We forecast inflation aggregates for the United States, the United Kingdom, and Colombia using both the aggregation of forecasts of the disaggregates and forecasts obtained directly from the aggregate. We implement models suited to many predictors, such as dimension reduction, shrinkage methods, and machine learning models, as well as traditional time-series models (ARIMA and TAR). We evaluate out-of-sample forecasts for the period before COVID-19 and the period afterward. We find that aggregating the forecasts of the disaggregates performs as well as forecasting the aggregate directly. In some cases, the disaggregate analysis reduces the forecast error.
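
The comparison described here, bottom-up aggregation versus a direct forecast, can be sketched as follows. This is only an illustration of the bookkeeping: the component names, weights, and numbers are hypothetical, and the forecasting models themselves are not shown.

```python
def aggregate_from_components(component_forecasts, weights):
    """Bottom-up forecast: weighted average of component inflation forecasts.

    weights are expenditure shares (hypothetical here); they are
    renormalised so they need not sum exactly to one.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[k] * component_forecasts[k] for k in weights)
    return weighted_sum / total_weight


def rmse(forecasts, actuals):
    """Root mean squared error over an out-of-sample evaluation window."""
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return (sum(squared) / len(squared)) ** 0.5


if __name__ == "__main__":
    # Hypothetical basket and one-step-ahead component forecasts (percent).
    weights = {"food": 0.2, "energy": 0.1, "core": 0.7}
    forecasts = {"food": 4.0, "energy": 10.0, "core": 2.0}
    print("bottom-up forecast:", aggregate_from_components(forecasts, weights))

    # Hypothetical evaluation window comparing both approaches.
    actual = [3.0, 3.1, 3.4]
    bottom_up = [3.2, 3.0, 3.3]
    direct = [2.8, 3.3, 3.1]
    print("RMSE bottom-up:", rmse(bottom_up, actual))
    print("RMSE direct:   ", rmse(direct, actual))
```

Running the same error metric over separate pre- and post-COVID-19 windows would mirror the two evaluation periods mentioned in the abstract.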

Volume 41, Issue 4 | Journal Article | https://doi.org/10.1002/asmb.70023
Citations: 0
Data Quality: What if Deming Were Born Today?
IF 1.3 | Zone 4, Mathematics | Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-06-29 | DOI: 10.1002/asmb.70025
Dennis K. J. Lin, Nicholas Rios

If Francis Bacon were born today, he might have said “data is power” instead of his original saying, “knowledge is power.” In modern society, data is everywhere. In memory of Deming (a guru in quality), this paper attempts to address the fundamental issue of data quality and how Deming would handle it. Specifically, we attempt to explain what data quality really means, and the critical impact that it has on data science. Statisticians, who understand how to collect high quality data, have much more to contribute to both the intellectual vitality and the practical utility of data science. At the same time, data science challenges statisticians to move out of some familiar habits to engage less structured problems, to become more comfortable with ambiguity, and to engage more scientists in a fruitful discussion on what various parties can bring to this new mode of investigation. Some potential avenues for future research in the collection of high-quality data will be proposed.

Citations: 0
Topic-Sentiment Hybrid Networks for Explainable Document Clustering: A Probabilistic Multi-Dimensional Similarity Analysis
IF 1.3 Zone 4 (Mathematics) Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-06-22 DOI: 10.1002/asmb.70024
Marco Ortu

This study introduces a statistical methodology for document clustering that integrates multiple dimensions of textual similarity through network topology analysis. The proposed methodology, which we call Multi-dimensional Similarity Network Analysis (MSNA), extends traditional document-clustering approaches by combining semantic embeddings, topic probability distributions, and emotional probability distributions into a unified similarity measure. We formalize this through a weighted combination of Jensen-Shannon divergences across different probability spaces, creating a comprehensive similarity network. The clustering is achieved through a community detection algorithm that optimizes a multi-objective modularity function, accounting for the different similarity dimensions. We prove the statistical consistency of our approach and derive bounds for the clustering performance under mild regularity conditions. The methodology is validated on a large-scale data set of Airbnb reviews (n = 114,000) from Sardinia, Italy, containing text content, topic distributions, and emotional features. Results show significant improvements in both clustering quality (average silhouette score increased) and interpretability compared to traditional single-dimension approaches. From an empirical perspective, the synthetic data validation demonstrates robust performance with topic strength in the range [0.4, 1.0] and emotion strength in [0.2, 1.0], achieving mean Adjusted Rand Index scores of 0.44. The application to real-world data identifies five distinct clusters through PROCSIMA (PRObabilistic Clustering SIMilarity Analysis), and the subsequent SMARTS (semantic analysis of review topics and sentiment) analysis reveals interpretable community structures within each cluster. The framework's ability to simultaneously capture semantic, topical, and emotional aspects of text makes it particularly valuable for applications in customer experience analysis and service quality monitoring.

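The abstract describes the similarity measure only as a weighted combination of Jensen-Shannon divergences, so the following is a minimal sketch of that idea rather than the paper's actual formula. The function names, the dictionary layout of a document, the weights, and the choice to report similarity as one minus the weighted mean divergence are all assumptions made for illustration. Semantic embeddings are left out because they are not probability distributions.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so it lies in [0, 1]) of two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def combined_similarity(doc_a, doc_b, weights):
    """Hypothetical combination: 1 minus the weighted mean of per-dimension JS divergences."""
    total = sum(weights.values())
    divergence = sum(w * js_divergence(doc_a[k], doc_b[k]) for k, w in weights.items())
    return 1.0 - divergence / total


# Two toy reviews, each described by a topic and an emotion distribution (made-up values).
review_1 = {"topic": [0.7, 0.2, 0.1], "emotion": [0.6, 0.4]}
review_2 = {"topic": [0.1, 0.3, 0.6], "emotion": [0.5, 0.5]}
weights = {"topic": 0.6, "emotion": 0.4}  # assumed weights, not taken from the paper

print(combined_similarity(review_1, review_2, weights))
```

Pairwise values computed this way could serve as edge weights of a similarity network, on which a community detection algorithm is then run.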
Citations: 0
An Adaptive Learning Approach to Multivariate Time Forecasting in Industrial Processes
IF 1.3 Zone 4 (Mathematics) Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-06-09 DOI: 10.1002/asmb.70016
Fernando Miguelez, Josu Doncel, M. D. Ugarte

Industrial processes generate a massive amount of monitoring data that can be exploited to uncover hidden time losses in the system. This can be used to enhance the accuracy of maintenance policies and increase the effectiveness of the equipment. In this work, we propose a method for one-step probabilistic multivariate forecasting of time variables involved in a production process. The method is based on an Input-Output Hidden Markov Model (IO-HMM), in which the parameters of interest are the state transition probabilities and the parameters of the observations' joint density. The ultimate goal of the method is to predict operational process times in the near future, which enables the identification of hidden losses and the location of improvement areas in the process. The input stream in the IO-HMM model includes past values of the response variables and other process features, such as calendar variables, that can have an impact on the model's parameters. The discrete part of the IO-HMM models the operational mode of the process. The state transition probabilities are supposed to change over time and are updated using Bayesian principles. The continuous part of the IO-HMM models the joint density of the response variables. The estimate of the continuous model parameters is recursively computed through an adaptive algorithm that also admits a Bayesian interpretation. The adaptive algorithm allows for efficient updating of the current parameter estimates as soon as new information is available. We evaluate the method's performance using a real data set obtained from a company in a particular sector, and the results are compared with a collection of benchmark models.
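
The abstract states that the state transition probabilities change over time and are updated using Bayesian principles, without giving the update rule. As a rough illustration only, the sketch below uses a Dirichlet-multinomial update with a forgetting factor. The function names, the forgetting factor, and the simplifying assumption that the operational mode is observed directly (in an IO-HMM it is hidden) are hypothetical and not taken from the paper.

```python
def update_transition_counts(counts, prev_state, next_state, forgetting=1.0):
    """Return new Dirichlet pseudo-counts after observing prev_state -> next_state.

    A forgetting factor below 1 discounts older evidence in the affected row,
    so the estimate can track transition probabilities that drift over time.
    """
    new_counts = [row[:] for row in counts]  # copy; the input is left untouched
    new_counts[prev_state] = [forgetting * c for c in new_counts[prev_state]]
    new_counts[prev_state][next_state] += 1.0
    return new_counts


def transition_probabilities(counts):
    """Posterior-mean transition matrix: each row of pseudo-counts normalised to sum to 1."""
    return [[c / sum(row) for c in row] for row in counts]


# Two operational modes with a uniform prior (one pseudo-count per transition).
counts = [[1.0, 1.0], [1.0, 1.0]]
for prev_state, next_state in [(0, 1), (1, 1), (1, 0), (0, 1)]:
    counts = update_transition_counts(counts, prev_state, next_state, forgetting=0.95)

print(transition_probabilities(counts))
```

Each update touches a single row, so the estimate can be refreshed as soon as a new transition arrives, which mirrors the recursive flavour described in the abstract.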

Citations: 0