This paper presents an approach for the economic statistical design of the Cumulative Sum (CUSUM) control chart in a multi-objective optimization framework. The proposed methodology integrates economic considerations with statistical performance to optimize the design parameters of the CUSUM chart, namely the sample size ($n$), the sampling interval ($h$), and the decision interval ($H$). The Non-dominated Sorting Genetic Algorithm II (NSGA II) is employed to solve the multi-objective optimization problem, minimizing the expected cost per cycle ($C_E$) and the out-of-control Average Run Length ($ARL_\delta$) simultaneously. The effectiveness of the proposed approach is demonstrated through a numerical example in which the optimized CUSUM chart parameters are determined using NSGA II. A sensitivity analysis is also conducted to assess the impact of variations in the input parameters. The results indicate that the proposed methodology reduces the expected cost per cycle by about 43% compared with the findings of M. Lee (2011). A more extensive comparison with respect to both $C_E$ and $ARL_\delta$ is also provided to justify the proposed methodology. These findings highlight the practical relevance of this study for the correct application of the CUSUM chart in industrial process control.
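As an illustration of how such a bi-objective design can be set up, the sketch below encodes $(n, h, H)$ as decision variables and minimizes two objectives with NSGA-II from the pymoo library. The cost and out-of-control ARL expressions are deliberately simple placeholders (a fixed reference value $k = 0.5$, an assumed shift of one standard deviation, and a Siegmund-type ARL approximation), not the economic model or ARL computation used in the paper.

```python
# Minimal sketch (not the paper's model): multi-objective CUSUM design with NSGA-II via pymoo.
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.optimize import minimize

DELTA, K_REF = 1.0, 0.5          # assumed shift size (in sigma) and CUSUM reference value


def out_of_control_arl(n, H):
    """Siegmund-type approximation of the out-of-control ARL of a one-sided CUSUM."""
    drift = DELTA * np.sqrt(n) - K_REF
    b = H + 1.166
    return (np.exp(-2 * drift * b) + 2 * drift * b - 1) / (2 * drift**2)


def expected_cost_per_cycle(n, h, H):
    """Placeholder cost: sampling cost rate plus a penalty for slow detection.
    A real design would use the paper's economic model and cost inputs."""
    sampling = (5.0 + 1.25 * n) / h                  # fixed + variable sampling cost per hour
    detection_delay = out_of_control_arl(n, H) * h   # expected time to signal a shift
    return sampling + 40.0 * detection_delay


class CusumDesign(ElementwiseProblem):
    def __init__(self):
        # decision variables: sample size n, sampling interval h, decision interval H
        super().__init__(n_var=3, n_obj=2,
                         xl=np.array([2.0, 0.25, 0.5]),
                         xu=np.array([25.0, 8.0, 10.0]))

    def _evaluate(self, x, out, *args, **kwargs):
        n, h, H = int(round(x[0])), x[1], x[2]
        out["F"] = [expected_cost_per_cycle(n, h, H), out_of_control_arl(n, H)]


res = minimize(CusumDesign(), NSGA2(pop_size=80), ("n_gen", 150), seed=1, verbose=False)
print(res.X[:5])   # a few non-dominated (n, h, H) designs
print(res.F[:5])   # their (cost, ARL) objective values
```

The result is a Pareto front rather than a single design, so the practitioner can trade expected cost against out-of-control ARL when picking the chart parameters.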
{"title":"A Multi-objective Economic Statistical Design of the CUSUM chart: NSGA II Approach","authors":"Sandeep, Arup Ranjan Mukhopadhyay","doi":"arxiv-2409.04673","DOIUrl":"https://doi.org/arxiv-2409.04673","url":null,"abstract":"This paper presents an approach for the economic statistical design of the\u0000Cumulative Sum (CUSUM) control chart in a multi-objective optimization\u0000framework. The proposed methodology integrates economic considerations with\u0000statistical aspects to optimize the design parameters like the sample size\u0000($n$), sampling interval ($h$), and decision interval ($H$) of the CUSUM chart.\u0000The Non-dominated Sorting Genetic Algorithm II (NSGA II) is employed to solve\u0000the multi-objective optimization problem, aiming to minimize both the average\u0000cost per cycle ($C_E$) and the out-of-control Average Run Length ($ARL_delta$)\u0000simultaneously. The effectiveness of the proposed approach is demonstrated\u0000through a numerical example by determining the optimized CUSUM chart parameters\u0000using NSGA II. Additionally, sensitivity analysis is conducted to assess the\u0000impact of variations in input parameters. The corresponding results indicate\u0000that the proposed methodology significantly reduces the expected cost per cycle\u0000by about 43% when compared to the findings of the article by M. Lee in the\u0000year 2011. A more extensive comparison with respect to both $C_E$ and\u0000$ARL_delta$ has also been provided for justifying the methodology proposed in\u0000this article. This highlights the practical relevance and potential of this\u0000study for the right application of the technique of the CUSUM chart for process\u0000control purposes in industries.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes and compares measures of identity and attribute disclosure risk for synthetic data. Data custodians can use the methods proposed here to inform the decision as to whether to release synthetic versions of confidential data. Different measures are evaluated on two data sets. Insight into the measures is obtained by examining the details of the records identified as posing a disclosure risk. This leads to methods to identify, and possibly exclude, apparently risky records where the identification or attribution would be expected by someone with background knowledge of the data. The methods described are available as part of the synthpop package for R.
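As a rough illustration of identity and attribute disclosure checks (a generic sketch, not the measures implemented in synthpop), the snippet below flags synthetic records that reproduce a key combination that is unique in the original data, and then checks how often a sensitive attribute agrees for those matches. The column names are hypothetical.

```python
# Generic disclosure-risk sketch on hypothetical quasi-identifiers; not the synthpop measures.
import pandas as pd

KEYS = ["age", "sex", "region", "marital_status"]   # assumed quasi-identifiers
TARGET = "income_band"                               # assumed sensitive attribute


def risky_matches(original: pd.DataFrame, synthetic: pd.DataFrame) -> pd.DataFrame:
    """Synthetic records whose key combination is unique in the original data (identity risk proxy)."""
    counts = original.groupby(KEYS).size().rename("n_orig").reset_index()
    uniques = counts[counts["n_orig"] == 1].drop(columns="n_orig")
    return synthetic.merge(uniques, on=KEYS, how="inner")


def attribute_agreement(original: pd.DataFrame, matches: pd.DataFrame) -> float:
    """Among risky matches, how often the synthetic sensitive value equals the original one."""
    joined = matches.merge(original[KEYS + [TARGET]], on=KEYS, suffixes=("_syn", "_orig"))
    if joined.empty:
        return 0.0
    return float((joined[f"{TARGET}_syn"] == joined[f"{TARGET}_orig"]).mean())


# Tiny worked example with made-up records
orig = pd.DataFrame({"age": [34, 51, 51], "sex": ["F", "M", "M"], "region": ["N", "S", "S"],
                     "marital_status": ["single", "married", "married"],
                     "income_band": ["low", "high", "high"]})
syn = orig.sample(frac=1.0, replace=True, random_state=0)
m = risky_matches(orig, syn)
print(len(m), attribute_agreement(orig, m))
```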
{"title":"Privacy risk from synthetic data: practical proposals","authors":"Gillian M Raab","doi":"arxiv-2409.04257","DOIUrl":"https://doi.org/arxiv-2409.04257","url":null,"abstract":"This paper proposes and compares measures of identity and attribute\u0000disclosure risk for synthetic data. Data custodians can use the methods\u0000proposed here to inform the decision as to whether to release synthetic\u0000versions of confidential data. Different measures are evaluated on two data\u0000sets. Insight into the measures is obtained by examining the details of the\u0000records identified as posing a disclosure risk. This leads to methods to\u0000identify, and possibly exclude, apparently risky records where the\u0000identification or attribution would be expected by someone with background\u0000knowledge of the data. The methods described are available as part of the\u0000textbf{synthpop} package for textbf{R}.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ukamaka V. Nnyaba, Hewan M. Shemtaga, David W. Collins, Amanda L. Muyskens, Benjamin W. Priest, Nedret Billor
Analyzing electrocardiography (ECG) data is essential for diagnosing and monitoring various heart diseases. The clinical adoption of automated methods requires accurate confidence measurements, which are largely absent from existing classification methods. In this paper, we present a robust Gaussian Process classification hyperparameter training model (MuyGPs) for discerning normal heartbeat signals from signals affected by different arrhythmias and myocardial infarction. We compare the performance of MuyGPs with a traditional Gaussian process classifier as well as conventional machine learning models such as Random Forest, Extra Trees, k-Nearest Neighbors, and Convolutional Neural Networks. Comparing these models reveals MuyGPs as the most performant model for making confident predictions on individual patient ECGs. Furthermore, we explore the posterior distribution obtained from the Gaussian process to interpret the predictions and quantify uncertainty. In addition, we provide a guideline for obtaining the prediction confidence of the machine learning models and quantitatively compare their uncertainty measures. In particular, we identify a class of less-accurate (ambiguous) signals for further diagnosis by an expert.
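The confidence-based triage described above can be illustrated with a standard scikit-learn Gaussian process classifier as a stand-in for MuyGPs: predicted class probabilities serve as a confidence score, and beats whose maximum probability falls below a threshold are flagged as ambiguous for expert review. The data, kernel, and threshold below are assumptions for illustration only.

```python
# Stand-in GP classification confidence sketch; not the MuyGPs implementation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in feature matrix and labels (e.g. fixed-length beat segments with
# normal / arrhythmia / infarction classes); real work would use ECG data.
X = rng.normal(size=(600, 20))
y = rng.integers(0, 3, size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0), random_state=0)
gpc.fit(X_tr, y_tr)

proba = gpc.predict_proba(X_te)      # class probabilities per test beat
confidence = proba.max(axis=1)
ambiguous = confidence < 0.6         # assumed confidence threshold
print(f"{ambiguous.mean():.1%} of beats flagged for expert review")
```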
{"title":"Enhancing Electrocardiography Data Classification Confidence: A Robust Gaussian Process Approach (MuyGPs)","authors":"Ukamaka V. Nnyaba, Hewan M. Shemtaga, David W. Collins, Amanda L. Muyskens, Benjamin W. Priest, Nedret Billor","doi":"arxiv-2409.04642","DOIUrl":"https://doi.org/arxiv-2409.04642","url":null,"abstract":"Analyzing electrocardiography (ECG) data is essential for diagnosing and\u0000monitoring various heart diseases. The clinical adoption of automated methods\u0000requires accurate confidence measurements, which are largely absent from\u0000existing classification methods. In this paper, we present a robust Gaussian\u0000Process classification hyperparameter training model (MuyGPs) for discerning\u0000normal heartbeat signals from the signals affected by different arrhythmias and\u0000myocardial infarction. We compare the performance of MuyGPs with traditional\u0000Gaussian process classifier as well as conventional machine learning models,\u0000such as, Random Forest, Extra Trees, k-Nearest Neighbors and Convolutional\u0000Neural Network. Comparing these models reveals MuyGPs as the most performant\u0000model for making confident predictions on individual patient ECGs. Furthermore,\u0000we explore the posterior distribution obtained from the Gaussian process to\u0000interpret the prediction and quantify uncertainty. In addition, we provide a\u0000guideline on obtaining the prediction confidence of the machine learning models\u0000and quantitatively compare the uncertainty measures of these models.\u0000Particularly, we identify a class of less-accurate (ambiguous) signals for\u0000further diagnosis by an expert.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vaiva Pilkauskaitė, Jevgenij Gamper, Rasa Giniūnaitė, Agne Reklaitė
In this study, we evaluate causal inference estimators for online controlled bipartite graph experiments in a real marketplace setting. Our novel contribution is constructing a bipartite graph using in-experiment data, rather than relying on prior knowledge or historical data, the common approach in the literature published to date. We build the bipartite graph from various interactions between buyers and sellers in the marketplace, establishing a novel research direction at the intersection of bipartite experiments and mediation analysis. This approach is crucial for modern marketplaces aiming to evaluate seller-side causal effects in buyer-side experiments, or vice versa. We demonstrate our method using historical buyer-side experiments conducted at Vinted, the largest second-hand marketplace in Europe with over 80M users.
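A minimal sketch of the graph-construction step, under hypothetical column names: each in-experiment buyer-seller interaction becomes a weighted edge of the bipartite graph, and a seller's exposure to the buyer-side treatment is the interaction-weighted share of its treated buyers. This illustrates only the exposure computation, not the causal estimators evaluated in the study.

```python
# Bipartite exposure sketch on a toy edge list; column names are hypothetical.
import pandas as pd

# interactions: one row per buyer-seller event observed during the experiment
interactions = pd.DataFrame({
    "buyer_id":  [1, 1, 2, 3, 3, 3],
    "seller_id": ["A", "B", "A", "B", "C", "C"],
    "weight":    [1.0, 2.0, 1.0, 1.0, 1.0, 3.0],   # e.g. number of messages or orders
})
assignment = pd.DataFrame({"buyer_id": [1, 2, 3], "treated": [1, 0, 1]})

edges = interactions.merge(assignment, on="buyer_id")
exposure = (
    edges.assign(treated_weight=edges["weight"] * edges["treated"])
         .groupby("seller_id")
         .agg(total=("weight", "sum"), treated=("treated_weight", "sum"))
)
exposure["exposure"] = exposure["treated"] / exposure["total"]
print(exposure)
# Seller-side outcomes can then be regressed on (or stratified by) this exposure
# to estimate spillover effects of the buyer-side treatment.
```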
{"title":"Towards Measuring Sell Side Outcomes in Buy Side Marketplace Experiments using In-Experiment Bipartite Graph","authors":"Vaiva Pilkauskaitė, Jevgenij Gamper, Rasa Giniūnaitė, Agne Reklaitė","doi":"arxiv-2409.04174","DOIUrl":"https://doi.org/arxiv-2409.04174","url":null,"abstract":"In this study, we evaluate causal inference estimators for online controlled\u0000bipartite graph experiments in a real marketplace setting. Our novel\u0000contribution is constructing a bipartite graph using in-experiment data, rather\u0000than relying on prior knowledge or historical data, the common approach in the\u0000literature published to date. We build the bipartite graph from various\u0000interactions between buyers and sellers in the marketplace, establishing a\u0000novel research direction at the intersection of bipartite experiments and\u0000mediation analysis. This approach is crucial for modern marketplaces aiming to\u0000evaluate seller-side causal effects in buyer-side experiments, or vice versa.\u0000We demonstrate our method using historical buyer-side experiments conducted at\u0000Vinted, the largest second-hand marketplace in Europe with over 80M users.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a method for unsupervised whole-image clustering of a target dataset of remote sensing scenes with no labels. The method consists of three main steps: (1) fine-tuning a pretrained deep neural network (DINOv2) on a labelled source remote sensing imagery dataset and using it to extract a feature vector from each image in the target dataset, (2) reducing the dimension of these deep features via manifold projection into a low-dimensional Euclidean space, and (3) clustering the embedded features using a Bayesian nonparametric technique to infer the number and membership of the clusters simultaneously. The method takes advantage of heterogeneous transfer learning to cluster unseen data with different feature and label distributions. We demonstrate that this approach outperforms state-of-the-art zero-shot classification methods on several remote sensing scene classification datasets.
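A condensed sketch of the three-step pipeline under stated assumptions is given below: DINOv2 features loaded via torch.hub (the source-dataset fine-tuning of step 1 is omitted), UMAP for the manifold projection, and scikit-learn's Dirichlet process Gaussian mixture as the Bayesian nonparametric clustering step.

```python
# Pipeline sketch: DINOv2 features -> UMAP projection -> DP-GMM clustering.
import torch
import numpy as np
import umap                                            # umap-learn package
from sklearn.mixture import BayesianGaussianMixture

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

@torch.no_grad()
def extract_features(images: torch.Tensor) -> np.ndarray:
    """images: (N, 3, 224, 224) tensor, already resized and normalised."""
    return model(images).cpu().numpy()                 # (N, 384) embeddings for ViT-S/14

images = torch.randn(64, 3, 224, 224)                  # stand-in for the target scenes
feats = extract_features(images)

embedded = umap.UMAP(n_components=8, random_state=0).fit_transform(feats)

dpgmm = BayesianGaussianMixture(
    n_components=10,                                   # upper bound on the number of clusters
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(embedded)
labels = dpgmm.predict(embedded)
print("clusters used:", len(np.unique(labels)))
```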
{"title":"Deep Clustering of Remote Sensing Scenes through Heterogeneous Transfer Learning","authors":"Isaac Ray, Alexei Skurikhin","doi":"arxiv-2409.03938","DOIUrl":"https://doi.org/arxiv-2409.03938","url":null,"abstract":"This paper proposes a method for unsupervised whole-image clustering of a\u0000target dataset of remote sensing scenes with no labels. The method consists of\u0000three main steps: (1) finetuning a pretrained deep neural network (DINOv2) on a\u0000labelled source remote sensing imagery dataset and using it to extract a\u0000feature vector from each image in the target dataset, (2) reducing the\u0000dimension of these deep features via manifold projection into a low-dimensional\u0000Euclidean space, and (3) clustering the embedded features using a Bayesian\u0000nonparametric technique to infer the number and membership of clusters\u0000simultaneously. The method takes advantage of heterogeneous transfer learning\u0000to cluster unseen data with different feature and label distributions. We\u0000demonstrate the performance of this approach outperforming state-of-the-art\u0000zero-shot classification methods on several remote sensing scene classification\u0000datasets.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"143 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The infield shift has been increasingly used as a defensive strategy in baseball in recent years. Along with the upward trend in its usage, the notoriety of the shift has grown, as it is believed to be responsible for the recent decline in offence. In the 2023 season, Major League Baseball (MLB) implemented a rule change prohibiting the infield shift. However, there has been no systematic analysis of the effectiveness of the infield shift to determine whether it is a cause of the decline in offence. We used publicly available MLB data from 2015-2022 to evaluate the causal effect of the infield shift on expected runs scored. We employed three methods for drawing causal conclusions from observational data -- nearest neighbour matching, inverse probability of treatment weighting, and instrumental variable analysis -- and evaluated the causal effect in subgroups defined by batter handedness. The results of all methods showed the shift is effective at preventing runs, but primarily against left-handed batters.
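To make one of the three approaches concrete, the sketch below implements inverse probability of treatment weighting on a stand-in dataset with hypothetical covariates: a logistic regression estimates the propensity of facing the shift, and the weighted difference in mean expected runs between shifted and non-shifted plate appearances estimates the effect. It is illustrative only, not the paper's specification.

```python
# IPTW sketch on simulated plate-appearance data; covariate names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "shift":       rng.integers(0, 2, 5000),           # 1 if the infield shift was on
    "left_handed": rng.integers(0, 2, 5000),
    "pull_rate":   rng.uniform(0.2, 0.6, 5000),
    "exp_runs":    rng.normal(0.5, 0.3, 5000),          # stand-in expected-runs outcome
})

X = df[["left_handed", "pull_rate"]]
ps = LogisticRegression().fit(X, df["shift"]).predict_proba(X)[:, 1]
w = np.where(df["shift"] == 1, 1 / ps, 1 / (1 - ps))    # inverse probability of treatment weights

treated = (df["shift"] == 1).to_numpy()
ate = (np.average(df.loc[treated, "exp_runs"], weights=w[treated])
       - np.average(df.loc[~treated, "exp_runs"], weights=w[~treated]))
print(f"IPTW estimate of the shift effect on expected runs: {ate:.3f}")
```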
{"title":"Causal effect of the infield shift in the MLB","authors":"Sonia Markes, Linbo Wang, Jessica Gronsbell, Katherine Evans","doi":"arxiv-2409.03940","DOIUrl":"https://doi.org/arxiv-2409.03940","url":null,"abstract":"The infield shift has been increasingly used as a defensive strategy in\u0000baseball in recent years. Along with the upward trend in its usage, the\u0000notoriety of the shift has grown, as it is believed to be responsible for the\u0000recent decline in offence. In the 2023 season, Major League Baseball (MLB)\u0000implemented a rule change prohibiting the infield shift. However, there has\u0000been no systematic analysis of the effectiveness of infield shift to determine\u0000if it is a cause of the cooling in offence. We used publicly available data on\u0000MLB from 2015-2022 to evaluate the causal effect of the infield shift on the\u0000expected runs scored. We employed three methods for drawing causal conclusions\u0000from observational data -- nearest neighbour matching, inverse probability of\u0000treatment weighting, and instrumental variable analysis -- and evaluated the\u0000causal effect in subgroups defined by batter-handedness. The results of all\u0000methods showed the shift is effective at preventing runs, but primarily for\u0000left-handed batters.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper fits an Ornstein-Uhlenbeck model with a seasonal mean and volatility, where the residuals are generated by a Brownian motion, to Ghanaian daily average temperature. It employs the modified Ornstein-Uhlenbeck model proposed by Bhowan, which has a seasonal mean and a stochastic volatility process. The findings reveal that the Bono region experiences warm temperatures and maximum precipitation of up to 32.67 degrees Celsius and 126.51 mm respectively. The Daily Average Temperature (DAT) of the region reverts to a level of approximately 26 degrees Celsius at a rate of 18.72%, with maximum and minimum temperatures of 32.67 and 19.75 degrees Celsius respectively. Although the region is in the middle belt of Ghana, it still experiences warm (hot) temperatures daily and, over the years considered in our analysis, experiences dry seasons relatively more than wet seasons. Our model explains approximately 50% of the variation in the daily average temperature of the region, which can be regarded as a relatively good fit. The findings of this paper are relevant to the pricing of weather derivatives with temperature as the underlying variable in the Ghanaian financial and agricultural sectors. Furthermore, they would assist in the development and design of tailored agriculture/crop insurance models that incorporate temperature dynamics rather than only extreme weather conditions/events such as floods, drought and wildfires.
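The mean-reverting structure described above can be illustrated with a discretised Ornstein-Uhlenbeck process with a sinusoidal seasonal mean, $T_{t+1} = T_t + \kappa(\theta_t - T_t) + \sigma\varepsilon_t$; the sketch simulates such a series and recovers the reversion rate by regressing deseasonalised temperatures on their lag. All parameter values are illustrative, not the fitted Bono estimates.

```python
# Discretised OU process with seasonal mean: simulate, then re-estimate the reversion rate.
import numpy as np

days = np.arange(5 * 365)
theta = 26.0 + 3.0 * np.sin(2 * np.pi * days / 365.25)   # seasonal mean (deg C), illustrative
kappa, sigma = 0.19, 1.2                                  # assumed reversion rate and volatility

rng = np.random.default_rng(1)
T = np.empty_like(theta)
T[0] = theta[0]
for t in range(len(days) - 1):
    T[t + 1] = T[t] + kappa * (theta[t] - T[t]) + sigma * rng.normal()

# AR(1) regression of the deseasonalised series: T_{t+1} - theta_{t+1} = phi * (T_t - theta_t) + e
x, y = T[:-1] - theta[:-1], T[1:] - theta[1:]
phi = float(np.dot(x, y) / np.dot(x, x))
print(f"estimated mean-reversion rate: {1 - phi:.3f} per day")
```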
{"title":"A Stochastic Weather Model: A Case of Bono Region of Ghana","authors":"Bernard Gyamfi","doi":"arxiv-2409.06731","DOIUrl":"https://doi.org/arxiv-2409.06731","url":null,"abstract":"The paper sought to fit an Ornstein Uhlenbeck model with seasonal mean and\u0000volatility, where the residuals are generated by a Brownian motion for Ghanian\u0000daily average temperature. This paper employed the modified Ornstein Uhlenbeck\u0000model proposed by Bhowan which has a seasonal mean and stochastic volatility\u0000process. The findings revealed that, the Bono region experiences warm\u0000temperatures and maximum precipitation up to 32.67 degree celsius and 126.51mm\u0000respectively. It was observed that the Daily Average Temperature (DAT) of the\u0000region reverts to a temperature of approximately 26 degree celsius at a rate of\u000018.72% with maximum and minimum temperatures of 32.67degree celsius and\u000019.75degree celsius respectively. Although the region is in the middle belt of\u0000Ghana, it still experiences warm(hot) temperatures daily and experiences dry\u0000seasons relatively more than wet seasons in the number of years considered for\u0000our analysis. Our model explained approximately 50% of the variations in the\u0000daily average temperature of the region which can be regarded as relatively a\u0000good model. The findings of this paper are relevant in the pricing of weather\u0000derivatives with temperature as an underlying variable in the Ghanaian\u0000financial and agricultural sector. Furthermore, it would assist in the\u0000development and design of tailored agriculture/crop insurance models which\u0000would incorporate temperature dynamics rather than extreme weather\u0000conditions/events such as floods, drought and wildfires.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abderrahim Derouiche (LAAS-S4M, UT3), Damien Brulin (LAAS-S4M, UT2J), Eric Campo (LAAS-S4M, UT2J), Antoine Piau
In an era marked by a demographic change towards an older population, there is an urgent need to improve nutritional monitoring in view of the increase in frailty. This research aims to enhance the identification of meal-taking activities by combining K-Means, GMM, and DBSCAN techniques. Using the Davies-Bouldin Index (DBI) to assess meal-taking activity clustering, the results show that K-Means seems to be the best solution, thanks to its unrivalled efficiency in data demarcation compared with the capabilities of GMM and DBSCAN. Although capable of identifying complex patterns and outliers, the latter methods are limited by their operational complexity and dependence on precise parameter configurations. In this paper, we processed data from four houses equipped with sensors. The findings indicate that the K-Means method yields high performance, evidenced by a particularly low Davies-Bouldin Index (DBI) that reflects good cluster separation and cohesion. Calculating the average duration of each activity using the GMM algorithm makes it possible to distinguish various categories of meal-taking activities; these categories can also correspond to the different times of day at which each meal is taken. Using the K-Means, GMM, and DBSCAN clustering algorithms, the study demonstrates an effective strategy for thoroughly understanding the data. This approach facilitates the comparison and selection of the most suitable method for meal-taking activity clustering.
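A compact sketch of the comparison, using stand-in two-dimensional features in place of the sensor-derived ones: each of K-Means, GMM, and DBSCAN partitions the data, and the Davies-Bouldin Index (lower is better) scores each partition.

```python
# Compare K-Means, GMM and DBSCAN with the Davies-Bouldin Index on stand-in data.
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.metrics import davies_bouldin_score
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=3, cluster_std=1.0, random_state=0)

labels = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "GMM":     GaussianMixture(n_components=3, random_state=0).fit_predict(X),
    "DBSCAN":  DBSCAN(eps=0.9, min_samples=5).fit_predict(X),
}

for name, lab in labels.items():
    # DBI is undefined for a single cluster; DBSCAN labels noise points as -1
    if len(set(lab)) > 1:
        print(f"{name}: DBI = {davies_bouldin_score(X, lab):.3f}")
```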
{"title":"Meal-taking activity monitoring in the elderly based on sensor data: Comparison of unsupervised classification methods","authors":"Abderrahim DerouicheLAAS-S4M, UT3, Damien BrulinLAAS-S4M, UT2J, Eric CampoLAAS-S4M, UT2J, Antoine Piau","doi":"arxiv-2409.02971","DOIUrl":"https://doi.org/arxiv-2409.02971","url":null,"abstract":"In an era marked by a demographic change towards an older population, there\u0000is an urgent need to improve nutritional monitoring in view of the increase in\u0000frailty. This research aims to enhance the identification of meal-taking\u0000activities by combining K-Means, GMM, and DBSCAN techniques. Using the\u0000Davies-Bouldin Index (DBI) for the optimal meal taking activity clustering, the\u0000results show that K-Means seems to be the best solution, thanks to its\u0000unrivalled efficiency in data demarcation, compared with the capabilities of\u0000GMM and DBSCAN. Although capable of identifying complex patterns and outliers,\u0000the latter methods are limited by their operational complexities and dependence\u0000on precise parameter configurations. In this paper, we have processed data from\u00004 houses equipped with sensors. The findings indicate that applying the K-Means\u0000method results in high performance, evidenced by a particularly low\u0000Davies-Bouldin Index (DBI), illustrating optimal cluster separation and\u0000cohesion. Calculating the average duration of each activity using the GMM\u0000algorithm allows distinguishing various categories of meal-taking activities.\u0000Alternatively, this can correspond to different times of the day fitting to\u0000each meal-taking activity. Using K-Means, GMM, and DBSCAN clustering\u0000algorithms, the study demonstrates an effective strategy for thoroughly\u0000understanding the data. This approach facilitates the comparison and selection\u0000of the most suitable method for optimal meal-taking activity clustering.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study conditional linear factor models in the context of asset pricing panels. Our analysis focuses on conditional means and covariances to characterize the cross-sectional and inter-temporal properties of returns and factors as well as their interrelationships. We also review the conditions outlined in Kozak and Nagel (2024) and show how the conditional mean-variance efficient portfolio of an unbalanced panel can be spanned by low-dimensional factor portfolios, even without assuming invertibility of the conditional covariance matrices. Our analysis provides a comprehensive foundation for the specification and estimation of conditional linear factor models.
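For concreteness, a generic conditional linear factor model of the kind studied here can be written as follows (the notation is ours and need not match the paper's):

```latex
% Conditional linear factor model: returns load on factors with time-varying betas.
\[
  x_{t+1} = \beta_t f_{t+1} + \varepsilon_{t+1},
  \qquad
  \mathbb{E}_t[\varepsilon_{t+1}] = 0, \quad \mathrm{Cov}_t(f_{t+1}, \varepsilon_{t+1}) = 0 ,
\]
\[
  \mathbb{E}_t[x_{t+1}] = \beta_t\,\mathbb{E}_t[f_{t+1}],
  \qquad
  \mathrm{Cov}_t(x_{t+1}) = \beta_t\,\mathrm{Cov}_t(f_{t+1})\,\beta_t^{\top} + \mathrm{Cov}_t(\varepsilon_{t+1}).
\]
```

These conditional moments are the objects the paper characterizes; under its conditions the conditional mean-variance efficient portfolio of the panel is spanned by low-dimensional factor portfolios built from them.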
{"title":"Fundamental properties of linear factor models","authors":"Damir Filipovic, Paul Schneider","doi":"arxiv-2409.02521","DOIUrl":"https://doi.org/arxiv-2409.02521","url":null,"abstract":"We study conditional linear factor models in the context of asset pricing\u0000panels. Our analysis focuses on conditional means and covariances to\u0000characterize the cross-sectional and inter-temporal properties of returns and\u0000factors as well as their interrelationships. We also review the conditions\u0000outlined in Kozak and Nagel (2024) and show how the conditional mean-variance\u0000efficient portfolio of an unbalanced panel can be spanned by low-dimensional\u0000factor portfolios, even without assuming invertibility of the conditional\u0000covariance matrices. Our analysis provides a comprehensive foundation for the\u0000specification and estimation of conditional linear factor models.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study presents a comprehensive methodology for modeling and forecasting the historical time series of fire spots detected by the AQUA_M-T satellite in the Amazon, Brazil. The approach utilizes a mixed Recurrent Neural Network (RNN) model, combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures to predict monthly accumulations of daily detected fire spots. A summary of the data revealed a consistent seasonality over time, with annual maximum and minimum fire spot values tending to repeat in the same periods each year. The primary objective is to verify whether the forecasts capture this inherent seasonality through rigorous statistical analysis. The methodology involved careful data preparation, model configuration, and training using cross-validation with two seeds, ensuring that the model generalizes well to the test and validation sets and confirming the convergence of the model parameters. The results indicate that the mixed LSTM and GRU model offers improved accuracy in forecasting 12 months ahead, demonstrating its effectiveness in capturing complex temporal patterns and modeling the observed time series. This research significantly contributes to the application of deep learning techniques in environmental monitoring, specifically in fire spot forecasting. In addition to improving forecast accuracy, the proposed approach highlights the potential for adaptation to other time series forecasting challenges, opening new avenues for research and development in machine learning and natural phenomenon prediction. Keywords: Time Series Forecasting, Recurrent Neural Networks, Deep Learning.
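A minimal Keras sketch of a mixed LSTM and GRU forecaster is shown below; it maps the previous 12 monthly accumulations to a one-month-ahead prediction. The window length, layer sizes, and training settings are assumptions for illustration, not the configuration used in the study.

```python
# Mixed LSTM + GRU one-step-ahead forecaster on a stand-in monthly series.
import numpy as np
import tensorflow as tf

WINDOW = 12   # assumed input window of 12 months

def make_windows(series: np.ndarray):
    """Turn a monthly series into (samples, WINDOW, 1) inputs and next-month targets."""
    X = np.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])
    y = series[WINDOW:]
    return X[..., None].astype("float32"), y.astype("float32")

series = np.random.default_rng(0).gamma(2.0, 500.0, size=240)   # stand-in monthly fire-spot counts
X, y = make_windows(series)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, 1)),
    tf.keras.layers.LSTM(64, return_sequences=True),   # LSTM layer feeds the GRU layer
    tf.keras.layers.GRU(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=16, verbose=0)

forecast = model.predict(X[-1:], verbose=0)             # one-step-ahead forecast
print(float(forecast[0, 0]))
```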
{"title":"Neural Networks with LSTM and GRU in Modeling Active Fires in the Amazon","authors":"Ramon Tavares","doi":"arxiv-2409.02681","DOIUrl":"https://doi.org/arxiv-2409.02681","url":null,"abstract":"This study presents a comprehensive methodology for modeling and forecasting\u0000the historical time series of fire spots detected by the AQUA_M-T satellite in\u0000the Amazon, Brazil. The approach utilizes a mixed Recurrent Neural Network\u0000(RNN) model, combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit\u0000(GRU) architectures to predict monthly accumulations of daily detected fire\u0000spots. A summary of the data revealed a consistent seasonality over time, with\u0000annual maximum and minimum fire spot values tending to repeat at the same\u0000periods each year. The primary objective is to verify whether the forecasts\u0000capture this inherent seasonality through rigorous statistical analysis. The\u0000methodology involved careful data preparation, model configuration, and\u0000training using cross-validation with two seeds, ensuring that the data\u0000generalizes well to the test and validation sets, and confirming the\u0000convergence of the model parameters. The results indicate that the mixed LSTM\u0000and GRU model offers improved accuracy in forecasting 12 months ahead,\u0000demonstrating its effectiveness in capturing complex temporal patterns and\u0000modeling the observed time series. This research significantly contributes to\u0000the application of deep learning techniques in environmental monitoring,\u0000specifically in fire spot forecasting. In addition to improving forecast\u0000accuracy, the proposed approach highlights the potential for adaptation to\u0000other time series forecasting challenges, opening new avenues for research and\u0000development in machine learning and natural phenomenon prediction. Keywords:\u0000Time Series Forecasting, Recurrent Neural Networks, Deep Learning.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}