This paper presents an approach for the economic statistical design of the Cumulative Sum (CUSUM) control chart in a multi-objective optimization framework. The proposed methodology integrates economic considerations with statistical performance to optimize the design parameters of the CUSUM chart, namely the sample size ($n$), the sampling interval ($h$), and the decision interval ($H$). The Non-dominated Sorting Genetic Algorithm II (NSGA II) is employed to solve the multi-objective optimization problem, minimizing the expected cost per cycle ($C_E$) and the out-of-control Average Run Length ($ARL_\delta$) simultaneously. The effectiveness of the proposed approach is demonstrated through a numerical example in which the optimized CUSUM chart parameters are determined using NSGA II. A sensitivity analysis is also conducted to assess the impact of variations in the input parameters. The results indicate that the proposed methodology reduces the expected cost per cycle by about 43% compared with the findings of M. Lee (2011). A more extensive comparison with respect to both $C_E$ and $ARL_\delta$ is also provided to justify the proposed methodology. These findings highlight the practical relevance of this study for the correct application of the CUSUM chart in industrial process control.
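As an illustration of how such a bi-objective design can be set up, the sketch below encodes $(n, h, H)$ as decision variables and minimizes two objectives with NSGA-II from the pymoo library. The cost and out-of-control ARL expressions are deliberately simple placeholders (a fixed reference value $k = 0.5$, an assumed shift of one standard deviation, and a Siegmund-type ARL approximation), not the economic model or ARL computation used in the paper.

```python
# Minimal sketch (not the paper's model): multi-objective CUSUM design with NSGA-II via pymoo.
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.optimize import minimize

DELTA, K_REF = 1.0, 0.5          # assumed shift size (in sigma) and CUSUM reference value


def out_of_control_arl(n, H):
    """Siegmund-type approximation of the out-of-control ARL of a one-sided CUSUM."""
    drift = DELTA * np.sqrt(n) - K_REF
    b = H + 1.166
    return (np.exp(-2 * drift * b) + 2 * drift * b - 1) / (2 * drift**2)


def expected_cost_per_cycle(n, h, H):
    """Placeholder cost: sampling cost rate plus a penalty for slow detection.
    A real design would use the paper's economic model and cost inputs."""
    sampling = (5.0 + 1.25 * n) / h                  # fixed + variable sampling cost per hour
    detection_delay = out_of_control_arl(n, H) * h   # expected time to signal a shift
    return sampling + 40.0 * detection_delay


class CusumDesign(ElementwiseProblem):
    def __init__(self):
        # decision variables: sample size n, sampling interval h, decision interval H
        super().__init__(n_var=3, n_obj=2,
                         xl=np.array([2.0, 0.25, 0.5]),
                         xu=np.array([25.0, 8.0, 10.0]))

    def _evaluate(self, x, out, *args, **kwargs):
        n, h, H = int(round(x[0])), x[1], x[2]
        out["F"] = [expected_cost_per_cycle(n, h, H), out_of_control_arl(n, H)]


res = minimize(CusumDesign(), NSGA2(pop_size=80), ("n_gen", 150), seed=1, verbose=False)
print(res.X[:5])   # a few non-dominated (n, h, H) designs
print(res.F[:5])   # their (cost, ARL) objective values
```

The result is a Pareto front rather than a single design, so the practitioner can trade expected cost against out-of-control ARL when picking the chart parameters.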
{"title":"A Multi-objective Economic Statistical Design of the CUSUM chart: NSGA II Approach","authors":"Sandeep, Arup Ranjan Mukhopadhyay","doi":"arxiv-2409.04673","DOIUrl":"https://doi.org/arxiv-2409.04673","url":null,"abstract":"This paper presents an approach for the economic statistical design of the\u0000Cumulative Sum (CUSUM) control chart in a multi-objective optimization\u0000framework. The proposed methodology integrates economic considerations with\u0000statistical aspects to optimize the design parameters like the sample size\u0000($n$), sampling interval ($h$), and decision interval ($H$) of the CUSUM chart.\u0000The Non-dominated Sorting Genetic Algorithm II (NSGA II) is employed to solve\u0000the multi-objective optimization problem, aiming to minimize both the average\u0000cost per cycle ($C_E$) and the out-of-control Average Run Length ($ARL_delta$)\u0000simultaneously. The effectiveness of the proposed approach is demonstrated\u0000through a numerical example by determining the optimized CUSUM chart parameters\u0000using NSGA II. Additionally, sensitivity analysis is conducted to assess the\u0000impact of variations in input parameters. The corresponding results indicate\u0000that the proposed methodology significantly reduces the expected cost per cycle\u0000by about 43% when compared to the findings of the article by M. Lee in the\u0000year 2011. A more extensive comparison with respect to both $C_E$ and\u0000$ARL_delta$ has also been provided for justifying the methodology proposed in\u0000this article. This highlights the practical relevance and potential of this\u0000study for the right application of the technique of the CUSUM chart for process\u0000control purposes in industries.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes and compares measures of identity and attribute disclosure risk for synthetic data. Data custodians can use the methods proposed here to inform the decision as to whether to release synthetic versions of confidential data. Different measures are evaluated on two data sets. Insight into the measures is obtained by examining the details of the records identified as posing a disclosure risk. This leads to methods to identify, and possibly exclude, apparently risky records where the identification or attribution would be expected by someone with background knowledge of the data. The methods described are available as part of the synthpop package for R.
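As a rough illustration of identity and attribute disclosure checks (a generic sketch, not the measures implemented in synthpop), the snippet below flags synthetic records that reproduce a key combination that is unique in the original data, and then checks how often a sensitive attribute agrees for those matches. The column names are hypothetical.

```python
# Generic disclosure-risk sketch on hypothetical quasi-identifiers; not the synthpop measures.
import pandas as pd

KEYS = ["age", "sex", "region", "marital_status"]   # assumed quasi-identifiers
TARGET = "income_band"                               # assumed sensitive attribute


def risky_matches(original: pd.DataFrame, synthetic: pd.DataFrame) -> pd.DataFrame:
    """Synthetic records whose key combination is unique in the original data (identity risk proxy)."""
    counts = original.groupby(KEYS).size().rename("n_orig").reset_index()
    uniques = counts[counts["n_orig"] == 1].drop(columns="n_orig")
    return synthetic.merge(uniques, on=KEYS, how="inner")


def attribute_agreement(original: pd.DataFrame, matches: pd.DataFrame) -> float:
    """Among risky matches, how often the synthetic sensitive value equals the original one."""
    joined = matches.merge(original[KEYS + [TARGET]], on=KEYS, suffixes=("_syn", "_orig"))
    if joined.empty:
        return 0.0
    return float((joined[f"{TARGET}_syn"] == joined[f"{TARGET}_orig"]).mean())


# Tiny worked example with made-up records
orig = pd.DataFrame({"age": [34, 51, 51], "sex": ["F", "M", "M"], "region": ["N", "S", "S"],
                     "marital_status": ["single", "married", "married"],
                     "income_band": ["low", "high", "high"]})
syn = orig.sample(frac=1.0, replace=True, random_state=0)
m = risky_matches(orig, syn)
print(len(m), attribute_agreement(orig, m))
```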
{"title":"Privacy risk from synthetic data: practical proposals","authors":"Gillian M Raab","doi":"arxiv-2409.04257","DOIUrl":"https://doi.org/arxiv-2409.04257","url":null,"abstract":"This paper proposes and compares measures of identity and attribute\u0000disclosure risk for synthetic data. Data custodians can use the methods\u0000proposed here to inform the decision as to whether to release synthetic\u0000versions of confidential data. Different measures are evaluated on two data\u0000sets. Insight into the measures is obtained by examining the details of the\u0000records identified as posing a disclosure risk. This leads to methods to\u0000identify, and possibly exclude, apparently risky records where the\u0000identification or attribution would be expected by someone with background\u0000knowledge of the data. The methods described are available as part of the\u0000textbf{synthpop} package for textbf{R}.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ukamaka V. Nnyaba, Hewan M. Shemtaga, David W. Collins, Amanda L. Muyskens, Benjamin W. Priest, Nedret Billor
Analyzing electrocardiography (ECG) data is essential for diagnosing and monitoring various heart diseases. The clinical adoption of automated methods requires accurate confidence measurements, which are largely absent from existing classification methods. In this paper, we present a robust Gaussian Process classification hyperparameter training model (MuyGPs) for discerning normal heartbeat signals from signals affected by different arrhythmias and myocardial infarction. We compare the performance of MuyGPs with a traditional Gaussian process classifier as well as conventional machine learning models such as Random Forest, Extra Trees, k-Nearest Neighbors, and Convolutional Neural Networks. Comparing these models reveals MuyGPs as the most performant model for making confident predictions on individual patient ECGs. Furthermore, we explore the posterior distribution obtained from the Gaussian process to interpret the predictions and quantify uncertainty. In addition, we provide a guideline for obtaining the prediction confidence of the machine learning models and quantitatively compare their uncertainty measures. In particular, we identify a class of less-accurate (ambiguous) signals for further diagnosis by an expert.
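The confidence-based triage described above can be illustrated with a standard scikit-learn Gaussian process classifier as a stand-in for MuyGPs: predicted class probabilities serve as a confidence score, and beats whose maximum probability falls below a threshold are flagged as ambiguous for expert review. The data, kernel, and threshold below are assumptions for illustration only.

```python
# Stand-in GP classification confidence sketch; not the MuyGPs implementation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in feature matrix and labels (e.g. fixed-length beat segments with
# normal / arrhythmia / infarction classes); real work would use ECG data.
X = rng.normal(size=(600, 20))
y = rng.integers(0, 3, size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0), random_state=0)
gpc.fit(X_tr, y_tr)

proba = gpc.predict_proba(X_te)      # class probabilities per test beat
confidence = proba.max(axis=1)
ambiguous = confidence < 0.6         # assumed confidence threshold
print(f"{ambiguous.mean():.1%} of beats flagged for expert review")
```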
{"title":"Enhancing Electrocardiography Data Classification Confidence: A Robust Gaussian Process Approach (MuyGPs)","authors":"Ukamaka V. Nnyaba, Hewan M. Shemtaga, David W. Collins, Amanda L. Muyskens, Benjamin W. Priest, Nedret Billor","doi":"arxiv-2409.04642","DOIUrl":"https://doi.org/arxiv-2409.04642","url":null,"abstract":"Analyzing electrocardiography (ECG) data is essential for diagnosing and\u0000monitoring various heart diseases. The clinical adoption of automated methods\u0000requires accurate confidence measurements, which are largely absent from\u0000existing classification methods. In this paper, we present a robust Gaussian\u0000Process classification hyperparameter training model (MuyGPs) for discerning\u0000normal heartbeat signals from the signals affected by different arrhythmias and\u0000myocardial infarction. We compare the performance of MuyGPs with traditional\u0000Gaussian process classifier as well as conventional machine learning models,\u0000such as, Random Forest, Extra Trees, k-Nearest Neighbors and Convolutional\u0000Neural Network. Comparing these models reveals MuyGPs as the most performant\u0000model for making confident predictions on individual patient ECGs. Furthermore,\u0000we explore the posterior distribution obtained from the Gaussian process to\u0000interpret the prediction and quantify uncertainty. In addition, we provide a\u0000guideline on obtaining the prediction confidence of the machine learning models\u0000and quantitatively compare the uncertainty measures of these models.\u0000Particularly, we identify a class of less-accurate (ambiguous) signals for\u0000further diagnosis by an expert.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vaiva Pilkauskaitė, Jevgenij Gamper, Rasa Giniūnaitė, Agne Reklaitė
In this study, we evaluate causal inference estimators for online controlled bipartite graph experiments in a real marketplace setting. Our novel contribution is constructing a bipartite graph using in-experiment data, rather than relying on prior knowledge or historical data, the common approach in the literature published to date. We build the bipartite graph from various interactions between buyers and sellers in the marketplace, establishing a novel research direction at the intersection of bipartite experiments and mediation analysis. This approach is crucial for modern marketplaces aiming to evaluate seller-side causal effects in buyer-side experiments, or vice versa. We demonstrate our method using historical buyer-side experiments conducted at Vinted, the largest second-hand marketplace in Europe with over 80M users.
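A minimal sketch of the graph-construction step, under hypothetical column names: each in-experiment buyer-seller interaction becomes a weighted edge of the bipartite graph, and a seller's exposure to the buyer-side treatment is the interaction-weighted share of its treated buyers. This illustrates only the exposure computation, not the causal estimators evaluated in the study.

```python
# Bipartite exposure sketch on a toy edge list; column names are hypothetical.
import pandas as pd

# interactions: one row per buyer-seller event observed during the experiment
interactions = pd.DataFrame({
    "buyer_id":  [1, 1, 2, 3, 3, 3],
    "seller_id": ["A", "B", "A", "B", "C", "C"],
    "weight":    [1.0, 2.0, 1.0, 1.0, 1.0, 3.0],   # e.g. number of messages or orders
})
assignment = pd.DataFrame({"buyer_id": [1, 2, 3], "treated": [1, 0, 1]})

edges = interactions.merge(assignment, on="buyer_id")
exposure = (
    edges.assign(treated_weight=edges["weight"] * edges["treated"])
         .groupby("seller_id")
         .agg(total=("weight", "sum"), treated=("treated_weight", "sum"))
)
exposure["exposure"] = exposure["treated"] / exposure["total"]
print(exposure)
# Seller-side outcomes can then be regressed on (or stratified by) this exposure
# to estimate spillover effects of the buyer-side treatment.
```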
{"title":"Towards Measuring Sell Side Outcomes in Buy Side Marketplace Experiments using In-Experiment Bipartite Graph","authors":"Vaiva Pilkauskaitė, Jevgenij Gamper, Rasa Giniūnaitė, Agne Reklaitė","doi":"arxiv-2409.04174","DOIUrl":"https://doi.org/arxiv-2409.04174","url":null,"abstract":"In this study, we evaluate causal inference estimators for online controlled\u0000bipartite graph experiments in a real marketplace setting. Our novel\u0000contribution is constructing a bipartite graph using in-experiment data, rather\u0000than relying on prior knowledge or historical data, the common approach in the\u0000literature published to date. We build the bipartite graph from various\u0000interactions between buyers and sellers in the marketplace, establishing a\u0000novel research direction at the intersection of bipartite experiments and\u0000mediation analysis. This approach is crucial for modern marketplaces aiming to\u0000evaluate seller-side causal effects in buyer-side experiments, or vice versa.\u0000We demonstrate our method using historical buyer-side experiments conducted at\u0000Vinted, the largest second-hand marketplace in Europe with over 80M users.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a method for unsupervised whole-image clustering of a target dataset of remote sensing scenes with no labels. The method consists of three main steps: (1) fine-tuning a pretrained deep neural network (DINOv2) on a labelled source remote sensing imagery dataset and using it to extract a feature vector from each image in the target dataset, (2) reducing the dimension of these deep features via manifold projection into a low-dimensional Euclidean space, and (3) clustering the embedded features using a Bayesian nonparametric technique to infer the number and membership of the clusters simultaneously. The method takes advantage of heterogeneous transfer learning to cluster unseen data with different feature and label distributions. We demonstrate that this approach outperforms state-of-the-art zero-shot classification methods on several remote sensing scene classification datasets.
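A condensed sketch of the three-step pipeline under stated assumptions is given below: DINOv2 features loaded via torch.hub (the source-dataset fine-tuning of step 1 is omitted), UMAP for the manifold projection, and scikit-learn's Dirichlet process Gaussian mixture as the Bayesian nonparametric clustering step.

```python
# Pipeline sketch: DINOv2 features -> UMAP projection -> DP-GMM clustering.
import torch
import numpy as np
import umap                                            # umap-learn package
from sklearn.mixture import BayesianGaussianMixture

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

@torch.no_grad()
def extract_features(images: torch.Tensor) -> np.ndarray:
    """images: (N, 3, 224, 224) tensor, already resized and normalised."""
    return model(images).cpu().numpy()                 # (N, 384) embeddings for ViT-S/14

images = torch.randn(64, 3, 224, 224)                  # stand-in for the target scenes
feats = extract_features(images)

embedded = umap.UMAP(n_components=8, random_state=0).fit_transform(feats)

dpgmm = BayesianGaussianMixture(
    n_components=10,                                   # upper bound on the number of clusters
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(embedded)
labels = dpgmm.predict(embedded)
print("clusters used:", len(np.unique(labels)))
```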
{"title":"Deep Clustering of Remote Sensing Scenes through Heterogeneous Transfer Learning","authors":"Isaac Ray, Alexei Skurikhin","doi":"arxiv-2409.03938","DOIUrl":"https://doi.org/arxiv-2409.03938","url":null,"abstract":"This paper proposes a method for unsupervised whole-image clustering of a\u0000target dataset of remote sensing scenes with no labels. The method consists of\u0000three main steps: (1) finetuning a pretrained deep neural network (DINOv2) on a\u0000labelled source remote sensing imagery dataset and using it to extract a\u0000feature vector from each image in the target dataset, (2) reducing the\u0000dimension of these deep features via manifold projection into a low-dimensional\u0000Euclidean space, and (3) clustering the embedded features using a Bayesian\u0000nonparametric technique to infer the number and membership of clusters\u0000simultaneously. The method takes advantage of heterogeneous transfer learning\u0000to cluster unseen data with different feature and label distributions. We\u0000demonstrate the performance of this approach outperforming state-of-the-art\u0000zero-shot classification methods on several remote sensing scene classification\u0000datasets.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"143 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The infield shift has been increasingly used as a defensive strategy in baseball in recent years. Along with the upward trend in its usage, the notoriety of the shift has grown, as it is believed to be responsible for the recent decline in offence. In the 2023 season, Major League Baseball (MLB) implemented a rule change prohibiting the infield shift. However, there has been no systematic analysis of the effectiveness of the infield shift to determine whether it is a cause of the decline in offence. We used publicly available MLB data from 2015-2022 to evaluate the causal effect of the infield shift on expected runs scored. We employed three methods for drawing causal conclusions from observational data -- nearest neighbour matching, inverse probability of treatment weighting, and instrumental variable analysis -- and evaluated the causal effect in subgroups defined by batter handedness. The results of all methods showed the shift is effective at preventing runs, but primarily against left-handed batters.
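To make one of the three approaches concrete, the sketch below implements inverse probability of treatment weighting on a stand-in dataset with hypothetical covariates: a logistic regression estimates the propensity of facing the shift, and the weighted difference in mean expected runs between shifted and non-shifted plate appearances estimates the effect. It is illustrative only, not the paper's specification.

```python
# IPTW sketch on simulated plate-appearance data; covariate names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "shift":       rng.integers(0, 2, 5000),           # 1 if the infield shift was on
    "left_handed": rng.integers(0, 2, 5000),
    "pull_rate":   rng.uniform(0.2, 0.6, 5000),
    "exp_runs":    rng.normal(0.5, 0.3, 5000),          # stand-in expected-runs outcome
})

X = df[["left_handed", "pull_rate"]]
ps = LogisticRegression().fit(X, df["shift"]).predict_proba(X)[:, 1]
w = np.where(df["shift"] == 1, 1 / ps, 1 / (1 - ps))    # inverse probability of treatment weights

treated = (df["shift"] == 1).to_numpy()
ate = (np.average(df.loc[treated, "exp_runs"], weights=w[treated])
       - np.average(df.loc[~treated, "exp_runs"], weights=w[~treated]))
print(f"IPTW estimate of the shift effect on expected runs: {ate:.3f}")
```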
{"title":"Causal effect of the infield shift in the MLB","authors":"Sonia Markes, Linbo Wang, Jessica Gronsbell, Katherine Evans","doi":"arxiv-2409.03940","DOIUrl":"https://doi.org/arxiv-2409.03940","url":null,"abstract":"The infield shift has been increasingly used as a defensive strategy in\u0000baseball in recent years. Along with the upward trend in its usage, the\u0000notoriety of the shift has grown, as it is believed to be responsible for the\u0000recent decline in offence. In the 2023 season, Major League Baseball (MLB)\u0000implemented a rule change prohibiting the infield shift. However, there has\u0000been no systematic analysis of the effectiveness of infield shift to determine\u0000if it is a cause of the cooling in offence. We used publicly available data on\u0000MLB from 2015-2022 to evaluate the causal effect of the infield shift on the\u0000expected runs scored. We employed three methods for drawing causal conclusions\u0000from observational data -- nearest neighbour matching, inverse probability of\u0000treatment weighting, and instrumental variable analysis -- and evaluated the\u0000causal effect in subgroups defined by batter-handedness. The results of all\u0000methods showed the shift is effective at preventing runs, but primarily for\u0000left-handed batters.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper fits an Ornstein-Uhlenbeck model with a seasonal mean and volatility, where the residuals are generated by a Brownian motion, to Ghanaian daily average temperature. It employs the modified Ornstein-Uhlenbeck model proposed by Bhowan, which has a seasonal mean and a stochastic volatility process. The findings reveal that the Bono region experiences warm temperatures and maximum precipitation of up to 32.67 degrees Celsius and 126.51 mm respectively. The Daily Average Temperature (DAT) of the region reverts to a level of approximately 26 degrees Celsius at a rate of 18.72%, with maximum and minimum temperatures of 32.67 and 19.75 degrees Celsius respectively. Although the region is in the middle belt of Ghana, it still experiences warm (hot) temperatures daily and, over the years considered in our analysis, experiences dry seasons relatively more than wet seasons. Our model explains approximately 50% of the variation in the daily average temperature of the region, which can be regarded as a relatively good fit. The findings of this paper are relevant to the pricing of weather derivatives with temperature as the underlying variable in the Ghanaian financial and agricultural sectors. Furthermore, they would assist in the development and design of tailored agriculture/crop insurance models that incorporate temperature dynamics rather than only extreme weather conditions/events such as floods, drought and wildfires.
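The mean-reverting structure described above can be illustrated with a discretised Ornstein-Uhlenbeck process with a sinusoidal seasonal mean, $T_{t+1} = T_t + \kappa(\theta_t - T_t) + \sigma\varepsilon_t$; the sketch simulates such a series and recovers the reversion rate by regressing deseasonalised temperatures on their lag. All parameter values are illustrative, not the fitted Bono estimates.

```python
# Discretised OU process with seasonal mean: simulate, then re-estimate the reversion rate.
import numpy as np

days = np.arange(5 * 365)
theta = 26.0 + 3.0 * np.sin(2 * np.pi * days / 365.25)   # seasonal mean (deg C), illustrative
kappa, sigma = 0.19, 1.2                                  # assumed reversion rate and volatility

rng = np.random.default_rng(1)
T = np.empty_like(theta)
T[0] = theta[0]
for t in range(len(days) - 1):
    T[t + 1] = T[t] + kappa * (theta[t] - T[t]) + sigma * rng.normal()

# AR(1) regression of the deseasonalised series: T_{t+1} - theta_{t+1} = phi * (T_t - theta_t) + e
x, y = T[:-1] - theta[:-1], T[1:] - theta[1:]
phi = float(np.dot(x, y) / np.dot(x, x))
print(f"estimated mean-reversion rate: {1 - phi:.3f} per day")
```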
{"title":"A Stochastic Weather Model: A Case of Bono Region of Ghana","authors":"Bernard Gyamfi","doi":"arxiv-2409.06731","DOIUrl":"https://doi.org/arxiv-2409.06731","url":null,"abstract":"The paper sought to fit an Ornstein Uhlenbeck model with seasonal mean and\u0000volatility, where the residuals are generated by a Brownian motion for Ghanian\u0000daily average temperature. This paper employed the modified Ornstein Uhlenbeck\u0000model proposed by Bhowan which has a seasonal mean and stochastic volatility\u0000process. The findings revealed that, the Bono region experiences warm\u0000temperatures and maximum precipitation up to 32.67 degree celsius and 126.51mm\u0000respectively. It was observed that the Daily Average Temperature (DAT) of the\u0000region reverts to a temperature of approximately 26 degree celsius at a rate of\u000018.72% with maximum and minimum temperatures of 32.67degree celsius and\u000019.75degree celsius respectively. Although the region is in the middle belt of\u0000Ghana, it still experiences warm(hot) temperatures daily and experiences dry\u0000seasons relatively more than wet seasons in the number of years considered for\u0000our analysis. Our model explained approximately 50% of the variations in the\u0000daily average temperature of the region which can be regarded as relatively a\u0000good model. The findings of this paper are relevant in the pricing of weather\u0000derivatives with temperature as an underlying variable in the Ghanaian\u0000financial and agricultural sector. Furthermore, it would assist in the\u0000development and design of tailored agriculture/crop insurance models which\u0000would incorporate temperature dynamics rather than extreme weather\u0000conditions/events such as floods, drought and wildfires.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abderrahim Derouiche (LAAS-S4M, UT3), Damien Brulin (LAAS-S4M, UT2J), Eric Campo (LAAS-S4M, UT2J), Antoine Piau
In an era marked by a demographic change towards an older population, there is an urgent need to improve nutritional monitoring in view of the increase in frailty. This research aims to enhance the identification of meal-taking activities by combining K-Means, GMM, and DBSCAN techniques. Using the Davies-Bouldin Index (DBI) to assess meal-taking activity clustering, the results show that K-Means seems to be the best solution, thanks to its unrivalled efficiency in data demarcation compared with the capabilities of GMM and DBSCAN. Although capable of identifying complex patterns and outliers, the latter methods are limited by their operational complexity and dependence on precise parameter configurations. In this paper, we processed data from four houses equipped with sensors. The findings indicate that the K-Means method yields high performance, evidenced by a particularly low Davies-Bouldin Index (DBI) that reflects good cluster separation and cohesion. Calculating the average duration of each activity using the GMM algorithm makes it possible to distinguish various categories of meal-taking activities; these categories can also correspond to the different times of day at which each meal is taken. Using the K-Means, GMM, and DBSCAN clustering algorithms, the study demonstrates an effective strategy for thoroughly understanding the data. This approach facilitates the comparison and selection of the most suitable method for meal-taking activity clustering.
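A compact sketch of the comparison, using stand-in two-dimensional features in place of the sensor-derived ones: each of K-Means, GMM, and DBSCAN partitions the data, and the Davies-Bouldin Index (lower is better) scores each partition.

```python
# Compare K-Means, GMM and DBSCAN with the Davies-Bouldin Index on stand-in data.
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.metrics import davies_bouldin_score
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=3, cluster_std=1.0, random_state=0)

labels = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "GMM":     GaussianMixture(n_components=3, random_state=0).fit_predict(X),
    "DBSCAN":  DBSCAN(eps=0.9, min_samples=5).fit_predict(X),
}

for name, lab in labels.items():
    # DBI is undefined for a single cluster; DBSCAN labels noise points as -1
    if len(set(lab)) > 1:
        print(f"{name}: DBI = {davies_bouldin_score(X, lab):.3f}")
```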
{"title":"Meal-taking activity monitoring in the elderly based on sensor data: Comparison of unsupervised classification methods","authors":"Abderrahim DerouicheLAAS-S4M, UT3, Damien BrulinLAAS-S4M, UT2J, Eric CampoLAAS-S4M, UT2J, Antoine Piau","doi":"arxiv-2409.02971","DOIUrl":"https://doi.org/arxiv-2409.02971","url":null,"abstract":"In an era marked by a demographic change towards an older population, there\u0000is an urgent need to improve nutritional monitoring in view of the increase in\u0000frailty. This research aims to enhance the identification of meal-taking\u0000activities by combining K-Means, GMM, and DBSCAN techniques. Using the\u0000Davies-Bouldin Index (DBI) for the optimal meal taking activity clustering, the\u0000results show that K-Means seems to be the best solution, thanks to its\u0000unrivalled efficiency in data demarcation, compared with the capabilities of\u0000GMM and DBSCAN. Although capable of identifying complex patterns and outliers,\u0000the latter methods are limited by their operational complexities and dependence\u0000on precise parameter configurations. In this paper, we have processed data from\u00004 houses equipped with sensors. The findings indicate that applying the K-Means\u0000method results in high performance, evidenced by a particularly low\u0000Davies-Bouldin Index (DBI), illustrating optimal cluster separation and\u0000cohesion. Calculating the average duration of each activity using the GMM\u0000algorithm allows distinguishing various categories of meal-taking activities.\u0000Alternatively, this can correspond to different times of the day fitting to\u0000each meal-taking activity. Using K-Means, GMM, and DBSCAN clustering\u0000algorithms, the study demonstrates an effective strategy for thoroughly\u0000understanding the data. This approach facilitates the comparison and selection\u0000of the most suitable method for optimal meal-taking activity clustering.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study conditional linear factor models in the context of asset pricing panels. Our analysis focuses on conditional means and covariances to characterize the cross-sectional and inter-temporal properties of returns and factors as well as their interrelationships. We also review the conditions outlined in Kozak and Nagel (2024) and show how the conditional mean-variance efficient portfolio of an unbalanced panel can be spanned by low-dimensional factor portfolios, even without assuming invertibility of the conditional covariance matrices. Our analysis provides a comprehensive foundation for the specification and estimation of conditional linear factor models.
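For concreteness, a generic conditional linear factor model of the kind studied here can be written as follows (the notation is ours and need not match the paper's):

```latex
% Conditional linear factor model: returns load on factors with time-varying betas.
\[
  x_{t+1} = \beta_t f_{t+1} + \varepsilon_{t+1},
  \qquad
  \mathbb{E}_t[\varepsilon_{t+1}] = 0, \quad \mathrm{Cov}_t(f_{t+1}, \varepsilon_{t+1}) = 0 ,
\]
\[
  \mathbb{E}_t[x_{t+1}] = \beta_t\,\mathbb{E}_t[f_{t+1}],
  \qquad
  \mathrm{Cov}_t(x_{t+1}) = \beta_t\,\mathrm{Cov}_t(f_{t+1})\,\beta_t^{\top} + \mathrm{Cov}_t(\varepsilon_{t+1}).
\]
```

These conditional moments are the objects the paper characterizes; under its conditions the conditional mean-variance efficient portfolio of the panel is spanned by low-dimensional factor portfolios built from them.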
{"title":"Fundamental properties of linear factor models","authors":"Damir Filipovic, Paul Schneider","doi":"arxiv-2409.02521","DOIUrl":"https://doi.org/arxiv-2409.02521","url":null,"abstract":"We study conditional linear factor models in the context of asset pricing\u0000panels. Our analysis focuses on conditional means and covariances to\u0000characterize the cross-sectional and inter-temporal properties of returns and\u0000factors as well as their interrelationships. We also review the conditions\u0000outlined in Kozak and Nagel (2024) and show how the conditional mean-variance\u0000efficient portfolio of an unbalanced panel can be spanned by low-dimensional\u0000factor portfolios, even without assuming invertibility of the conditional\u0000covariance matrices. Our analysis provides a comprehensive foundation for the\u0000specification and estimation of conditional linear factor models.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study presents a comprehensive methodology for modeling and forecasting the historical time series of fire spots detected by the AQUA_M-T satellite in the Amazon, Brazil. The approach utilizes a mixed Recurrent Neural Network (RNN) model, combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures to predict monthly accumulations of daily detected fire spots. A summary of the data revealed a consistent seasonality over time, with annual maximum and minimum fire spot values tending to repeat in the same periods each year. The primary objective is to verify whether the forecasts capture this inherent seasonality through rigorous statistical analysis. The methodology involved careful data preparation, model configuration, and training using cross-validation with two seeds, ensuring that the model generalizes well to the test and validation sets and confirming the convergence of the model parameters. The results indicate that the mixed LSTM and GRU model offers improved accuracy in forecasting 12 months ahead, demonstrating its effectiveness in capturing complex temporal patterns and modeling the observed time series. This research significantly contributes to the application of deep learning techniques in environmental monitoring, specifically in fire spot forecasting. In addition to improving forecast accuracy, the proposed approach highlights the potential for adaptation to other time series forecasting challenges, opening new avenues for research and development in machine learning and natural phenomenon prediction. Keywords: Time Series Forecasting, Recurrent Neural Networks, Deep Learning.
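A minimal Keras sketch of a mixed LSTM and GRU forecaster is shown below; it maps the previous 12 monthly accumulations to a one-month-ahead prediction. The window length, layer sizes, and training settings are assumptions for illustration, not the configuration used in the study.

```python
# Mixed LSTM + GRU one-step-ahead forecaster on a stand-in monthly series.
import numpy as np
import tensorflow as tf

WINDOW = 12   # assumed input window of 12 months

def make_windows(series: np.ndarray):
    """Turn a monthly series into (samples, WINDOW, 1) inputs and next-month targets."""
    X = np.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])
    y = series[WINDOW:]
    return X[..., None].astype("float32"), y.astype("float32")

series = np.random.default_rng(0).gamma(2.0, 500.0, size=240)   # stand-in monthly fire-spot counts
X, y = make_windows(series)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, 1)),
    tf.keras.layers.LSTM(64, return_sequences=True),   # LSTM layer feeds the GRU layer
    tf.keras.layers.GRU(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=16, verbose=0)

forecast = model.predict(X[-1:], verbose=0)             # one-step-ahead forecast
print(float(forecast[0, 0]))
```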
{"title":"Neural Networks with LSTM and GRU in Modeling Active Fires in the Amazon","authors":"Ramon Tavares","doi":"arxiv-2409.02681","DOIUrl":"https://doi.org/arxiv-2409.02681","url":null,"abstract":"This study presents a comprehensive methodology for modeling and forecasting\u0000the historical time series of fire spots detected by the AQUA_M-T satellite in\u0000the Amazon, Brazil. The approach utilizes a mixed Recurrent Neural Network\u0000(RNN) model, combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit\u0000(GRU) architectures to predict monthly accumulations of daily detected fire\u0000spots. A summary of the data revealed a consistent seasonality over time, with\u0000annual maximum and minimum fire spot values tending to repeat at the same\u0000periods each year. The primary objective is to verify whether the forecasts\u0000capture this inherent seasonality through rigorous statistical analysis. The\u0000methodology involved careful data preparation, model configuration, and\u0000training using cross-validation with two seeds, ensuring that the data\u0000generalizes well to the test and validation sets, and confirming the\u0000convergence of the model parameters. The results indicate that the mixed LSTM\u0000and GRU model offers improved accuracy in forecasting 12 months ahead,\u0000demonstrating its effectiveness in capturing complex temporal patterns and\u0000modeling the observed time series. This research significantly contributes to\u0000the application of deep learning techniques in environmental monitoring,\u0000specifically in fire spot forecasting. In addition to improving forecast\u0000accuracy, the proposed approach highlights the potential for adaptation to\u0000other time series forecasting challenges, opening new avenues for research and\u0000development in machine learning and natural phenomenon prediction. Keywords:\u0000Time Series Forecasting, Recurrent Neural Networks, Deep Learning.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}