To effectively monitor biodiversity in streams and rivers, we need to quantify species distribution accurately. Occupancy models are useful for distinguishing between the non-detection of a species and its actual absence. While these models can account for spatial autocorrelation, they are not suited for streams and rivers due to their unique network spatial structure. Here, I propose spatial occupancy models specifically designed for data collected on stream and river networks. I present the statistical developments and illustrate their application using data on a semi-aquatic mammal. Overall, spatial stream network occupancy models offer a robust method for assessing biodiversity in freshwater ecosystems.
{"title":"Spatial occupancy models for data collected on stream networks","authors":"Olivier Gimenez","doi":"arxiv-2409.10017","DOIUrl":"https://doi.org/arxiv-2409.10017","journal":{"name":"arXiv - STAT - Applications"},"publicationDate":"2024-09-16"}
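The distinction between non-detection and true absence in the abstract above can be made concrete with the basic single-season occupancy likelihood. The sketch below is only the standard non-spatial model, not the spatial stream network model the paper proposes; the function names and the parameters psi (occupancy probability) and p (per-visit detection probability) are illustrative assumptions.

```python
def site_likelihood(history, psi, p):
    """Likelihood of one site's 0/1 detection history under a basic occupancy model.

    psi: probability the site is occupied (assumed constant across sites).
    p:   probability of detecting the species on a visit, given occupancy.
    """
    detections = sum(history)
    visits = len(history)
    # Probability of this exact history if the site is occupied.
    occupied = psi * (p ** detections) * ((1 - p) ** (visits - detections))
    # An unoccupied site can only produce an all-zero history.
    unoccupied = (1 - psi) if detections == 0 else 0.0
    return occupied + unoccupied


def prob_occupied_given_no_detection(visits, psi, p):
    """Probability a site is occupied even though the species was never detected."""
    missed = psi * (1 - p) ** visits
    return missed / (missed + (1 - psi))
```

For instance, with psi = 0.5 and p = 0.5, a site surveyed three times without a detection still has about an 11% chance of being occupied, which is why a raw "not seen" record should not be read as absence.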
In this manuscript, we analyze the interaction network on Twitter among members of the 117th U.S. Congress to assess the visibility of political leaders and explore how systemic properties and node attributes influence the formation of legislative connections. We employ descriptive social network statistical methods, the exponential random graph model (ERGM), and the stochastic block model (SBM) to evaluate the relative impact of network systemic properties, as well as institutional and personal traits, on the generation of online relationships among legislators. Our findings reveal that legislative networks on social media platforms like Twitter tend to reinforce the leadership of dominant political actors rather than diminishing their influence. However, we identify that these leadership roles can manifest in various forms. Additionally, we highlight that online connections within legislative networks are influenced by both the systemic properties of the network and institutional characteristics.
{"title":"Leadership and Engagement Dynamics in Legislative Twitter Networks: Statistical Analysis and Modeling","authors":"Carolina Luque, Juan Sosa","doi":"arxiv-2409.10475","DOIUrl":"https://doi.org/arxiv-2409.10475","journal":{"name":"arXiv - STAT - Applications"},"publicationDate":"2024-09-16"}
Sing-Wen Chen (Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taiwan), Joyce Juang (Central Weather Administration, Taiwan), Charlotte Wang (Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taiwan; Master Program of Public Health, College of Public Health, National Taiwan University, Taiwan), Hui-Ling Chang (Central Weather Administration, Taiwan), Jing-Shan Hong (Central Weather Administration, Taiwan), Chuhsing Kate Hsiao (Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taiwan; Master Program of Public Health, College of Public Health, National Taiwan University, Taiwan)
Heavy precipitation from tropical cyclones (TCs) may result in disasters, such as floods and landslides, leading to substantial economic damage and loss of life. Prediction of TC precipitation based on ensemble post-processing procedures using machine learning (ML) approaches has received considerable attention for its flexibility in modeling and its computational power in managing complex models. However, when applying ML techniques to TC precipitation for a specific area, the available observation data are typically insufficient for comprehensive training, validation, and testing of the ML model, primarily due to the rapid movement of TCs. We propose to use the convolutional neural network (CNN) as a deep ML model to leverage the spatial information of precipitation. The proposed model has three distinct features that differentiate it from traditional CNNs applied in meteorology. First, it utilizes data augmentation to alleviate challenges posed by the small sample size. Second, it contains geographical and dynamic variables to account for area-specific features and the relative distance between the study area and the moving TC. Third, it applies unequal weights to accommodate the temporal structure in the training data when calculating the objective function. The proposed CNN-all model is then illustrated using the impact of TC Soudelor on Taiwan. Soudelor was the strongest TC of the 2015 Pacific typhoon season. The results show that the inclusion of augmented data and dynamic variables improves the prediction of heavy precipitation. The proposed CNN-all outperforms traditional CNN models, based on the continuous ranked probability skill score (CRPSS), probability plots, and reliability diagrams. The proposed model has the potential to be utilized in a wide range of meteorological studies.
{"title":"A Convolutional Neural Network-based Ensemble Post-processing with Data Augmentation for Tropical Cyclone Precipitation Forecasts","authors":"Sing-Wen Chen, Joyce Juang, Charlotte Wang, Hui-Ling Chang, Jing-Shan Hong, Chuhsing Kate Hsiao","doi":"arxiv-2409.09607","DOIUrl":"https://doi.org/arxiv-2409.09607","journal":{"name":"arXiv - STAT - Applications"},"publicationDate":"2024-09-15"}
Thiago Trafane Oliveira Santos (Central Bank of Brazil, Brasília, Brazil; Department of Economics, University of Brasilia, Brazil), Daniel Oliveira Cajueiro (Department of Economics, University of Brasilia, Brazil; National Institute of Science and Technology for Complex Systems)
Zipf's law states that the probability of a variable being larger than $s$ is roughly inversely proportional to $s$. In this paper, we evaluate Zipf's law for the distribution of firm size, measured by the number of employees, in Brazil. We use publicly available binned annual data from the Central Register of Enterprises (CEMPRE), which is held by the Brazilian Institute of Geography and Statistics (IBGE) and covers all formal organizations. Remarkably, we find that Zipf's law provides a very good, although not perfect, approximation to the data for each year between 1996 and 2020 at the economy-wide level and also for agriculture, industry, and services alone. However, a lognormal distribution also performs well and even outperforms Zipf's law in certain cases.
{"title":"Zipf's law in the distribution of Brazilian firm size","authors":"Thiago Trafane Oliveira Santos, Daniel Oliveira Cajueiro","doi":"arxiv-2409.09470","DOIUrl":"https://doi.org/arxiv-2409.09470","journal":{"name":"arXiv - STAT - Applications"},"publicationDate":"2024-09-14"}
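To illustrate what Zipf's law implies for binned data of the kind described in the abstract above, the sketch below computes the share of firms expected in each size bin when P(S > s) = (s_min / s)^alpha, with alpha = 1 being the Zipf case. This is a minimal sketch under a continuous Pareto approximation, not the authors' estimation procedure; the function name and the bin edges are made up for illustration and are not taken from CEMPRE.

```python
def pareto_bin_shares(edges, alpha=1.0):
    """Expected share of observations per bin when P(S > s) = (s_min / s) ** alpha.

    edges: increasing lower bounds of the bins; the last bin is open-ended.
    alpha: tail exponent; alpha = 1 corresponds to Zipf's law.
    """
    s_min = edges[0]
    # Survival function at each lower bound, then 0 for the open upper end.
    survival = [(s_min / e) ** alpha for e in edges] + [0.0]
    # Share in bin i is the drop in the survival function across that bin.
    return [survival[i] - survival[i + 1] for i in range(len(edges))]


# Hypothetical employee-count bin lower bounds, for illustration only.
edges = [1, 5, 10, 50, 100]
shares = pareto_bin_shares(edges)
```

Comparing such expected shares with the observed bin counts, for alpha = 1 and for a fitted lognormal alternative, is one simple way to see how well each distribution approximates binned data.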
Accurate reconstruction of ambient temperature at death scenes is crucial for estimating the postmortem interval (PMI) in forensic science. Typically, this is done by correcting weather station temperatures using measurements from the scene, often through linear regression. While recent attempts to use alternative algorithms such as generalized additive models (GAM) have improved accuracy, they usually require additional variables such as humidity, making them impractical. This study presents two methods for accurate temperature reconstruction using only temperature data. The first, a concurrent regression model, is already known in mathematics and is applied here for mid-term reconstructions (several days of measurements). The second, a new method based on Fourier expansion, is designed for short-term reconstructions (only a few hours of measurements). Both models were tested in quasi-indoor conditions, using data from six different environments. The concurrent regression model provided nearly perfect reconstructions for periods longer than six days, while the short-term model achieved similar accuracy after just 4-5 hours of measurements. These findings demonstrate that reliable temperature corrections for PMI estimation can be made with significantly reduced measurement periods, enhancing the practicality of the method in forensic applications.
{"title":"Forensically useful mid-term and short-term temperature reconstruction for quasi-indoor death scenes","authors":"Jędrzej Wydra, Łukasz Smaga, Szymon Matuszewski","doi":"arxiv-2409.09516","DOIUrl":"https://doi.org/arxiv-2409.09516","journal":{"name":"arXiv - STAT - Applications"},"publicationDate":"2024-09-14"}
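The conventional baseline mentioned in the abstract above, correcting weather station temperatures with a linear regression fitted to scene measurements, can be sketched as follows. This is only that baseline, not the concurrent regression or Fourier-based methods the study introduces; the function names and the paired readings are invented for illustration.

```python
def fit_linear_correction(station, scene):
    """Least-squares fit of scene = intercept + slope * station."""
    n = len(station)
    mean_x = sum(station) / n
    mean_y = sum(scene) / n
    sxx = sum((x - mean_x) ** 2 for x in station)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(station, scene))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def reconstruct(station_history, intercept, slope):
    """Apply the fitted correction to earlier weather station readings."""
    return [intercept + slope * t for t in station_history]


# Made-up paired readings in degrees Celsius, for illustration only.
station = [10.0, 12.0, 14.0, 16.0]
scene = [15.0, 16.0, 17.0, 18.0]
intercept, slope = fit_linear_correction(station, scene)
estimated_scene = reconstruct([8.0, 20.0], intercept, slope)
```

The fit is made on the period when both the scene logger and the weather station were recording, and the resulting line is then applied backwards to the station record covering the time before the body was found.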
High-resolution stellar spectra offer valuable insights into atmospheric parameters and chemical compositions. However, their inherent complexity and high-dimensionality present challenges in fully utilizing the information they contain. In this study, we utilize data from the Apache Point Observatory Galactic Evolution Experiment (APOGEE) within the Sloan Digital Sky Survey IV (SDSS-IV) to explore latent representations of chemical abundances by applying five dimensionality reduction techniques: PCA, t-SNE, UMAP, Autoencoder, and VAE. Through this exploration, we evaluate the preservation of information and compare reconstructed outputs with the original 19 chemical abundance data. Our findings reveal a performance ranking of PCA < UMAP < t-SNE < VAE <