
arXiv - STAT - Applications: Latest Papers

Spatial occupancy models for data collected on stream networks
Pub Date : 2024-09-16 DOI: arxiv-2409.10017
Olivier Gimenez
To effectively monitor biodiversity in streams and rivers, we need to quantify species distribution accurately. Occupancy models are useful for distinguishing between the non-detection of a species and its actual absence. While these models can account for spatial autocorrelation, they are not suited for streams and rivers due to their unique network spatial structure. Here, I propose spatial occupancy models specifically designed for data collected on stream and river networks. I present the statistical developments and illustrate their application using data on a semi-aquatic mammal. Overall, spatial stream network occupancy models offer a robust method for assessing biodiversity in freshwater ecosystems.
Citations: 0
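The distinction the abstract draws between non-detection and true absence can be illustrated with a minimal sketch of the standard single-season, non-spatial occupancy likelihood. This is not the stream-network model proposed in the paper; the function names and the parameters `psi` (occupancy probability) and `p` (per-visit detection probability) are assumptions chosen for illustration.

```python
def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (1 = detected, 0 = not).

    Assumes a constant occupancy probability psi and a constant
    per-visit detection probability p (illustrative simplification).
    """
    detections = sum(history)
    visits = len(history)
    # Probability of this exact history if the site is occupied.
    occupied = psi * (p ** detections) * ((1 - p) ** (visits - detections))
    if detections > 0:
        return occupied
    # All-zero history: either occupied but missed on every visit,
    # or the species is truly absent.
    return occupied + (1 - psi)


def prob_occupied_given_no_detection(visits, psi, p):
    """Posterior probability that a site is occupied after `visits` misses."""
    missed = psi * (1 - p) ** visits
    return missed / (missed + (1 - psi))


if __name__ == "__main__":
    print(site_likelihood([0, 0, 0], psi=0.5, p=0.5))
    print(prob_occupied_given_no_detection(3, psi=0.5, p=0.5))
```

Even after three visits without a detection, the sketch assigns a non-zero probability that the site is occupied, which is the reason a raw non-detection is not read as an absence.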
Leadership and Engagement Dynamics in Legislative Twitter Networks: Statistical Analysis and Modeling
Pub Date : 2024-09-16 DOI: arxiv-2409.10475
Carolina Luque, Juan Sosa
In this manuscript, we analyze the interaction network on Twitter among members of the 117th U.S. Congress to assess the visibility of political leaders and explore how systemic properties and node attributes influence the formation of legislative connections. We employ descriptive social network statistical methods, the exponential random graph model (ERGM), and the stochastic block model (SBM) to evaluate the relative impact of network systemic properties, as well as institutional and personal traits, on the generation of online relationships among legislators. Our findings reveal that legislative networks on social media platforms like Twitter tend to reinforce the leadership of dominant political actors rather than diminishing their influence. However, we identify that these leadership roles can manifest in various forms. Additionally, we highlight that online connections within legislative networks are influenced by both the systemic properties of the network and institutional characteristics.
Citations: 0
A Convolutional Neural Network-based Ensemble Post-processing with Data Augmentation for Tropical Cyclone Precipitation Forecasts
Pub Date : 2024-09-15 DOI: arxiv-2409.09607
Sing-Wen Chen (Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taiwan); Joyce Juang (Central Weather Administration, Taiwan); Charlotte Wang (Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taiwan; Master Program of Public Health, College of Public Health, National Taiwan University, Taiwan); Hui-Ling Chang (Central Weather Administration, Taiwan); Jing-Shan Hong (Central Weather Administration, Taiwan); Chuhsing Kate Hsiao (Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taiwan; Master Program of Public Health, College of Public Health, National Taiwan University, Taiwan)
Heavy precipitation from tropical cyclones (TCs) may result in disasters, such as floods and landslides, leading to substantial economic damage and loss of life. Prediction of TC precipitation based on ensemble post-processing procedures using machine learning (ML) approaches has received considerable attention for its flexibility in modeling and its computational power in managing complex models. However, when applying ML techniques to TC precipitation for a specific area, the available observation data are typically insufficient for comprehensive training, validation, and testing of the ML model, primarily due to the rapid movement of TCs. We propose to use the convolutional neural network (CNN) as a deep ML model to leverage the spatial information of precipitation. The proposed model has three distinct features that differentiate it from traditional CNNs applied in meteorology. First, it utilizes data augmentation to alleviate challenges posed by the small sample size. Second, it contains geographical and dynamic variables to account for area-specific features and the relative distance between the study area and the moving TC. Third, it applies unequal weights to accommodate the temporal structure in the training data when calculating the objective function. The proposed CNN-all model is then illustrated with the TC Soudelor's impact on Taiwan. Soudelor was the strongest TC of the 2015 Pacific typhoon season. The results show that the inclusion of augmented data and dynamic variables improves the prediction of heavy precipitation. The proposed CNN-all outperforms traditional CNN models, based on the continuous probability skill score (CRPSS), probability plots, and reliability diagram. The proposed model has the potential to be utilized in a wide range of meteorological studies.
Citations: 0
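The skill score named in the abstract (CRPSS, commonly expanded as the continuous ranked probability skill score) can be sketched for a single ensemble forecast as follows. This is a generic illustration of the usual ensemble CRPS estimator, not the evaluation code of the paper; the function names and toy numbers are assumptions.

```python
def crps_ensemble(members, obs):
    """Empirical CRPS of an ensemble forecast against one observation.

    Uses the common estimator: mean |x_i - y| - 0.5 * mean |x_i - x_j|.
    Lower values indicate a better forecast.
    """
    m = len(members)
    spread_to_obs = sum(abs(x - obs) for x in members) / m
    internal_spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return spread_to_obs - internal_spread


def crpss(crps_forecast, crps_reference):
    """Skill relative to a reference forecast: 1 is perfect, 0 is no gain."""
    return 1.0 - crps_forecast / crps_reference


if __name__ == "__main__":
    forecast = crps_ensemble([1.0, 3.0], obs=2.0)   # toy rainfall ensemble
    reference = crps_ensemble([0.0, 6.0], obs=2.0)  # toy reference ensemble
    print(forecast, reference, crpss(forecast, reference))
```

A positive CRPSS means the post-processed ensemble scores better than the reference; in practice the CRPS values would be averaged over many grid points and forecast times before the ratio is taken.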
Zipf's law in the distribution of Brazilian firm size
Pub Date : 2024-09-14 DOI: arxiv-2409.09470
Thiago Trafane Oliveira Santos (Central Bank of Brazil, Brasília, Brazil; Department of Economics, University of Brasilia, Brazil); Daniel Oliveira Cajueiro (Department of Economics, University of Brasilia, Brazil; National Institute of Science and Technology for Complex Systems)
Zipf's law states that the probability of a variable being larger than $s$ is roughly inversely proportional to $s$. In this paper, we evaluate Zipf's law for the distribution of firm size by the number of employees in Brazil. We use publicly available binned annual data from the Central Register of Enterprises (CEMPRE), which is held by the Brazilian Institute of Geography and Statistics (IBGE) and covers all formal organizations. Remarkably, we find that Zipf's law provides a very good, although not perfect, approximation to data for each year between 1996 and 2020 at the economy-wide level and also for agriculture, industry, and services alone. However, a lognormal distribution also performs well and even outperforms Zipf's law in certain cases.
Citations: 0
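The statement that P(S > s) is roughly proportional to 1/s implies a slope of about -1 when the log of the survival function is plotted against log s. The sketch below estimates that slope from binned counts with ordinary least squares. It is a crude illustrative check, not the estimation procedure of the paper; the bin edges and counts are synthetic and the function names are assumptions.

```python
import math


def survival_from_bins(counts):
    """Share of firms in each bin or any larger bin (bins ordered by size)."""
    total = sum(counts)
    remaining = total
    survival = []
    for c in counts:
        survival.append(remaining / total)
        remaining -= c
    return survival


def tail_exponent(lower_bounds, counts):
    """Negative OLS slope of log survival on log size.

    A value near 1 is consistent with Zipf's law. Requires positive
    lower bounds and a non-empty last bin so that every log is defined.
    """
    xs = [math.log(s) for s in lower_bounds]
    ys = [math.log(v) for v in survival_from_bins(counts)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic bins [1,2), [2,4), [4,8), [8,16), [16,inf) built to follow 1/s.
    print(tail_exponent([1, 2, 4, 8, 16], [8, 4, 2, 1, 1]))
```

Least squares on a log-log plot is known to be a rough estimator for power-law tails, so a result near 1 should be read as suggestive rather than as a formal test against alternatives such as the lognormal.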
Forensically useful mid-term and short-term temperature reconstruction for quasi-indoor death scenes
Pub Date : 2024-09-14 DOI: arxiv-2409.09516
Jędrzej Wydra, Łukasz Smaga, Szymon Matuszewski
Accurate reconstruction of ambient temperature at death scenes is crucial for estimating the postmortem interval (PMI) in forensic science. Typically, this is done by correcting weather station temperatures using measurements from the scene, often through linear regression. While recent attempts to use alternative algorithms like GAM have improved accuracy, they usually require additional variables such as humidity, making them impractical. This study presents two methods for accurate temperature reconstruction using only temperature data. The first, a concurrent regression model, is known in mathematics and is applied here for mid-term reconstructions (several days of measurements). The second, a new method based on Fourier expansion, is designed for short-term reconstructions (only a few hours of measurements). Both models were tested in quasi-indoor conditions, using data from six different environments. The concurrent regression model provided nearly perfect reconstructions for periods longer than six days, while the short-term model achieved similar accuracy after just 4-5 hours of measurements. These findings demonstrate that reliable temperature corrections for PMI estimation can be made with significantly reduced measurement periods, enhancing the practicality of the method in forensic applications.
Citations: 0
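The conventional baseline mentioned in the abstract, correcting weather station temperatures with scene measurements through linear regression, can be sketched as below. This illustrates only that baseline, not the concurrent regression or Fourier-based methods the study proposes; the function names and temperature values are hypothetical.

```python
def fit_line(station, scene):
    """Ordinary least squares fit of scene = intercept + slope * station."""
    n = len(station)
    mean_x = sum(station) / n
    mean_y = sum(scene) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(station, scene))
    sxx = sum((x - mean_x) ** 2 for x in station)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def reconstruct(station_history, intercept, slope):
    """Apply the fitted correction to station readings from before discovery."""
    return [intercept + slope * t for t in station_history]


if __name__ == "__main__":
    # Paired readings taken after the scene was found (degrees Celsius, made up).
    station_now = [10.0, 12.0, 14.0]
    scene_now = [15.0, 16.0, 17.0]
    intercept, slope = fit_line(station_now, scene_now)
    # Station readings from the period before discovery.
    print(reconstruct([8.0, 9.0], intercept, slope))
```

The fitted line is estimated on the period when both series are available and then applied backwards to the station record, which is why the length of the paired measurement period matters for accuracy.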
Exploring Dimensionality Reduction of SDSS Spectral Abundances
Pub Date : 2024-09-13 DOI: arxiv-2409.09227
Qianyu Fan, Joshua S. Speagle
High-resolution stellar spectra offer valuable insights into atmospheric parameters and chemical compositions. However, their inherent complexity and high dimensionality present challenges in fully utilizing the information they contain. In this study, we utilize data from the Apache Point Observatory Galactic Evolution Experiment (APOGEE) within the Sloan Digital Sky Survey IV (SDSS-IV) to explore latent representations of chemical abundances by applying five dimensionality reduction techniques: PCA, t-SNE, UMAP, Autoencoder, and VAE. Through this exploration, we evaluate the preservation of information and compare reconstructed outputs with the original 19 chemical abundance data. Our findings reveal a performance ranking of PCA < UMAP < t-SNE < VAE < Autoencoder, obtained by comparing their explained variance under optimized MSE. The non-linear algorithms (Autoencoder and VAE) show an improvement of approximately 10% over the linear algorithm (PCA). This difference can be referred to as the "non-linearity gap." Future work should focus on incorporating measurement errors into extension VAEs, thereby enhancing the reliability and interpretability of chemical abundance exploration in astronomical spectra.
Citations: 0
AutoIRT: Calibrating Item Response Theory Models with Automated Machine Learning
Pub Date : 2024-09-13 DOI: arxiv-2409.08823
James Sharpnack, Phoebe Mulcaire, Klinton Bicknell, Geoff LaFlair, Kevin Yancey
Item response theory (IRT) is a class of interpretable factor models that are widely used in computerized adaptive tests (CATs), such as language proficiency tests. Traditionally, these are fit using parametric mixed effects models on the probability of a test taker getting the correct answer to a test item (i.e., question). Neural net extensions of these models, such as BertIRT, require specialized architectures and parameter tuning. We propose a multistage fitting procedure that is compatible with out-of-the-box Automated Machine Learning (AutoML) tools. It is based on a Monte Carlo EM (MCEM) outer loop with a two-stage inner loop, which trains a non-parametric AutoML grade model using item features followed by an item-specific parametric model. This greatly accelerates the modeling workflow for scoring tests. We demonstrate its effectiveness by applying it to the Duolingo English Test, a high-stakes, online English proficiency test. We show that the resulting model is typically better calibrated, achieves better predictive performance, and yields more accurate scores than existing methods (non-explanatory IRT models and explanatory IRT models like BERT-IRT). Along the way, we provide a brief survey of machine learning methods for calibration of item parameters for CATs.
Citations: 0
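As a point of reference for the parametric models the abstract refers to, the sketch below uses the two-parameter logistic (2PL) form, one common IRT parameterization of the probability of a correct answer. The abstract does not state which parametric form AutoIRT uses, so the 2PL choice, the function names, and the item values are assumptions for illustration only.

```python
import math


def p_correct(theta, discrimination, difficulty):
    """2PL probability that a test taker with ability theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of binary responses (1 = correct) for one test taker.

    `items` is a list of (discrimination, difficulty) pairs.
    """
    total = 0.0
    for (a, b), r in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if r == 1 else math.log(1.0 - p)
    return total


if __name__ == "__main__":
    items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]  # hypothetical item parameters
    responses = [1, 1, 0]
    for theta in (-1.0, 0.0, 1.0):
        print(theta, log_likelihood(theta, items, responses))
```

Calibration in this setting means estimating the item parameters, here the discrimination and difficulty pairs, so that the predicted probabilities match observed response rates.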
Cross-Country Comparative Analysis of Climate Resilience and Localized Mapping in Data-Sparse Regions
Pub Date : 2024-09-13 DOI: arxiv-2409.08765
Ronald Katende
Climate resilience across sectors varies significantly in low-income countries (LICs), with agriculture being the most vulnerable to climate change. Existing studies typically focus on individual countries, offering limited insights into broader cross-country patterns of adaptation and vulnerability. This paper addresses these gaps by introducing a framework for cross-country comparative analysis of sectoral climate resilience using meta-analysis and cross-country panel data techniques. The study identifies shared vulnerabilities and adaptation strategies across LICs, enabling more effective policy design. Additionally, a novel localized climate-agriculture mapping technique is developed, integrating sparse agricultural data with high-resolution satellite imagery to generate fine-grained maps of agricultural productivity under climate stress. Spatial interpolation methods, such as kriging, are used to address data gaps, providing detailed insights into regional agricultural productivity and resilience. The findings offer policymakers tools to prioritize climate adaptation efforts and optimize resource allocation both regionally and nationally.
Citations: 0
Statistical Analysis of Quantitative Cancer Imaging Data
Pub Date : 2024-09-13 DOI: arxiv-2409.08809
Shariq Mohammed, Maria Masotti, Nathaniel Osher, Satwik Acharyya, Veerabhadran Baladandayuthapani
Recent advances in the types and extent of medical imaging technologies have led to a proliferation of multimodal quantitative imaging data in cancer. Quantitative medical imaging data refer to numerical representations derived from medical imaging technologies, such as radiology and pathology imaging, that can be used to assess and quantify characteristics of diseases, especially cancer. The use of such data in both clinical and research settings enables precise quantification and analysis of tumor characteristics that can facilitate objective evaluation of disease progression, response to therapy, and prognosis. The scale and size of these imaging biomarkers are vast and present several analytical and computational challenges that range from high dimensionality to complex structural correlation patterns. In this review article, we summarize some state-of-the-art statistical methods developed for quantitative medical imaging data, ranging from topological, functional, and shape data analyses to spatial process models. We delve into common imaging biomarkers with a focus on radiology and pathology imaging in cancer, address the analytical questions and challenges they present, and highlight the innovative statistical and machine learning models that have been developed to answer relevant scientific and clinical questions. We also outline some emerging and open problems in this area for future exploration.
Citations: 0
A Bayesian framework to evaluate evidence in cases of alleged cheating with secret codes in sports
Pub Date : 2024-09-12 DOI: arxiv-2409.08172
Aafko Boonstra, Ronald Meester
We present a Bayesian framework to analyze a case of alleged cheating in the mind sport contract bridge. We explain why a Bayesian approach is called for, and not a frequentistic one. We argue that such a Bayesian framework can and should also be used in other sports for cases of alleged cheating by means of illegal signalling.
Citations: 0