Given a collection of historical sports rankings, can one tell which player is the greatest of all time (i.e., the GOAT)? In this work, we design a data-driven random walk on the symmetric group to obtain a stationary distribution over player rankings spanning different time periods in sports history. We combine this distribution with a notion of stochastic dominance to obtain a partial order over the players. We implement our methods using publicly available data from the Association of Tennis Professionals (ATP) and the Women's Tennis Association (WTA) to find the GOATs in the respective categories.
{"title":"Who's the GOAT? Sports Rankings and Data-Driven Random Walks on the Symmetric Group","authors":"Gian-Gabriel P. Garcia, J. Carlos Martínez Mori","doi":"arxiv-2409.12107","DOIUrl":"https://doi.org/arxiv-2409.12107","url":null,"abstract":"Given a collection of historical sports rankings, can one tell which player\u0000is the greatest of all time (i.e., the GOAT)? In this work, we design a\u0000data-driven random walk on the symmetric group to obtain a stationary\u0000distribution over player rankings, spanning across different time periods in\u0000sports history. We combine this distribution with a notion of stochastic\u0000dominance to obtain a partial order over the players. We implement our methods\u0000using publicly available data from the Association of Tennis Professionals\u0000(ATP) and the Women's Tennis Association (WTA) to find the GOATs in the\u0000respective categories.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"85 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ISO/IEC 17000:2020 defines conformity assessment as an "activity to determine whether specified requirements relating to a product, process, system, person or body are fulfilled". JCGM (2012) establishes a framework for accounting for measurement uncertainty in conformity assessment. The focus of JCGM (2012) is on the conformity assessment of individual units of product based on measurements on a cardinal continuous scale. However, the scheme can also be applied to composite assessment targets like finite lots of product or manufacturing processes, and to the evaluation of characteristics on discrete cardinal or nominal scales. We consider the application of the JCGM scheme in the conformity assessment of finite lots or processes of discrete units subject to a dichotomous quality classification as conforming or nonconforming. A lot or process is classified as conforming if the actual proportion nonconforming does not exceed a prescribed upper tolerance limit; otherwise, the lot or process is classified as nonconforming. The measurement on the lot or process is a statistical estimation of the proportion nonconforming based on attributes or variables sampling, and measurement uncertainty is sampling uncertainty. Following JCGM (2012), we analyse the effect of measurement uncertainty (sampling uncertainty) in attributes sampling, and we calculate key conformity assessment parameters, in particular the producer's and consumer's risks. We suggest integrating such parameters as a useful add-on into ISO acceptance sampling standards such as the ISO 2859 series.
{"title":"Conformity assessment of processes and lots in the framework of JCGM 106:2012","authors":"Rainer Göb, Steffen Uhlig, Bernard Colson","doi":"arxiv-2409.11912","DOIUrl":"https://doi.org/arxiv-2409.11912","url":null,"abstract":"ISO/IEC 17000:2020 defines conformity assessment as an \"activity to determine\u0000whether specified requirements relating to a product, process, system, person\u0000or body are fulfilled\". JCGM (2012) establishes a framework for accounting for\u0000measurement uncertainty in conformity assessment. The focus of JCGM (2012) is\u0000on the conformity assessment of individual units of product based on\u0000measurements on a cardinal continuous scale. However, the scheme can also be\u0000applied to composite assessment targets like finite lots of product or\u0000manufacturing processes, and to the evaluation of characteristics in discrete\u0000cardinal or nominal scales. We consider the application of the JCGM scheme in the conformity assessment\u0000of finite lots or processes of discrete units subject to a dichotomous quality\u0000classification as conforming and nonconforming. A lot or process is classified\u0000as conforming if the actual proportion nonconforming does not exceed a\u0000prescribed upper tolerance limit, otherwise the lot or process is classified as\u0000nonconforming. The measurement on the lot or process is a statistical\u0000estimation of the proportion nonconforming based on attributes or variables\u0000sampling, and meassurement uncertainty is sampling uncertainty. Following JCGM\u0000(2012), we analyse the effect of measurement uncertainty (sampling uncertainty)\u0000in attributes sampling, and we calculate key conformity assessment parameters,\u0000in particular the producer's and consumer's risk. We suggest to integrate such\u0000parameters as a useful add-on into ISO acceptance sampling standards such as\u0000the ISO 2859 series.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Principal component analysis (PCA) is often used to analyze multivariate data together with cluster analysis, which depends on the number of principal components used. It is therefore important to determine the number of significant principal components (PCs) extracted from a data set. Here we use a variational Bayesian version of classical PCA to develop a new method for estimating the number of significant PCs in contexts where the number of samples is similar to or greater than the number of features. This eliminates guesswork and potential bias in manually determining the number of principal components and avoids overestimation of variance by filtering noise. This framework can be applied to datasets of different shapes (number of rows and columns), different data types (binary, ordinal, categorical, continuous), and with noisy and missing data. It is therefore especially useful for data with arbitrary encodings and similar numbers of rows and columns, such as cultural, ecological, morphological, and behavioral datasets. We tested our method on both synthetic data and empirical datasets and found that it may underestimate, but does not overestimate, the number of principal components for synthetic data. A small number of components was found for each empirical dataset. These results suggest that the method is broadly applicable across the life sciences.
{"title":"Bayesian estimation of the number of significant principal components for cultural data","authors":"Joshua C. Macdonald, Javier Blanco-Portillo, Marcus W. Feldman, Yoav Ram","doi":"arxiv-2409.12129","DOIUrl":"https://doi.org/arxiv-2409.12129","url":null,"abstract":"Principal component analysis (PCA) is often used to analyze multivariate data\u0000together with cluster analysis, which depends on the number of principal\u0000components used. It is therefore important to determine the number of\u0000significant principal components (PCs) extracted from a data set. Here we use a\u0000variational Bayesian version of classical PCA, to develop a new method for\u0000estimating the number of significant PCs in contexts where the number of\u0000samples is of a similar to or greater than the number of features. This\u0000eliminates guesswork and potential bias in manually determining the number of\u0000principal components and avoids overestimation of variance by filtering noise.\u0000This framework can be applied to datasets of different shapes (number of rows\u0000and columns), different data types (binary, ordinal, categorical, continuous),\u0000and with noisy and missing data. Therefore, it is especially useful for data\u0000with arbitrary encodings and similar numbers of rows and columns, such as\u0000cultural, ecological, morphological, and behavioral datasets. We tested our\u0000method on both synthetic data and empirical datasets and found that it may\u0000underestimate but not overestimate the number of principal components for the\u0000synthetic data. A small number of components was found for each empirical\u0000dataset. These results suggest that it is broadly applicable across the life\u0000sciences.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visual search is a fundamental natural task for humans and other animals. We investigated the decision processes humans use when searching briefly presented displays having well-separated potential target-object locations. Performance was compared with the Bayesian-optimal decision process under the assumption that the information from the different potential target locations is statistically independent. Surprisingly, humans performed slightly better than optimal, despite humans' substantial loss of sensitivity in the fovea, and the implausibility of the human brain replicating the optimal computations. We show that three factors can quantitatively explain these seemingly paradoxical results. First and most importantly, simple and fixed heuristic decision rules reach near-optimal search performance. Second, foveal neglect primarily affects only the central potential target location. Finally, spatially correlated neural noise causes search performance to exceed that predicted for independent noise. These findings have far-reaching implications for understanding visual search tasks and other identification tasks in humans and other animals.
{"title":"Optimal Visual Search with Highly Heuristic Decision Rules","authors":"Anqi Zhang, Wilson S. Geisler","doi":"arxiv-2409.12124","DOIUrl":"https://doi.org/arxiv-2409.12124","url":null,"abstract":"Visual search is a fundamental natural task for humans and other animals. We\u0000investigated the decision processes humans use when searching briefly presented\u0000displays having well-separated potential target-object locations. Performance\u0000was compared with the Bayesian-optimal decision process under the assumption\u0000that the information from the different potential target locations is\u0000statistically independent. Surprisingly, humans performed slightly better than\u0000optimal, despite humans' substantial loss of sensitivity in the fovea, and the\u0000implausibility of the human brain replicating the optimal computations. We show\u0000that three factors can quantitatively explain these seemingly paradoxical\u0000results. Most importantly, simple and fixed heuristic decision rules reach near\u0000optimal search performance. Secondly, foveal neglect primarily affects only the\u0000central potential target location. Finally, spatially correlated neural noise\u0000causes search performance to exceed that predicted for independent noise. These\u0000findings have far-reaching implications for understanding visual search tasks\u0000and other identification tasks in humans and other animals.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We conducted a literature review of COVID-19 vaccine allocation modelling papers, specifically looking for publications that considered equity. We found that most models did not take equity into account, with the vast majority of publications presenting only aggregated results and no results by subgroup (e.g., age, race, geography). We then give examples of how modelling can be used to answer equity questions, and highlight findings from the publications that did consider equity. Lastly, we describe seven considerations that seem important when including equity in future vaccine allocation models.
{"title":"Equity considerations in COVID-19 vaccine allocation modelling: a literature review","authors":"Eva Rumpler, Marc Lipsitch","doi":"arxiv-2409.11462","DOIUrl":"https://doi.org/arxiv-2409.11462","url":null,"abstract":"We conducted a literature review of COVID-19 vaccine allocation modelling\u0000papers, specifically looking for publications that considered equity. We found\u0000that most models did not take equity into account, with the vast majority of\u0000publications presenting aggregated results and no results by any subgroup (e.g.\u0000age, race, geography, etc). We then give examples of how modelling can be\u0000useful to answer equity questions, and highlight some of the findings from the\u0000publications that did. Lastly, we describe seven considerations that seem\u0000important to consider when including equity in future vaccine allocation\u0000models.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tests for racial bias commonly assess whether two people of different races are treated differently. A fundamental challenge is that, because two people may differ in many ways, factors besides race might explain differences in treatment. Here, we propose a test for bias which circumvents the difficulty of comparing two people by instead assessing whether the same person is treated differently when their race is perceived differently. We apply our method to test for bias in police traffic stops, finding that the same driver is likelier to be searched or arrested by police when they are perceived as Hispanic than when they are perceived as white. Our test is broadly applicable to other datasets where race, gender, or other identity data are perceived rather than self-reported, and the same person is observed multiple times.
{"title":"Testing for racial bias using inconsistent perceptions of race","authors":"Nora Gera, Emma Pierson","doi":"arxiv-2409.11269","DOIUrl":"https://doi.org/arxiv-2409.11269","url":null,"abstract":"Tests for racial bias commonly assess whether two people of different races\u0000are treated differently. A fundamental challenge is that, because two people\u0000may differ in many ways, factors besides race might explain differences in\u0000treatment. Here, we propose a test for bias which circumvents the difficulty of\u0000comparing two people by instead assessing whether the $textit{same person}$ is\u0000treated differently when their race is perceived differently. We apply our\u0000method to test for bias in police traffic stops, finding that the same driver\u0000is likelier to be searched or arrested by police when they are perceived as\u0000Hispanic than when they are perceived as white. Our test is broadly applicable\u0000to other datasets where race, gender, or other identity data are perceived\u0000rather than self-reported, and the same person is observed multiple times.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"211 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Urban traffic safety is a pressing concern in modern transportation systems, especially in rapidly growing metropolitan areas where increased traffic congestion, complex road networks, and diverse driving behaviors exacerbate the risk of traffic incidents. Traditional traffic crash data analysis offers valuable insights but often overlooks a broader range of road safety risks. Near-crash events, which occur more frequently and signal potential collisions, provide a more comprehensive perspective on traffic safety. However, city-scale analysis of near-crash events remains limited due to the significant challenges in large-scale real-world data collection, processing, and analysis. This study utilizes one month of connected vehicle data, comprising billions of records, to detect and analyze near-crash events across the road network in the City of San Antonio, Texas. We propose an efficient framework integrating spatial-temporal buffering and heading algorithms to accurately identify and map near-crash events. A binary logistic regression model is employed to assess the influence of road geometry, traffic volume, and vehicle types on near-crash risks. Additionally, we examine spatial and temporal patterns, including variations by time of day, day of the week, and road category. The findings of this study show that vehicles on more than half of the road segments were involved in at least one near-crash event. In addition, more than 50% of near-crash events involved vehicles traveling at speeds over 57.98 mph, and many occurred at short distances between vehicles. The analysis also found that wider roadbeds and multiple lanes reduced near-crash risks, while single-unit trucks slightly increased the likelihood of near-crash events. Finally, the spatial-temporal analysis revealed that near-crash risks were most prominent during weekday peak hours, especially in downtown areas.
{"title":"Leveraging Connected Vehicle Data for Near-Crash Detection and Analysis in Urban Environments","authors":"Xinyu LiJason, DayongJason, Wu, Xinyue Ye, Quan Sun","doi":"arxiv-2409.11341","DOIUrl":"https://doi.org/arxiv-2409.11341","url":null,"abstract":"Urban traffic safety is a pressing concern in modern transportation systems,\u0000especially in rapidly growing metropolitan areas where increased traffic\u0000congestion, complex road networks, and diverse driving behaviors exacerbate the\u0000risk of traffic incidents. Traditional traffic crash data analysis offers\u0000valuable insights but often overlooks a broader range of road safety risks.\u0000Near-crash events, which occur more frequently and signal potential collisions,\u0000provide a more comprehensive perspective on traffic safety. However, city-scale\u0000analysis of near-crash events remains limited due to the significant challenges\u0000in large-scale real-world data collection, processing, and analysis. This study\u0000utilizes one month of connected vehicle data, comprising billions of records,\u0000to detect and analyze near-crash events across the road network in the City of\u0000San Antonio, Texas. We propose an efficient framework integrating\u0000spatial-temporal buffering and heading algorithms to accurately identify and\u0000map near-crash events. A binary logistic regression model is employed to assess\u0000the influence of road geometry, traffic volume, and vehicle types on near-crash\u0000risks. Additionally, we examine spatial and temporal patterns, including\u0000variations by time of day, day of the week, and road category. The findings of\u0000this study show that the vehicles on more than half of road segments will be\u0000involved in at least one near-crash event. In addition, more than 50%\u0000near-crash events involved vehicles traveling at speeds over 57.98 mph, and\u0000many occurred at short distances between vehicles. The analysis also found that\u0000wider roadbeds and multiple lanes reduced near-crash risks, while single-unit\u0000trucks slightly increased the likelihood of near-crash events. Finally, the\u0000spatial-temporal analysis revealed that near-crash risks were most prominent\u0000during weekday peak hours, especially in downtown areas.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-exciting point processes are widely used to model the contagious effects of crime events occurring within continuous geographic space, using their occurrence times and locations. However, in urban environments, most events are naturally constrained within the city's street network structure, and the contagious effects of crime are governed by such network geography. Meanwhile, the complex distribution of urban infrastructure also plays an important role in shaping crime patterns across space. We introduce a novel spatio-temporal-network point process framework for crime modeling that integrates these urban environmental characteristics by incorporating self-attention graph neural networks. Our framework incorporates the street network structure as the underlying event space, where crime events can occur at random locations on the network edges. To realistically capture criminal movement patterns, distances between events are measured using street network distances. We then propose a new mark for a crime event by concatenating the event's crime category with the type of its nearby landmark, aiming to capture how urban design influences the mixing structures of various crime types. A graph attention network architecture is adopted to learn the existence of mark-to-mark interactions. Extensive experiments on crime data from Valencia, Spain, demonstrate the effectiveness of our framework in understanding the crime landscape and forecasting crime risks across regions.
{"title":"Spatio-Temporal-Network Point Processes for Modeling Crime Events with Landmarks","authors":"Zheng Dong, Jorge Mateu, Yao Xie","doi":"arxiv-2409.10882","DOIUrl":"https://doi.org/arxiv-2409.10882","url":null,"abstract":"Self-exciting point processes are widely used to model the contagious effects\u0000of crime events living within continuous geographic space, using their\u0000occurrence time and locations. However, in urban environments, most events are\u0000naturally constrained within the city's street network structure, and the\u0000contagious effects of crime are governed by such a network geography.\u0000Meanwhile, the complex distribution of urban infrastructures also plays an\u0000important role in shaping crime patterns across space. We introduce a novel\u0000spatio-temporal-network point process framework for crime modeling that\u0000integrates these urban environmental characteristics by incorporating\u0000self-attention graph neural networks. Our framework incorporates the street\u0000network structure as the underlying event space, where crime events can occur\u0000at random locations on the network edges. To realistically capture criminal\u0000movement patterns, distances between events are measured using street network\u0000distances. We then propose a new mark for a crime event by concatenating the\u0000event's crime category with the type of its nearby landmark, aiming to capture\u0000how the urban design influences the mixing structures of various crime types. A\u0000graph attention network architecture is adopted to learn the existence of\u0000mark-to-mark interactions. Extensive experiments on crime data from Valencia,\u0000Spain, demonstrate the effectiveness of our framework in understanding the\u0000crime landscape and forecasting crime risks across regions.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study noisy calcium imaging data, with a focus on the classification of spike traces. As raw traces obscure the true temporal structure of a neuron's activity, we performed a tuned filtering of the calcium concentration using two methods: a biophysical model and a kernel mapping. The former characterizes spike trains related to a particular triggering event, while the latter filters out the signal and refines the selection of the underlying neuronal response. Transitioning from traditional time series analysis to point process theory, the study explores spike-time distance metrics and point pattern prototypes to describe repeated observations. We assume that the analyzed neuron's firing events, i.e., spike occurrences, are temporal point process events. In particular, the study aims to categorize 47 point patterns by depth, assuming the similarity of spike occurrences within specific depth categories. The results highlight the pivotal roles of depth and stimuli in discerning diverse temporal structures of neuron firing events, confirming that the point process approach based on prototype analysis is largely useful in the classification of spike traces.
{"title":"A point process approach for the classification of noisy calcium imaging data","authors":"Arianna Burzacchi, Nicoletta D'Angelo, David Payares-Garcia, Jorge Mateu","doi":"arxiv-2409.10409","DOIUrl":"https://doi.org/arxiv-2409.10409","url":null,"abstract":"We study noisy calcium imaging data, with a focus on the classification of\u0000spike traces. As raw traces obscure the true temporal structure of neuron's\u0000activity, we performed a tuned filtering of the calcium concentration using two\u0000methods: a biophysical model and a kernel mapping. The former characterizes\u0000spike trains related to a particular triggering event, while the latter filters\u0000out the signal and refines the selection of the underlying neuronal response.\u0000Transitioning from traditional time series analysis to point process theory,\u0000the study explores spike-time distance metrics and point pattern prototypes to\u0000describe repeated observations. We assume that the analyzed neuron's firing\u0000events, i.e. spike occurrences, are temporal point process events. In\u0000particular, the study aims to categorize 47 point patterns by depth, assuming\u0000the similarity of spike occurrences within specific depth categories. The\u0000results highlight the pivotal roles of depth and stimuli in discerning diverse\u0000temporal structures of neuron firing events, confirming the point process\u0000approach based on prototype analysis is largely useful in the classification of\u0000spike traces.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate sports prediction is a crucial skill for professional coaches, as it can assist in developing effective training strategies and scientific competition tactics. Traditional methods often use complex mathematical and statistical techniques to boost predictability, but they are often limited by dataset scale and have difficulty handling long-term predictions with variable distributions, notably underperforming when predicting point-set-game multi-level matches. To address this challenge, this paper proposes TM2, a TCDformer-based Momentum Transfer Model for long-term sports prediction, which comprises a momentum encoding module and a prediction module based on momentum transfer. TM2 first encodes momentum in large-scale unstructured time series using the local linear scaling approximation (LLSA) module. It then decomposes the reconstructed time series with momentum transfer into trend and seasonal components. The final predictions are derived from the additive combination of a multilayer perceptron (MLP) for the trend components and wavelet attention mechanisms for the seasonal components. Comprehensive experimental results show that, on the 2023 Wimbledon men's tournament dataset, TM2 significantly surpasses existing sports prediction models, reducing MSE by 61.64% and MAE by 63.64%.
{"title":"TCDformer-based Momentum Transfer Model for Long-term Sports Prediction","authors":"Hui Liu, Jiacheng Gu, Xiyuan Huang, Junjie Shi, Tongtong Feng, Ning He","doi":"arxiv-2409.10176","DOIUrl":"https://doi.org/arxiv-2409.10176","url":null,"abstract":"Accurate sports prediction is a crucial skill for professional coaches, which\u0000can assist in developing effective training strategies and scientific\u0000competition tactics. Traditional methods often use complex mathematical\u0000statistical techniques to boost predictability, but this often is limited by\u0000dataset scale and has difficulty handling long-term predictions with variable\u0000distributions, notably underperforming when predicting point-set-game\u0000multi-level matches. To deal with this challenge, this paper proposes TM2, a\u0000TCDformer-based Momentum Transfer Model for long-term sports prediction, which\u0000encompasses a momentum encoding module and a prediction module based on\u0000momentum transfer. TM2 initially encodes momentum in large-scale unstructured\u0000time series using the local linear scaling approximation (LLSA) module. Then it\u0000decomposes the reconstructed time series with momentum transfer into trend and\u0000seasonal components. The final prediction results are derived from the additive\u0000combination of a multilayer perceptron (MLP) for predicting trend components\u0000and wavelet attention mechanisms for seasonal components. Comprehensive\u0000experimental results show that on the 2023 Wimbledon men's tournament datasets,\u0000TM2 significantly surpasses existing sports prediction models in terms of\u0000performance, reducing MSE by 61.64% and MAE by 63.64%.","PeriodicalId":501172,"journal":{"name":"arXiv - STAT - Applications","volume":"188 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}