
arXiv - STAT - Other Statistics: Latest Papers

A framework for understanding data science
Pub Date: 2024-02-14 DOI: arxiv-2403.00776
Michael L Brodie
The objective of this research is to provide a framework with which the data science community can understand, define, and develop data science as a field of inquiry. The framework is based on the classical reference framework (axiology, ontology, epistemology, methodology) used for 200 years to define knowledge discovery paradigms and disciplines in the humanities, sciences, algorithms, and now data science. I augmented it with (methods, technology, community) to address automated problem-solving. The resulting data science reference framework is used to define the data science knowledge discovery paradigm in terms of the philosophy of data science, addressed in previous papers, and the data science problem-solving paradigm (i.e., the data science method) and the data science problem-solving workflow, both addressed in this paper. The framework is a much-called-for unifying framework for data science, as it contains the components required to define data science. To provide insight into data science, this paper uses the framework to define the emerging, often enigmatic, data science problem-solving paradigm and workflow, and to compare them with their well-understood scientific counterparts, the scientific problem-solving paradigm and workflow.
Citations: 0
A scalable, synergy-first backbone decomposition of higher-order structures in complex systems
Pub Date: 2024-02-13 DOI: arxiv-2402.08135
Thomas F. Varley
Since its introduction in 2011, the partial information decomposition (PID) has triggered an explosion of interest in multivariate information theory and in the study of emergent, higher-order ("synergistic") interactions in complex systems. Despite its power, however, the PID has several limitations that restrict its general applicability: it scales poorly with system size, and the standard approach to decomposition hinges on a definition of "redundancy", leaving synergy only vaguely defined as "that information not redundant." Other heuristic measures, such as the O-information, have been introduced, although these measures typically provide only a summary statistic of redundancy/synergy dominance rather than direct insight into the synergy itself. To address this issue, we present an alternative decomposition that is synergy-first, scales much more gracefully than the PID, and has a straightforward interpretation. Our approach defines synergy as the information in a set that would be lost following the minimally invasive perturbation of any single element. By generalizing this idea to sets of elements, we construct a totally ordered "backbone" of partial synergy atoms that sweeps across system scales. Our approach starts with entropy, but can be generalized to the Kullback-Leibler divergence and, by extension, to the total correlation and the single-target mutual information. Finally, we show that this approach can be used to decompose higher-order interactions beyond information theory: we demonstrate this by showing how synergistic combinations of pairwise edges in a complex network support signal communicability and global integration. We conclude by discussing how this perspective on synergistic structure (information-based or otherwise) can deepen our understanding of part-whole relationships in complex systems.
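The abstract contrasts its decomposition with the O-information, which condenses redundancy/synergy dominance into a single number. As a point of reference, here is a minimal Python sketch of the standard O-information formula, $\Omega(X) = (n-2)H(X) + \sum_i [H(X_i) - H(X_{-i})]$, computed from empirical entropies of equally weighted discrete samples. It is not the synergy-first backbone decomposition proposed in the paper, and the helper names (`entropy`, `o_information`) are illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def o_information(rows):
    """O-information of equally weighted joint states.

    rows: list of tuples, each tuple one joint state (x_1, ..., x_n).
    Omega = (n - 2) H(X) + sum_i [H(X_i) - H(X_{-i})];
    positive -> redundancy-dominated, negative -> synergy-dominated.
    """
    n = len(rows[0])
    total = (n - 2) * entropy(rows)
    for i in range(n):
        xi = [r[i] for r in rows]                 # marginal of element i
        rest = [r[:i] + r[i + 1:] for r in rows]  # all elements except i
        total += entropy(xi) - entropy(rest)
    return total

# XOR: x3 = x1 ^ x2 with x1, x2 uniform bits (a purely synergistic system)
xor_rows = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)]
# Copy: three identical copies of one fair bit (a purely redundant system)
copy_rows = [(a, a, a) for a in (0, 1)]

print(o_information(xor_rows))   # -1.0
print(o_information(copy_rows))  # 1.0
```

The XOR system comes out negative (synergy-dominated) and the copy system positive (redundancy-dominated): a sign and a magnitude, but no breakdown of where the synergy lives, which is the gap the abstract says its decomposition addresses.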
Citations: 0
Using Mathlink Cubes to Introduce Data Wrangling with Examples in R
Pub Date: 2024-02-10 DOI: arxiv-2402.07029
Lucy D'Agostino McGowan
This paper explores an innovative approach to teaching data wrangling skills to students through hands-on activities before transitioning to coding. Data wrangling, a critical aspect of data analysis, involves cleaning, transforming, and restructuring data. We introduce the use of a physical tool, mathlink cubes, to facilitate a tangible understanding of data sets. This approach helps students grasp the concepts of data wrangling before implementing them in coding languages such as R. We detail a classroom activity that includes hands-on tasks paralleling common data wrangling processes such as filtering, selecting, and mutating, followed by their coding equivalents using R's `dplyr` package.
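The abstract pairs each hands-on cube task with a `dplyr` verb. To keep all examples in one language, the sketch below mimics `filter`, `select`, and `mutate` in plain Python on a hypothetical table of cube towers; the data and the helper names (`filter_rows`, `select_cols`, `mutate`) are assumptions for illustration and are not taken from the paper's classroom activity.

```python
# Hypothetical "data set": each row is one tower of snapped-together cubes.
towers = [
    {"student": "A", "color": "red", "cubes": 4},
    {"student": "B", "color": "blue", "cubes": 7},
    {"student": "C", "color": "red", "cubes": 2},
    {"student": "D", "color": "green", "cubes": 5},
]

def filter_rows(rows, predicate):
    """Keep only rows for which predicate(row) is True (like dplyr's filter)."""
    return [row for row in rows if predicate(row)]

def select_cols(rows, *cols):
    """Keep only the named columns, in the given order (like dplyr's select)."""
    return [{c: row[c] for c in cols} for row in rows]

def mutate(rows, **new_cols):
    """Add columns computed from each row (like dplyr's mutate)."""
    return [{**row, **{name: f(row) for name, f in new_cols.items()}} for row in rows]

red = filter_rows(towers, lambda r: r["color"] == "red")      # pull out red towers
slim = select_cols(red, "student", "cubes")                     # drop the color column
doubled = mutate(slim, double_cubes=lambda r: 2 * r["cubes"])   # add a derived column
print(doubled)
# [{'student': 'A', 'cubes': 4, 'double_cubes': 8}, {'student': 'C', 'cubes': 2, 'double_cubes': 4}]
```

Assuming `towers` were an R data frame with the same columns, the corresponding `dplyr` pipeline would read `towers |> filter(color == "red") |> select(student, cubes) |> mutate(double_cubes = 2 * cubes)`.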
Citations: 0
Quantitative Analysis of AI-Generated Texts in Academic Research: A Study of AI Presence in Arxiv Submissions using AI Detection Tool
Pub Date: 2024-02-09 DOI: arxiv-2403.13812
Arslan Akram
Many people are interested in ChatGPT, since it has become a prominent AIGC (AI-generated content) model that provides high-quality responses in various contexts, such as software development and maintenance. Despite its immense potential, misuse of ChatGPT might cause significant issues, particularly in public safety and education. Many researchers choose to publish their work on arXiv, and the effectiveness and originality of future work depend on the ability to detect AI-generated components in such contributions. To address this need, this study analyzes a method for detecting deliberately AI-generated content in papers that academic organizations post on arXiv. For this study, a dataset was created from physics, mathematics, and computer science articles. This newly built dataset was then used to evaluate Originality.ai. The statistical analysis shows that Originality.ai is highly accurate, with an accuracy of 98%.
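The headline result is a single accuracy figure. As a hedged illustration of what that number summarizes, the sketch below derives accuracy, precision, and recall from a binary confusion matrix; the function name `detection_metrics` and all counts are hypothetical and do not come from the study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall for a binary AI-text detector.

    tp: AI-generated texts flagged as AI   fp: human texts flagged as AI
    tn: human texts passed as human        fn: AI-generated texts missed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical balanced corpus: 100 texts, 2 misclassified.
print(detection_metrics(tp=49, fp=1, tn=49, fn=1))  # (0.98, 0.98, 0.98)

# Hypothetical imbalanced corpus: 980 human texts, 20 AI texts, nothing flagged.
print(detection_metrics(tp=0, fp=0, tn=980, fn=20))  # (0.98, 0.0, 0.0)
```

Both hypothetical detectors reach 98% accuracy, but only the first catches any AI-generated text, which is why precision and recall are usually reported alongside accuracy.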
Citations: 0
Malaria incidence and prevalence: An ecological analysis through Six Sigma approach
Pub Date: 2024-02-03 DOI: arxiv-2402.02233
Md. Al-Amin, Kesava Chandran Vijaya Bhaskar, Walaa Enab, Reza Kamali Miab, Jennifer Slavin, Nigar Sultana
Malaria is a leading cause of death globally, especially in sub-Saharan African countries, claiming over 400,000 lives each year and underscoring the critical need for continued efforts to combat this preventable and treatable disease. The objective of this study is to provide statistical guidance on the optimal preventive and control measures against malaria. Data were collected from reliable sources, such as the World Health Organization, UNICEF, Our World in Data, and STATcompiler, and categorized according to the factors and sub-factors related to deaths caused by malaria. These factors and sub-factors were determined based on root cause analysis and the data sources. Using JMP 16 Pro software, both simple and multiple linear regressions were conducted to analyze the data. The analyses aimed to establish a linear relationship between the dependent variable (malaria deaths in the overall population) and independent variables such as life expectancy, malaria prevalence in children, net usage, indoor residual spraying usage, literate population, and population with inadequate sanitation in each selected sample country. The statistical analysis revealed that the use of insecticide-treated nets (ITNs) by children and other individuals significantly decreased the death count: every 1,000 individuals sleeping under ITNs could reduce the death count by eight. Based on the statistical analysis, this study recommends more rigorous research on the usage of ITNs.
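The reported effect (about eight fewer deaths per 1,000 people sleeping under ITNs) is a regression slope rescaled to a more readable unit. The sketch below fits a single-predictor ordinary least-squares line in plain Python to hypothetical, noise-free country-level numbers constructed with a slope of -0.008 deaths per ITN user, then rescales that slope per 1,000 users. It is not the authors' JMP 16 Pro multiple-regression model, which also includes other covariates; the function name and data are illustrative assumptions.

```python
def ols_intercept_slope(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Hypothetical data (NOT from the paper): people sleeping under ITNs vs.
# malaria deaths, built as deaths = 10,000 - 0.008 * ITN users.
itn_users = [100_000, 250_000, 400_000, 800_000]
deaths = [10_000 - 0.008 * u for u in itn_users]

a, b = ols_intercept_slope(itn_users, deaths)
print(round(b, 6))          # -0.008  (deaths per additional ITN user)
print(round(1_000 * b, 3))  # -8.0    (deaths per 1,000 additional ITN users)
```

Multiplying the slope by 1,000 only changes units, converting "deaths per person" into "deaths per 1,000 people", which is the form in which the abstract states its result.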
Citations: 0
Gerontologic Biostatistics 2.0: Developments over 10+ years in the age of data science
Pub Date: 2024-02-02 DOI: arxiv-2402.01112
Chixiang Chen, Michelle Shardell, Jaime Lynn Speiser, Karen Bandeen-Roche, Heather Allore, Thomas G Travison, Michael Griswold, Terrence E. Murphy
Background: Introduced in 2010, the sub-discipline of gerontologic biostatistics (GBS) was conceptualized to address the specific challenges in analyzing data from research studies involving older adults. However, the evolving technological landscape has catalyzed data science and statistical advancements since the original GBS publication, greatly expanding the scope of gerontologic research. There is a need to describe how these advancements enhance the analysis of multi-modal data and complex phenotypes that are hallmarks of gerontologic research. Methods: This paper introduces GBS 2.0, an updated and expanded set of analytical methods reflective of the practice of gerontologic biostatistics in contemporary and future research. Results: GBS 2.0 topics and relevant software resources include cutting-edge methods in experimental design; analytical techniques that include adaptations of machine learning, quantification of deep phenotypic measurements, and high-dimensional -omics analysis; the integration of information from multiple studies; and strategies to foster reproducibility, replicability, and open science. Discussion: The methodological topics presented here seek to update and expand GBS. By facilitating the synthesis of biostatistics and data science in gerontology, we aim to foster the next generation of gerontologic researchers.
Citations: 0
A review of regularised estimation methods and cross-validation in spatiotemporal statistics
Pub Date: 2024-01-31 DOI: arxiv-2402.00183
Philipp Otto, Alessandro Fassò, Paolo Maranzano
This review article focuses on regularised estimation procedures applicable to geostatistical and spatial econometric models. These methods are particularly relevant in the case of big geospatial data for dimensionality reduction or model selection. To structure the review, we initially consider the most general case of multivariate spatiotemporal processes (i.e., $g > 1$ dimensions of the spatial domain, a one-dimensional temporal domain, and $q \geq 1$ random variables). Then, the idea of regularised/penalised estimation procedures and different choices of shrinkage targets are discussed. Finally, guided by the elements of a mixed-effects model, which allows for a variety of spatiotemporal models, we show different regularisation procedures and how they can be used for the analysis of geo-referenced data, e.g. for selection of relevant regressors, dimensionality reduction of the covariance matrices, detection of conditionally independent locations, or the estimation of a full spatial interaction matrix.
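Among the regularised procedures the review covers, selection of relevant regressors is the easiest to show in isolation. Below is a minimal sketch of a generic lasso (L1 penalty, shrinkage target zero) fitted by cyclic coordinate descent on a hypothetical two-predictor toy data set; it ignores all spatial and temporal structure and is not any specific estimator from the review. The function names are illustrative assumptions.

```python
def soft_threshold(a, lam):
    """Soft-thresholding operator: sign(a) * max(|a| - lam, 0)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - X beta||^2 + lam * ||beta||_1.
    X is a list of n rows, each a list of p predictor values;
    no column may be all zeros.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every predictor's contribution except j.
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta

# Toy data: y = 2 * x1 + 0.1 * x2, with orthogonal predictor columns.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]
print(lasso_cd(X, y, lam=0.0))  # unpenalised: close to [2.0, 0.1]
print(lasso_cd(X, y, lam=0.5))  # x2's coefficient is set exactly to zero
```

With `lam=0.5` the weak predictor drops out while the strong one is shrunk from 2.0 to 1.5, the combined selection-and-shrinkage behaviour that makes L1 penalties useful for choosing regressors.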
Citations: 0
Human-Centric and Integrative Lighting Asset Management in Public Libraries: Qualitative Insights and Challenges from a Swedish Field Study
Pub Date: 2024-01-19 DOI: arxiv-2401.11000
Jing Lin, Per Olof Hedekvist, Nina Mylly, Math Bollen, Jingchun Shen, Jiawei Xiong, Christofer Silfvenius
Traditional lighting source reliability evaluations, often covering just half of a lamp's volume, can misrepresent real-world performance. To overcome these limitations, adopting advanced asset management strategies for a more holistic evaluation is crucial. This paper investigates human-centric and integrative lighting asset management in Swedish public libraries. Through field observations, interviews, and gap analysis, the study highlights a disparity between current lighting conditions and stakeholder expectations, with issues such as eye strain indicating significant potential for improvement. We propose a shift towards more dynamic lighting asset management and reliability evaluations, emphasizing continuous enhancement and comprehensive training in human-centric and integrative lighting principles.
Citations: 0
Radius selection using kernel density estimation for the computation of nonlinear measures
Pub Date: 2024-01-08 DOI: arxiv-2401.03891
Johan Medrano, Abderrahmane Kheddar, Annick Lesne, Sofiane Ramdani
When nonlinear measures are estimated from finite-length sampled temporal signals, a radius parameter must be carefully selected to avoid poor estimation. These measures are generally derived from the correlation integral, which quantifies the probability of finding neighbors, i.e. pairs of points separated by less than the radius parameter. While each nonlinear measure comes with several specific empirical rules for selecting a radius value, we provide a systematic selection method. We show that the optimal radius for nonlinear measures can be approximated by the optimal bandwidth of a kernel density estimator (KDE) related to the correlation sum. The KDE framework provides non-parametric tools to approximate a density function from finite samples (e.g. histograms) and optimal methods to select a smoothing parameter, the bandwidth (e.g. bin width in histograms). We use results from KDE to derive a closed-form expression for the optimal radius. This expression is then used to compute the correlation dimension and to construct recurrence plots yielding an estimate of the Kolmogorov-Sinai entropy. We assess our method through numerical experiments on signals generated by nonlinear systems and on experimental electroencephalographic time series.
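The correlation sum that the abstract links to a KDE is the fraction of point pairs closer than a radius $r$. The sketch below computes it in plain Python for a time-delay embedding of a scalar signal, using Euclidean distance and the strict "less than the radius" condition. The embedding parameters and helper names (`delay_embed`, `correlation_sum`) are assumptions for illustration, and the radius is chosen by hand here: the paper's closed-form, KDE-based radius is not reproduced.

```python
from itertools import combinations
from math import dist  # Python 3.8+

def delay_embed(signal, dim, tau):
    """Delay vectors (s[t], s[t + tau], ..., s[t + (dim - 1) * tau])."""
    n_vectors = len(signal) - (dim - 1) * tau
    return [tuple(signal[t + k * tau] for k in range(dim)) for t in range(n_vectors)]

def correlation_sum(points, radius):
    """Fraction of distinct point pairs whose Euclidean distance is < radius."""
    pairs = list(combinations(points, 2))
    close = sum(1 for p, q in pairs if dist(p, q) < radius)
    return close / len(pairs)

signal = [0, 1, 2, 3, 4]
for dim in (1, 2):
    points = delay_embed(signal, dim=dim, tau=1)
    print(dim, correlation_sum(points, radius=1.5))
# 1 0.4
# 2 0.5
```

Sweeping `radius` and reading the slope of $\log C(r)$ against $\log r$ in the scaling region is how the correlation dimension is usually estimated, which is why the choice of radius matters so much for finite, noisy signals.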
Citations: 0
Quotient geometry of bounded or fixed rank correlation matrices
Pub Date : 2024-01-06 DOI: arxiv-2401.03126
Hengchao Chen
This paper studies the quotient geometry of bounded- or fixed-rank correlation matrices. The set of bounded-rank correlation matrices is in bijection with a quotient set of a spherical product manifold by an orthogonal group. We show that it admits an orbit space structure and that its stratification is determined by the rank of the matrices. Also, the principal stratum has a compatible Riemannian quotient manifold structure. We develop efficient Riemannian optimization algorithms for computing the distance and the weighted Fréchet mean in the orbit space. We prove that any minimizing geodesic in the orbit space has constant rank on the interior of the segment. Moreover, we examine geometric properties of the quotient manifold, including horizontal and vertical spaces, Riemannian metric, injectivity radius, exponential and logarithmic maps, gradient and Hessian.
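The bijection in the abstract can be illustrated with the familiar factorization C = Y Yᵀ, which we assume here as the underlying parametrization: if Y is n × k with unit-norm rows (a point on a product of n spheres), then C has a unit diagonal and rank at most k, and replacing Y by Y Q for any k × k orthogonal Q leaves C unchanged, so each correlation matrix corresponds to an orbit. This is a minimal pure-Python sketch with hypothetical helper names; it does not implement the paper's Riemannian optimization algorithms.

```python
import math
import random


def unit_row_matrix(n, k, rng):
    """Hypothetical helper: an n x k matrix with unit-norm rows, i.e. a point
    on the product of n copies of the sphere S^(k-1)."""
    Y = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(sum(a * a for a in v))
        Y.append([a / norm for a in v])
    return Y


def gram(Y):
    """C = Y Y^T; unit rows give a unit diagonal, and rank(C) <= k."""
    return [[sum(a * b for a, b in zip(yi, yj)) for yj in Y] for yi in Y]


def matmul(A, B):
    """Plain nested-list matrix product A @ B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def rotation(theta):
    """A 2 x 2 orthogonal matrix (rotation by theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


if __name__ == "__main__":
    rng = random.Random(0)
    Y = unit_row_matrix(n=4, k=2, rng=rng)
    C = gram(Y)
    print("diagonal:", [round(C[i][i], 12) for i in range(4)])
    # Y and Y Q lie in the same orbit of the orthogonal group, so they
    # represent the same correlation matrix.
    print("max |C - C'|:", max_abs_diff(C, gram(matmul(Y, rotation(0.7)))))
```

The invariance follows from (YQ)(YQ)ᵀ = Y Q Qᵀ Yᵀ = Y Yᵀ, which is why the set of representatives Y must be quotiented by the orthogonal group.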
Citations: 0