Wiley Interdisciplinary Reviews-Computational Statistics: Latest Articles


Neuroimaging statistical approaches for determining neural correlates of Alzheimer's disease via positron emission tomography imaging.

IF 4.4 | Zone 2 (Mathematics) | Q1 Statistics & Probability

Wiley Interdisciplinary Reviews-Computational Statistics

Pub Date : 2023-09-01 Epub Date: 2023-04-03 DOI: 10.1002/wics.1606

Daniel F Drake, Gordana Derado, Lijun Zhang, F DuBois Bowman

Alzheimer's disease (AD) is a degenerative disorder involving significant memory loss and other cognitive deficits, manifesting as a progression from normal cognitive functioning to mild cognitive impairment to AD. The sooner an accurate diagnosis of probable AD is made, the easier it is to manage symptoms and plan for future therapy. Functional neuroimaging stands to be a useful tool in achieving early diagnosis. Among the many neuroimaging modalities, positron emission tomography (PET) provides direct regional assessment of, among others, brain metabolism, cerebral blood flow, and amyloid deposition, all quantities of interest in the characterization of AD. However, there are analytic challenges in identifying early indicators of AD from these high-dimensional imaging data sets, and it is unclear whether early indicators of AD are more likely to emerge in localized patterns of brain activity or in patterns of correlation between distinct brain regions. Early PET-based analyses of AD focused on alterations in metabolic activity at the voxel level or in anatomically defined regions of interest. Other approaches, including seed-voxel and multivariate techniques, seek to characterize metabolic connectivity by identifying other regions in the brain with similar patterns of activity across subjects. We briefly review various neuroimaging statistical approaches applied to determine changes in metabolic activity or metabolic connectivity associated with AD. We then present an approach that provides a unified statistical framework for addressing both metabolic activity and connectivity. Specifically, we apply a Bayesian spatial hierarchical framework to longitudinal metabolic PET scans from the Alzheimer's Disease Neuroimaging Initiative.


Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11626230/pdf/

Citations: 0

A spectrum of explainable and interpretable machine learning approaches for genomic studies

IF 3.2 | Zone 2 (Mathematics) | Q1 Statistics & Probability

Wiley Interdisciplinary Reviews-Computational Statistics

Pub Date : 2023-05-04 DOI: 10.1002/wics.1617

A. M. Conard, Alan DenAdel, Lorin Crawford

The advancement of high‐throughput genomic assays has led to enormous growth in the availability of large‐scale biological datasets. Over the last two decades, these increasingly complex data have required statistical approaches that are more sophisticated than traditional linear models. Machine learning methodologies such as neural networks have yielded state‐of‐the‐art performance for prediction‐based tasks in many biomedical applications. However, a notable downside of these machine learning models is that they typically do not reveal how or why accurate predictions are made. In many areas of biomedicine, this “black box” property can be less than desirable—particularly when there is a need to perform in silico hypothesis testing about a biological system, in addition to justifying model findings for downstream decision‐making, such as determining the best next experiment or treatment strategy. Explainable and interpretable machine learning approaches have emerged to overcome this issue. While explainable methods attempt to derive post hoc understanding of what a model has learned, interpretable models are designed to inherently provide an intelligible definition of their parameters and architecture. Here, we review the model transparency spectrum moving from black box and explainable, to interpretable machine learning methodology. Motivated by applications in genomics, we provide background on the advances across this spectrum, detailing specific approaches in both supervised and unsupervised learning. Importantly, we focus on the promise of incorporating existing biological knowledge when constructing interpretable machine learning methods for biomedical applications. We then close with considerations and opportunities for new development in this space.



Citations: 4

Functional neuroimaging in the era of Big Data and Open Science: A modern overview

IF 3.2 | Zone 2 (Mathematics) | Q1 Statistics & Probability

Wiley Interdisciplinary Reviews-Computational Statistics

Pub Date : 2023-04-19 DOI: 10.1002/wics.1609

N. Lazar

In the past 30 years, the statistical analysis of functional neuroimaging data has made much progress, and spurred many new research directions. At the same time, problems with reproducibility and replicability have plagued the field, owing in part to small sample sizes, a plethora of choices at the data preprocessing stage, and overall lack of transparency in reporting. The latter two in particular pose barriers to statisticians who want to become involved in the area. Recent efforts by some in the neuroimaging community to address these problems represent a turning point. This article highlights the current landscape and provides an introduction to some of the relevant resources in “open neuroimaging.”


Citations: 0

Information criteria for model selection

IF 3.2 | Zone 2 (Mathematics) | Q1 Statistics & Probability

Wiley Interdisciplinary Reviews-Computational Statistics

Pub Date : 2023-02-20 DOI: 10.1002/wics.1607

Jiawei Zhang, Yuhong Yang, Jie Ding

The rapid development of modeling techniques has brought many opportunities for data‐driven discovery and prediction. However, this also leads to the challenge of selecting the most appropriate model for any particular data task. Information criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), have been developed as a general class of model selection methods with profound connections with foundational thoughts in statistics and information theory. Many perspectives and theoretical justifications have been developed to understand when and how to use information criteria, which often depend on particular data circumstances. This review article will revisit information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent advancements, and common misconceptions to enrich the understanding of model selection in general.
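As a rough illustration of how the two criteria named in the abstract are used, the sketch below computes AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L for two Gaussian models fitted by maximum likelihood. The data values, the helper names (`gaussian_loglik`, `fit_and_score`), and the choice of candidate models are assumptions made for this example and are not taken from the review.

```python
import math


def gaussian_loglik(data, mu, sigma2):
    """Log-likelihood of i.i.d. N(mu, sigma2) observations."""
    n = len(data)
    rss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k ln(n) - 2 log L (smaller is better)."""
    return k * math.log(n) - 2 * loglik


def fit_and_score(data, estimate_mean):
    """Fit a Gaussian by maximum likelihood and return both criteria.

    estimate_mean=False fixes the mean at 0 (one free parameter: variance);
    estimate_mean=True estimates mean and variance (two free parameters).
    """
    n = len(data)
    mu = sum(data) / n if estimate_mean else 0.0
    sigma2 = sum((x - mu) ** 2 for x in data) / n  # MLE of the variance
    k = 2 if estimate_mean else 1
    ll = gaussian_loglik(data, mu, sigma2)
    return {"k": k, "loglik": ll, "aic": aic(ll, k), "bic": bic(ll, k, n)}


# Hypothetical measurements, made up for illustration only.
data = [2.1, 1.9, 2.4, 1.7, 2.2, 2.0, 1.8, 2.3]
zero_mean = fit_and_score(data, estimate_mean=False)
free_mean = fit_and_score(data, estimate_mean=True)

for name, res in [("zero-mean", zero_mean), ("free-mean", free_mean)]:
    print(f"{name}: k={res['k']} AIC={res['aic']:.2f} BIC={res['bic']:.2f}")
```

Both criteria penalize the extra parameter, but BIC's penalty grows with the sample size, which is one reason the two can disagree on larger data sets.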


Citations: 2

Data Integration in Causal Inference.

IF 3.2 | Zone 2 (Mathematics) | Q1 Statistics & Probability

Wiley Interdisciplinary Reviews-Computational Statistics

Pub Date : 2023-01-01 Epub Date: 2022-04-08 DOI: 10.1002/wics.1581

Xu Shi, Ziyang Pan, Wang Miao

Integrating data from multiple heterogeneous sources has become increasingly popular to achieve a large sample size and diverse study population. This paper reviews developments in causal inference methods that combine multiple datasets collected by potentially different designs from potentially heterogeneous populations. We summarize recent advances on combining randomized clinical trials with external information from observational studies or historical controls, combining samples when no single sample has all relevant variables with application to two-sample Mendelian randomization, distributed data setting under privacy concerns for comparative effectiveness and safety research using real-world data, Bayesian causal inference, and causal discovery methods.


Citations: 9

A review on authorship attribution in text mining

IF 3.2 | Zone 2 (Mathematics) | Q1 Statistics & Probability

Wiley Interdisciplinary Reviews-Computational Statistics

Pub Date : 2023-01-01 DOI: 10.1002/wics.1584

Wanwan Zheng, Mingzhe Jin

The issue of authorship attribution has long been considered and continues to be a popular topic. Because of advances in digital computers, this field has experienced rapid developments in the last decade. In this article, a survey of recent advances in authorship attribution in text mining is presented. This survey focuses on authorship attribution methods that are statistically or computationally supported as opposed to traditional literary approaches. The main aspects covered include the changes in research topics over time, basic feature metrics, machine learning techniques, and the advantages and disadvantages of each approach. Moreover, the corpus size, number of candidates, data imbalance, and result description, all of which pose challenges in authorship attribution, are discussed to inform future work.


Citations: 7

A survey of smoothing techniques based on a backfitting algorithm in estimation of semiparametric additive models

IF 3.2 | Zone 2 (Mathematics) | Q1 Statistics & Probability

Wiley Interdisciplinary Reviews-Computational Statistics

Pub Date : 2022-12-25 DOI: 10.1002/wics.1605

S. E. Ahmed, D. Aydın, E. Yılmaz

This paper aims to present an overview of semiparametric additive models. The estimation of the finite parameters of semiparametric regression models that involve additive nonparametric components is explained, including its historical background. In addition, three different smoothing techniques are considered in order to show the working procedures of the estimators and to explore their statistical properties: smoothing splines, kernel smoothing, and local linear regression. These methods are compared with respect to both their theoretical and practical behaviors. A simulation study and a real data example are carried out to reveal the performances of the three methods. Accordingly, the advantages and disadvantages of each method regarding semiparametric additive models are presented based on their comparative scores using determined evaluation metrics for loss of information.
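To make the backfitting idea concrete, here is a minimal sketch for the partially linear model y = alpha + beta*x + f(z) + error. It alternates a least-squares update of beta with a Nadaraya-Watson kernel smooth of the partial residuals on z. This is an illustrative toy under stated assumptions, not the estimators compared in the survey: the function names, the Gaussian kernel, the bandwidth, and the noise-free synthetic data are all choices made for this example.

```python
import math


def kernel_smooth(z, r, bandwidth):
    """Nadaraya-Watson smoother (Gaussian kernel), evaluated at each observed z."""
    fitted = []
    for z0 in z:
        w = [math.exp(-0.5 * ((zi - z0) / bandwidth) ** 2) for zi in z]
        fitted.append(sum(wi * ri for wi, ri in zip(w, r)) / sum(w))
    return fitted


def backfit(x, z, y, bandwidth=0.05, n_iter=50, tol=1e-8):
    """Backfitting for y = alpha + beta * (x - mean(x)) + f(z), with f centered."""
    n = len(y)
    x_bar = sum(x) / n
    xc = [xi - x_bar for xi in x]          # centered covariate
    sxx = sum(xi * xi for xi in xc)
    alpha = sum(y) / n                     # intercept absorbs all means
    beta = 0.0
    f = [0.0] * n
    for _ in range(n_iter):
        # Parametric step: regress partial residuals y - alpha - f on x.
        partial = [yi - alpha - fi for yi, fi in zip(y, f)]
        new_beta = sum(xi * ri for xi, ri in zip(xc, partial)) / sxx
        # Nonparametric step: smooth y - alpha - beta*x against z, then center.
        partial = [yi - alpha - new_beta * xi for yi, xi in zip(y, xc)]
        f = kernel_smooth(z, partial, bandwidth)
        f_mean = sum(f) / n
        f = [fi - f_mean for fi in f]
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return alpha, beta, f


# Synthetic, noise-free data (hypothetical): true slope 2, f(z) = sin(2*pi*z).
n = 100
z = [i / n for i in range(n)]
x = [((i * 7) % 10) / 10 for i in range(n)]   # varies quickly relative to z
y = [1.0 + 2.0 * xi + math.sin(2 * math.pi * zi) for xi, zi in zip(x, z)]

alpha, beta, f = backfit(x, z, y)
print(f"alpha={alpha:.3f} beta={beta:.3f}")
```

Swapping `kernel_smooth` for a smoothing-spline or local linear smoother would give the other two variants the abstract mentions; only the nonparametric step changes.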


Citations: 1

Diseases maps of spatial epidemiological data by R

IF 3.2 | Zone 2 (Mathematics) | Q1 Statistics & Probability

Wiley Interdisciplinary Reviews-Computational Statistics

Pub Date : 2022-12-21 DOI: 10.1002/wics.1604

T. Kubota

Disease maps are essential when analyzing spatial epidemiological data, such as newly detected COVID‐19 positive cases or suicide deaths, because it is necessary to determine the method of analysis in order to perform spatial statistical analysis. Disease maps give an initial overview of the data and provide evidence of regional trends, which the analyst can check. Therefore, in this article, the author aimed to use R, a statistical data analysis tool, to draw spatial epidemiological data in the form of disease maps. This article presents three different methods and analyzes recent trends in COVID‐19 and suicide mortality. The author used monthly data from April, July, and October 2020. The results showed no significant trend in April, but some prefectures showed a negative correlation in July. On the other hand, some prefectures showed a positive correlation in October, confirming the influence of COVID‐19 on suicide by region.


Citations: 0

Issue Information

IF 3.2 | Zone 2 (Mathematics) | Q1 Statistics & Probability

Wiley Interdisciplinary Reviews-Computational Statistics

Pub Date : 2022-12-01 DOI: 10.1002/wics.1589

Citations: 0

Error control in tree structured hypothesis testing

IF 3.2 | Zone 2 (Mathematics) | Q1 Statistics & Probability

Wiley Interdisciplinary Reviews-Computational Statistics

Pub Date : 2022-11-25 DOI: 10.1002/wics.1603

J. Miecznikowski, Jiefei Wang

This manuscript reviews some recent and popular error control methods for tree structured hypothesis testing. We review a common setting/definition for hypotheses arranged in a tree structure and we discuss two common Type I error rates in multiple testing: the family-wise error rate (FWER) and the false discovery rate (FDR). We also contrast these methods with a recent development designed to control the false selection rate (FSR). We discuss the algorithms used to implement these error controls and the strategies used to navigate tree structures in light of these errors. We highlight the assumptions necessary in these strategies, summarize the available R software packages to implement these approaches, and show them at work on an example.
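For readers new to the two error rates named above, the following sketch contrasts a Bonferroni correction (which controls the FWER) with the Benjamini-Hochberg step-up procedure (which controls the FDR under independence) on a flat list of p-values. It deliberately ignores any tree structure, so it is a baseline illustration rather than one of the tree-structured procedures the manuscript reviews; the p-values are made up for the example.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; controls the family-wise error rate."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control.

    Finds the largest rank k with p_(k) <= alpha * k / m and rejects the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for eight hypotheses.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
fwer_reject = bonferroni(pvals)
fdr_reject = benjamini_hochberg(pvals)
print("Bonferroni rejections:", sum(fwer_reject))
print("Benjamini-Hochberg rejections:", sum(fdr_reject))
```

On these values the FDR procedure rejects one more hypothesis than Bonferroni, reflecting the usual trade-off: FDR control is less conservative than FWER control.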


Citations: 0
