Wiley Interdisciplinary Reviews-Computational Statistics: Latest Articles

Copulae: An overview and recent developments
IF 3.2 Zone 2 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2021-04-03 DOI: 10.1002/wics.1557
Joshua Größer, Ostap Okhrin
Over the decades that have passed since they were introduced, copulae still remain a very powerful tool for modeling and estimating multivariate distributions. This work gives an overview of copula theory and it also summarizes the latest results. This article recalls the basic definition, the most important cases of bivariate copulae, and it then proceeds to a sketch of how multivariate copulae are developed both from bivariate copulae and from scratch. Regarding higher dimensions, the focus is on hierarchical Archimedean, vine, and factor copulae, which are the most often used and most flexible ways to introduce copulae to multivariate distributions. We also provide an overview of how copulae can be used in various fields of data science, including recent results. These fields include but are not limited to time series and machine learning. Finally, we describe estimation and testing methods for copulae in general, their application to the presented copula structures, and we give some specific testing and estimation procedures for those specific copulae.
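As a small illustration of the bivariate Archimedean case mentioned in the abstract, the following sketch samples from a Clayton copula by conditional inversion and compares the empirical Kendall's tau with the known value θ/(θ+2) for that family. The choice of the Clayton family, the function names, and the value θ = 2 are assumptions made for illustration; none of them is taken from the article.

```python
import random

def sample_clayton(theta, n, seed=0):
    """Draw n pairs (u, v) from a bivariate Clayton copula with theta > 0."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()   # uniform on (0, 1], avoids u == 0
        w = 1.0 - rng.random()   # uniform on (0, 1]
        # Solve the conditional distribution C(v | u) = w for v.
        inner = u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0
        v = inner ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs

def kendall_tau(pairs):
    """Kendall's tau by direct pair counting (O(n^2), fine for small n)."""
    concordant = discordant = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)

theta = 2.0
pairs = sample_clayton(theta, n=500)
print("empirical tau:", round(kendall_tau(pairs), 3))
print("theoretical tau:", theta / (theta + 2.0))
```

Both margins are uniform by construction, so all dependence between u and v comes from the copula parameter; larger θ gives stronger lower-tail dependence.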
Citations: 20
Differential Network Analysis: A Statistical Perspective.
IF 3.2 Zone 2 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2021-03-01 Epub Date: 2020-04-06 DOI: 10.1002/wics.1508
Ali Shojaie

Networks effectively capture interactions among components of complex systems, and have thus become a mainstay in many scientific disciplines. Growing evidence, especially from biology, suggests that networks undergo changes over time, and in response to external stimuli. In biology and medicine, these changes have been found to be predictive of complex diseases. They have also been used to gain insight into mechanisms of disease initiation and progression. Primarily motivated by biological applications, this article provides a review of recent statistical machine learning methods for inferring networks and identifying changes in their structures.

Citations: 34
Why BDeu? Regular Bayesian network structure learning with discrete and continuous variables
IF 3.2 Zone 2 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2021-03-01 DOI: 10.1002/wics.1554
J. Suzuki
We consider the problem of Bayesian network structure learning (BNSL) from data. In particular, we focus on the score‐based approach rather than the constraint‐based approach and address what score we should use for the purpose. The Bayesian Dirichlet equivalent uniform (BDeu) has been mainly used within the community of BNs (not outside of it). We know that for any model selection and any data, the better the data fit a model, the more complex the model, and vice versa. However, recently, it was proven that BDeu violates regularity, which means that it does not balance the two factors, although it works satisfactorily (consistently) when the sample size is infinitely large. In addition, we claim that the merit of using the regular scores over the BDeu is that tighter bounds of pruning rules are available when we consider efficient BNSL. Finally, using experiments, we compare the performances of the procedures to examine the claim. (This paper is a review and gives a unified viewpoint on recent progress on the topic.)
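For readers unfamiliar with the score under discussion, the sketch below computes the standard local BDeu log score of one discrete variable given a candidate parent set. It only illustrates the usual textbook formula; the function name, the toy data, and the equivalent sample size of 1 are assumptions for illustration, and the code does not implement the regular scores or pruning bounds that the article discusses.

```python
from collections import Counter
from math import lgamma

def bdeu_local_score(data, child, parents, arities, ess=1.0):
    """Log BDeu score of `child` given `parents` for discrete data.

    data: list of tuples of integer states, one tuple per observation.
    arities[i]: number of states of variable i.
    ess: equivalent sample size (the single BDeu hyperparameter).
    """
    r = arities[child]
    q = 1
    for p in parents:
        q *= arities[p]
    a_j = ess / q          # prior mass per parent configuration
    a_jk = ess / (q * r)   # prior mass per (configuration, child state) cell

    n_j = Counter()
    n_jk = Counter()
    for row in data:
        cfg = tuple(row[p] for p in parents)
        n_j[cfg] += 1
        n_jk[(cfg, row[child])] += 1

    # Configurations and cells with zero counts contribute exactly zero,
    # so it is enough to sum over the observed ones.
    score = sum(lgamma(a_j) - lgamma(a_j + n) for n in n_j.values())
    score += sum(lgamma(a_jk + n) - lgamma(a_jk) for n in n_jk.values())
    return score

# Toy data in which variable 1 is an exact copy of variable 0.
data = [(0, 0)] * 10 + [(1, 1)] * 10
arities = [2, 2]
print("no parents:  ", bdeu_local_score(data, child=1, parents=[], arities=arities))
print("parent = {0}:", bdeu_local_score(data, child=1, parents=[0], arities=arities))
```

On this toy data the parent set {0} receives the higher score, as expected when the child is fully determined by the parent.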
Citations: 2
Challenges and opportunities beyond structured data in analysis of electronic health records
IF 3.2 Zone 2 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2021-02-14 DOI: 10.1002/wics.1549
Maryam Tayefi, Phuong D. Ngo, T. Chomutare, H. Dalianis, Elisa Salvi, A. Budrionis, F. Godtliebsen
Electronic health records (EHR) contain a lot of valuable information about individual patients and the whole population. Besides structured data, unstructured data in EHRs can provide extra, valuable information but the analytics processes are complex, time‐consuming, and often require excessive manual effort. Among unstructured data, clinical text and images are the two most popular and important sources of information. Advanced statistical algorithms in natural language processing, machine learning, deep learning, and radiomics have increasingly been used for analyzing clinical text and images. Although there exist many challenges that have not been fully addressed, which can hinder the use of unstructured data, there are clear opportunities for well‐designed diagnosis and decision support tools that efficiently incorporate both structured and unstructured data for extracting useful information and provide better outcomes. However, access to clinical data is still very restricted due to data sensitivity and ethical issues. Data quality is also an important challenge in which methods for improving data completeness, conformity and plausibility are needed. Further, generalizing and explaining the result of machine learning models are important problems for healthcare, and these are open challenges. A possible solution to improve data quality and accessibility of unstructured data is developing machine learning methods that can generate clinically relevant synthetic data, and accelerating further research on privacy preserving techniques such as deidentification and pseudonymization of clinical text.
Citations: 67
Issue Information
IF 3.2 Zone 2 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2021-02-05 DOI: 10.1002/wics.1518
Citations: 0
An introduction to persistent homology for time series
IF 3.2 Zone 2 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2021-02-04 DOI: 10.1002/wics.1548
N. Ravishanker, Renjie Chen
Topological data analysis (TDA) uses information from topological structures in complex data for statistical analysis and learning. This paper discusses persistent homology, a part of computational (algorithmic) topology that converts data into simplicial complexes and elicits information about the persistence of homology classes in the data. It computes and outputs the birth and death of such topologies via a persistence diagram. Data inputs for persistent homology are usually represented as point clouds or as functions, while the outputs depend on the nature of the analysis and commonly consist of either a persistence diagram, or persistence landscapes. This paper gives an introductory level tutorial on computing these summaries for time series using R, followed by an overview on using these approaches for time series classification and clustering.
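To make the idea of "birth and death" concrete for a time series treated as a function, the following sketch computes 0-dimensional persistence pairs of the sublevel-set filtration with a union-find structure. The article's own tutorial uses R; this Python version, its function name, and the toy series are assumptions for illustration only and are not taken from the article.

```python
import math

def sublevel_persistence(series):
    """0-dimensional persistence pairs (birth, death) of a 1-D series.

    Samples enter the filtration in order of increasing value; neighbouring
    samples that are both present belong to the same connected component.
    """
    n = len(series)
    parent = [None] * n   # None marks samples not yet in the filtration
    birth = {}            # sample index -> value at which it entered

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda k: series[k]):
        parent[i] = i
        birth[i] = series[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                a, b = find(i), find(j)
                if a == b:
                    continue
                # Elder rule: the component born later dies at this level.
                old, young = (a, b) if birth[a] <= birth[b] else (b, a)
                if birth[young] < series[i]:   # skip zero-persistence pairs
                    pairs.append((birth[young], series[i]))
                parent[young] = old
    root = find(0)
    pairs.append((birth[root], math.inf))   # the global minimum never dies
    return sorted(pairs)

series = [3.0, 1.0, 4.0, 0.0, 5.0, 2.0, 6.0]
print(sublevel_persistence(series))
# → [(0.0, inf), (1.0, 4.0), (2.0, 5.0)]
```

Each pair corresponds to a local minimum of the series (birth) and the local maximum at which its basin merges into a deeper one (death); plotting the pairs as points gives the persistence diagram.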
Citations: 8
Detecting clusters in multivariate response regression
IF 3.2 Zone 2 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2021-02-03 DOI: 10.1002/wics.1551
Bradley S. Price, Corban Allenbrand, Ben Sherwood
Multivariate regression, which can also be posed as a multitask machine learning problem, is used to better understand multiple outputs based on a given set of inputs. Many methods have been proposed on how to utilize shared information about responses with applications in fields such as economics, genomics, advanced manufacturing, and precision medicine. Interest in these areas coupled with the rise of large data sets (“big data”) has generated interest not only in how to make the computations more efficient, but also in how to develop methods that account for the heterogeneity that may exist between responses. One way to exploit this heterogeneity between responses is to use methods that detect groups, also called clusters, of related responses. These methods provide a framework that can increase computational speed and account for the complexity of relationships among a large number of responses. This flexibility brings additional challenges, such as how to identify these clusters of responses, model selection, and the development of more complex algorithms that combine concepts from both the supervised and unsupervised learning literature. We explore current state‐of‐the‐art methods, present a framework to better understand methods that utilize or detect clusters of responses, and provide insights on the computational challenges associated with this framework. Specifically, we present a simulation study that discusses the challenges with model selection when detecting clusters of responses of interest. We also comment on extensions and open problems that are of interest to both the research and practitioner communities.
Citations: 3
From object detection to text detection and recognition: A brief evolution history of optical character recognition
IF 3.2 Zone 2 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2021-01-25 DOI: 10.1002/wics.1547
Haifeng Wang, Chang Pan, Xiao Guo, Chun Ji, Ke Deng
Text detection and recognition, also known as optical character recognition (OCR), is an active and rapidly developing research area with many exciting applications. Deep‐learning‐based methods represent the state of the art in this area. However, these methods are largely deterministic: they give a deterministic output for each input. For both statisticians and general users, methods supporting uncertainty inference are of great appeal, leaving rich research opportunities to combine statistical models and methods with the established deep‐learning‐based approaches. In this paper, we provide a comprehensive review of the history of research on OCR, with discussions of the statistical insights behind these developments and potential directions for enhancing the current methods with statistical approaches. We hope this article can serve as a useful guidebook for statisticians who are seeking a path toward cutting‐edge research in this exciting area.
Citations: 5
Ordinal regression: A review and a taxonomy of models
IF 3.2 Zone 2 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2021-01-11 DOI: 10.1002/wics.1545
G. Tutz
Ordinal models can be seen as being composed of simpler models, in particular binary models. This view of ordinal models allows one to derive a taxonomy of models that includes basic ordinal regression models, models with more complex parameterizations, the class of hierarchically structured models, and the more recently developed finite mixture models. The structured overview that is given covers existing models and shows how models can be extended to account for further effects of explanatory variables. Particular attention is given to the modeling of additional heterogeneity such as dispersion effects. The modeling is embedded into the framework of response styles and the exact meaning of heterogeneity terms in ordinal models is investigated. It is shown that the meaning of terms is crucially determined by the type of model that is used. Moreover, it is demonstrated how models with a complex category‐specific effect structure can be simplified to obtain simpler models that fit sufficiently well. The fitting of models is illustrated by use of a real data set, and a short overview of existing software is given.
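The statement that ordinal models are composed of binary models can be illustrated with two common constructions: a cumulative logit model, where each split {Y ≤ r} versus {Y > r} is a binary logit, and a sequential logit model, where each step {Y = r} versus {Y > r} given Y ≥ r is a binary logit. The sketch below is a minimal illustration under those assumptions; the sign convention, the threshold values, and the linear predictor value are hypothetical and not taken from the article.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def cumulative_probs(thresholds, eta):
    """Category probabilities from binary models for the splits {Y <= r}.

    Assumes P(Y <= r) = logistic(thresholds[r] + eta) with increasing thresholds.
    """
    cum = [logistic(t + eta) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[r] - cum[r - 1] for r in range(1, len(cum))]

def sequential_probs(thresholds, eta):
    """Category probabilities from binary models for the steps {Y = r} given {Y >= r}."""
    probs, remaining = [], 1.0
    for t in thresholds:
        stop = logistic(t + eta)        # P(Y = r | Y >= r)
        probs.append(remaining * stop)
        remaining *= 1.0 - stop         # probability of moving past category r
    probs.append(remaining)
    return probs

thresholds = [-1.0, 0.0, 1.5]   # hypothetical values for a 4-category response
eta = 0.4                        # hypothetical value of the linear predictor
print("cumulative:", [round(p, 4) for p in cumulative_probs(thresholds, eta)])
print("sequential:", [round(p, 4) for p in sequential_probs(thresholds, eta)])
```

Both constructions return a valid probability vector over the four categories, but the same thresholds and predictor mean different things in each, which is in line with the abstract's point that the meaning of terms depends on the type of model.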
Citations: 24
Improving the Gibbs sampler
IF 3.2 Zone 2 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date: 2021-01-07 DOI: 10.1002/wics.1546
Taeyoung Park, Seunghan Lee
The Gibbs sampler is a simple but very powerful algorithm used to simulate from a complex high‐dimensional distribution. It is particularly useful in Bayesian analysis when a complex Bayesian model involves a number of model parameters and the conditional posterior distribution of each component given the others can be derived as a standard distribution. In the presence of a strong correlation structure among components, however, the Gibbs sampler can be criticized for its slow convergence. Here we discuss several algorithmic strategies such as blocking, collapsing, and partial collapsing that are available for improving the convergence characteristics of the Gibbs sampler.
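The slow convergence under strong correlation, and the effect of blocking, can be seen in a toy example: a standard bivariate normal target with correlation ρ. The sketch below compares a component-wise Gibbs sampler with a fully blocked sampler that draws both components jointly. The target, the function names, and ρ = 0.95 are assumptions chosen for illustration; the article's collapsing and partial collapsing strategies are not shown.

```python
import math
import random

def gibbs_bivariate_normal(rho, n, seed=0):
    """Component-wise Gibbs sampler; returns the chain of x values."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x = y = 0.0
    xs = []
    for _ in range(n):
        x = rng.gauss(rho * y, sd)   # draw x | y
        y = rng.gauss(rho * x, sd)   # draw y | x
        xs.append(x)
    return xs

def blocked_bivariate_normal(rho, n, seed=0):
    """Blocked sampler: (x, y) is drawn jointly, so successive draws are independent."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    xs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)      # marginal draw of x
        y = rng.gauss(rho * x, sd)   # y | x completes the joint draw (not recorded)
        xs.append(x)
    return xs

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a chain."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den

rho = 0.95
print("Gibbs lag-1 autocorrelation:  ", round(lag1_autocorr(gibbs_bivariate_normal(rho, 20000)), 3))
print("Blocked lag-1 autocorrelation:", round(lag1_autocorr(blocked_bivariate_normal(rho, 20000)), 3))
```

For this target the x chain of the component-wise sampler is an autoregression with coefficient ρ², so its lag-1 autocorrelation approaches 1 as ρ grows, while the blocked sampler has none. In realistic models a joint draw of all components is rarely available, which is why partial strategies are of interest.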
Citations: 4