Handbook of Multiple Comparisons

The American Statistician Pub Date : 2023-04-03 DOI:10.1080/00031305.2023.2198355

Junyong Park


Citations: 0

Abstract

Finite population sampling has found numerous applications in the past century. Valid inference about real populations is possible based on known sampling probabilities, “irrespectively of the unknown properties of the target population studied” (Neyman, 1934). Graphs additionally allow one to incorporate the connections among the population units. Many socio-economic, biological, spatial, or technological phenomena exhibit an underlying graph structure that may be the central interest of study, or the edges may effectively provide access to those nodes that are the primary targets. Either way, graph sampling provides a universally valid approach to studying real-valued graphs. This book establishes a rigorous conceptual framework for graph sampling and gives a unified presentation of much of the existing theory and methods, including several of the most recent developments. The most central concepts are introduced in Chapter 1: graph totals and parameters as targets of estimation; observation procedures that, following an initial sample of nodes, drive graph sampling; the sample graph, in which different kinds of induced subgraphs (such as edge, triangle, 4-circle, K-star) can be observed; and the graph sampling strategy, consisting of a sampling method and an associated estimator. Chapters 2–4 introduce strategies based on bipartite graph sampling and the incidence weighting estimator, which encompass all the existing unconventional finite population sampling methods, including indirect, network, adaptive cluster, and line intercept sampling. This can help to raise awareness of these methods, allowing them to be more effectively studied and applied as cases of graph sampling.
For instance, Chapter 4 considers how to apply adaptive network sampling in a situation like the COVID-19 outbreak, which allows one to combat the spread of the virus by test-trace and to estimate the prevalence at the same time, provided the necessary elements of probability design and observation procedure are implemented. Chapters 5 and 6 deal with snowball sampling and targeted random walk sampling, respectively, which can be regarded as probabilistic breadth-first or depth-first non-exhaustive search methods in graphs. Novel approaches to sampling strategies are developed and illustrated, such as how to account for the fact that an observed triangle could have been observed in many other ways that remain hidden from the realized sample graph, or how to estimate a parameter related to certain finite-order subgraphs (such as a triangle) based on a random walk in the graph. The Bibliographic Notes at the end of each chapter contain some reflections on sources of inspiration, motivations for chosen approaches, and topics for future development. I found the contents of the book highly innovative and useful. The indirect sampling of Lavallée (2007) can be viewed as a special case of graph sampling. The material on adaptive cluster sampling should be very useful in many real-world sampling problems. Some of the material has not yet been published elsewhere. Modern sampling research topics such as respondent-driven sampling or reinforcement learning can be viewed as graph sampling problems. In this sense, graph sampling can be the future of sampling. However, the explanations in the book are somewhat concise. More examples and context would help readers understand the concepts. Also, the design-based framework assumes that the conditional inclusion probabilities are known in advance. It would be great if the author could cover situations where these inclusion probabilities are estimated rather than known.
Also, a chapter on real applications would help readers understand the material better. I hope this content is covered in a second edition of the book. In any case, there is a lot to explore in this area, and the book can be a good guide for a tour of graph sampling. I plan to use the book as a reference in my course on advanced survey sampling at Iowa State University.
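
As a rough illustration of the remark that snowball sampling can be regarded as a probabilistic, non-exhaustive breadth-first search, the following minimal Python sketch expands a random initial sample of nodes wave by wave. The function name `snowball_sample`, the adjacency-dictionary representation of the graph, and the use of simple random sampling for the initial nodes are assumptions made for illustration only; they are not the book's notation or method, and no estimator is included.

```python
import random


def snowball_sample(adjacency, initial_size, waves, seed=None):
    """Observe nodes by a snowball sample with a fixed number of waves.

    adjacency    -- dict mapping each node to an iterable of its neighbours
                    (hypothetical representation, chosen for this sketch)
    initial_size -- number of nodes drawn by simple random sampling
    waves        -- number of breadth-first expansion steps
    Returns (initial_nodes, observed_nodes).
    """
    rng = random.Random(seed)
    # Probabilistic part: the starting nodes are chosen at random.
    initial = rng.sample(list(adjacency), initial_size)

    observed = set(initial)
    frontier = set(initial)
    # Non-exhaustive part: stop after `waves` steps rather than
    # exploring everything reachable from the initial sample.
    for _ in range(waves):
        next_frontier = set()
        for node in frontier:
            for neighbour in adjacency[node]:
                if neighbour not in observed:
                    next_frontier.add(neighbour)
        observed |= next_frontier
        frontier = next_frontier
    return initial, observed


# A path graph 0 - 1 - 2 - 3 - 4 as a small usage example.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
initial_nodes, observed_nodes = snowball_sample(path_graph, initial_size=1, waves=1, seed=0)
```

With one wave, the observed set is the initial node together with its neighbours; increasing `waves` moves the procedure toward an exhaustive breadth-first search of the connected component containing the initial sample.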


