Author: Alessandro Rovetta
Journal: International Journal of Medical Informatics (impact factor 3.7; JCR Q2, Computer Science, Information Systems)
Publication date: 2024-07-21
Publication type: Journal Article
DOI: 10.1016/j.ijmedinf.2024.105563
URL: https://www.sciencedirect.com/science/article/pii/S1386505624002260
Citations: 0
Google trends in infodemiology: Methodological steps to avoid irreproducible results and invalid conclusions
Background
Google Trends is a widely used tool for infodemiological surveys. However, irregularities in the random sampling and aggregation algorithms compromise the reliability of the relative search volume (RSV) and the regional online interest (ROI).
Objective
The study aims to expose critical methodological issues that are commonly overlooked when conducting infodemiological surveys with Google Trends. It also provides a guide to avoiding these shortcomings.
Material and methods
The Google Topic “Coronavirus disease 2019” was investigated using different time frames, categories, and IP addresses. The same samples were manually collected multiple times to evaluate the stability of the RSV and ROI. Stability was estimated through indicators of variability (e.g., the percentage coefficient of variation “CV%” and its 4-surprisal interval “4-I”). The content aggregation capacity of the algorithms behind topics and categories was evaluated through a quantitative analysis of RSV and ROI and a qualitative examination of the related queries.
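The stability check described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's own procedure: the function names, the sample values, and the use of a bootstrap percentile interval are all assumptions. The 4-surprisal interval is read here as the interval with coverage 1 - 2^-4 = 93.75%, following the usual definition of the S-value as s = -log2(p).

```python
import random
import statistics


def cv_percent(values):
    """Percentage coefficient of variation: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def surprisal_interval(values, s=4, n_boot=2000, seed=0):
    """Bootstrap percentile interval for CV% at surprisal level s.

    Assumption: an s-surprisal interval is taken as the interval with
    coverage 1 - 2**-s (93.75% for s = 4). The bootstrap is an
    illustrative choice, not necessarily the estimator used in the study.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    alpha = 2.0 ** -s
    # Resample the repeated collections with replacement and recompute CV%.
    stats = sorted(
        cv_percent(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical RSV readings for the same query, region, and period,
# downloaded eight separate times.
rsv_repeats = [41, 47, 38, 52, 44, 40, 49, 45]
print(round(cv_percent(rsv_repeats), 1))
print(surprisal_interval(rsv_repeats))
```

A wide interval around a large CV% would suggest that a single download of that dataset should not be trusted on its own.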
Results
The stability of Google Trends’ RSV and ROI is not linked exclusively to the dataset dimension or the IP address. Subregional datasets can be highly unstable (e.g., CV% = 10, 4-I: [8, 13]). Google Trends categories and topics can exclude relevant queries or include unnecessary queries. The statistical scenario is consistent with the following hypotheses: i) datasets containing too few queries are highly unstable, ii) the “interest over time” data format is generally reliable for evaluating trends and correlations, iii) Google Trends improvements have altered the RSV historical trends.
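One simple way to probe hypothesis ii) on one's own data is to download the same “interest over time” series more than once and check how strongly the copies correlate. The sketch below is an assumption-laden illustration: the `pearson` helper and both series are hypothetical, and the study may have used different correlation measures.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly RSV for one query, collected on two different days.
collection_a = [12, 18, 35, 100, 76, 54, 40, 31]
collection_b = [14, 17, 38, 100, 71, 57, 37, 33]
r = pearson(collection_a, collection_b)
print(round(r, 3))
```

A coefficient close to 1 across repeated downloads indicates that the shape of the trend is reproducible even when individual RSV values fluctuate.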
Conclusions
Google Trends can be an effective and efficient infodemiological tool as long as the reliability of web search indexes is appropriately analyzed and weighted for the scientific goal. The methodological steps discussed in this study are critical to drawing valid and relevant scientific conclusions.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.