Google Trends in infodemiology: Methodological steps to avoid irreproducible results and invalid conclusions

IF 3.7 · Zone 2, Medicine · JCR Q2, Computer Science, Information Systems · International Journal of Medical Informatics · Pub Date: 2024-07-21 · DOI: 10.1016/j.ijmedinf.2024.105563
Alessandro Rovetta
Citations: 0

Abstract

Google trends in infodemiology: Methodological steps to avoid irreproducible results and invalid conclusions

Background

Google Trends is a widely used tool for infodemiological surveys. However, irregularities in the random sampling and aggregation algorithms compromise the reliability of the relative search volume (RSV) and the regional online interest (ROI).

Objective

The study aims to expose critical methodological issues that are commonly ignored when conducting infodemiological surveys with Google Trends. It also provides a guide to avoiding these shortcomings.

Material and methods

The Google Topic “Coronavirus disease 2019” was investigated using different time ranges, categories, and IP addresses. The same samples were manually collected multiple times to evaluate the stability of RSV and ROI. Stability was estimated through indicators of variability (e.g., the percentage coefficient of variation “CV%” and its 4-surprisal interval “4-I”). The ability of the topic and category algorithms to aggregate content was evaluated through quantitative analysis of RSV and ROI and qualitative examination of the related queries.
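The stability check described in the methods can be illustrated with a minimal Python sketch. It is not the author's procedure: the abstract does not say how the 4-I was constructed. The sketch assumes that a 4-surprisal interval means coverage of 1 − 2⁻⁴ = 93.75% (the S-value reading, S = −log₂ p), and it substitutes a bootstrap percentile interval as the construction. The function names and the RSV values are hypothetical.

```python
import random
import statistics


def cv_percent(values):
    """Percentage coefficient of variation: 100 * sample SD / mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV% is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean


def surprisal_interval(values, s_bits=4, n_boot=5000, seed=0):
    """Bootstrap percentile interval for CV%.

    Assumption: an s-bit surprisal interval has coverage 1 - 2**(-s_bits),
    so s_bits=4 gives a 93.75% interval. The bootstrap is an illustrative
    choice, not a method taken from the paper.
    """
    rng = random.Random(seed)
    alpha = 2.0 ** (-s_bits)
    boot = []
    for _ in range(n_boot):
        # Resample the repeated downloads with replacement.
        resample = [rng.choice(values) for _ in values]
        if statistics.fmean(resample) == 0:
            continue  # skip degenerate resamples where CV% is undefined
        boot.append(cv_percent(resample))
    boot.sort()
    lower = boot[int((alpha / 2) * (len(boot) - 1))]
    upper = boot[int((1 - alpha / 2) * (len(boot) - 1))]
    return lower, upper


# Hypothetical RSV values for the same date and region,
# downloaded ten separate times.
rsv_repeats = [62, 70, 58, 66, 73, 61, 68, 64, 59, 71]

cv = cv_percent(rsv_repeats)
low, high = surprisal_interval(rsv_repeats)
print(f"CV% = {cv:.1f}, 4-I: [{low:.1f}, {high:.1f}]")
```

A wide interval or a large CV% for the same query, period, and region would suggest that the dataset is too unstable to support conclusions from a single download.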

Results

The stability of Google Trends’ RSV and ROI is not linked exclusively to the dataset dimension or the IP address. Subregional datasets can be highly unstable (e.g., CV% = 10, 4-I: [8, 13]). Google Trends categories and topics can exclude relevant queries or include unnecessary queries. The statistical scenario is consistent with the following hypotheses: i) datasets containing too few queries are highly unstable, ii) the “interest over time” data format is generally reliable for evaluating trends and correlations, iii) Google Trends improvements have altered the RSV historical trends.

Conclusions

Google Trends can be an effective and efficient infodemiological tool as long as the reliability of web search indexes is appropriately analyzed and weighted for the scientific goal. The methodological steps discussed in this study are critical to drawing valid and relevant scientific conclusions.

Source journal
International Journal of Medical Informatics (Medicine; Computer Science, Information Systems)
CiteScore: 8.90
Self-citation rate: 4.10%
Articles published: 217
Review time: 42 days
Journal description: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; Educational computer-based programs pertaining to medical informatics or medicine in general; Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.