Automated data analysis of unstructured grey literature in health research: A mapping review

IF 6.1 | Zone 2 (Biology) | Q1 Mathematical & Computational Biology | Research Synthesis Methods | Pub Date: 2023-12-19 | DOI: 10.1002/jrsm.1692

Lena Schmidt, Saleh Mohamed, Nick Meader, Jaume Bacardit, Dawn Craig

Research Synthesis Methods, volume 15, issue 2, pages 178-197. Full text: https://onlinelibrary.wiley.com/doi/10.1002/jrsm.1692


Abstract

The amount of grey literature and ‘softer’ intelligence from social media or websites is vast. Given the long lead times for producing high-quality peer-reviewed health information, this creates demand for new ways to provide prompt input for secondary research. To our knowledge, this is the first review of automated data extraction methods or tools for health-related grey literature and soft data, with a focus on (semi)automating horizon scans, health technology assessments (HTA), evidence maps, or other literature reviews. We searched six databases to cover both health and computer science literature. After deduplication, 10% of the search results were screened by two reviewers and the remainder was single-screened up to an estimated 95% sensitivity; screening was stopped early once an additional 1000 results had been screened with no new includes. All full texts were retrieved, screened, and extracted by a single reviewer, and 10% were checked in duplicate. We included 84 papers covering automation for health-related social media, internet fora, news, patents, government agencies and charities, or trial registers. From each paper, we extracted data about important functionalities for users of the tool or method; information about the level of support and reliability; and information about practical challenges and research gaps. Poor availability of code, data, and usable tools leads to low transparency regarding performance and to duplication of work. Financial implications, scalability, integration into downstream workflows, and meaningful evaluations should be carefully planned before starting to develop a tool, given the vast amounts of data and the opportunities those tools offer to expedite research.
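The early-stopping rule mentioned in the abstract (stop once an additional 1000 results have been screened with no new includes) can be illustrated with a minimal sketch. This is not the authors' tooling: the function name `screen_with_stopping`, the `patience` parameter, and the reading of the rule as "consecutive excludes in a ranked list" are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def screen_with_stopping(decisions: Iterable[bool], patience: int = 1000) -> Tuple[int, int]:
    """Walk through screening decisions in ranked order and stop early.

    `decisions` yields True for an include and False for an exclude.
    `patience` is a hypothetical name for the number of consecutive
    excludes tolerated before screening stops (1000 in the abstract).
    Returns (records_screened, includes_found).
    """
    screened = 0
    includes = 0
    since_last_include = 0
    for is_include in decisions:
        screened += 1
        if is_include:
            includes += 1
            since_last_include = 0  # a new include resets the counter
        else:
            since_last_include += 1
            if since_last_include >= patience:
                break  # no new includes within the patience window
    return screened, includes


# Toy example with a small patience so the effect is visible.
toy = [True, False, True] + [False] * 5 + [True]
print(screen_with_stopping(toy, patience=3))
```

In the toy list the final include is never reached, which shows the trade-off such a rule accepts: less screening effort in exchange for a small risk of missed records.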


Source journal

Research Synthesis Methods (Mathematical & Computational Biology; Multidisciplinary Sciences)

CiteScore: 16.90

Self-citation rate: 3.10%

About the journal: Research Synthesis Methods is a reputable, peer-reviewed journal that focuses on the development and dissemination of methods for conducting systematic research synthesis. Our aim is to advance the knowledge and application of research synthesis methods across various disciplines. Our journal provides a platform for the exchange of ideas and knowledge related to designing, conducting, analyzing, interpreting, reporting, and applying research synthesis. While research synthesis is commonly practiced in the health and social sciences, our journal also welcomes contributions from other fields to enrich the methodologies employed in research synthesis across scientific disciplines. By bridging different disciplines, we aim to foster collaboration and cross-fertilization of ideas, ultimately enhancing the quality and effectiveness of research synthesis methods. Whether you are a researcher, practitioner, or stakeholder involved in research synthesis, our journal strives to offer valuable insights and practical guidance for your work.