MMiDaS-AE: Multi-modal Missing Data aware Stacked Autoencoder for Biomedical Abstract Screening

Proceedings of the ACM Conference on Health, Inference, and Learning. Pub Date: 2020-04-01. Epub Date: 2020-04-02. DOI: 10.1145/3368555.3384463

Eric W Lee, Byron C Wallace, Karla I Galaviz, Joyce C Ho


Citations: 6

Abstract

Systematic review (SR) is an essential process for identifying, evaluating, and summarizing the findings of all relevant individual studies concerning health-related questions. However, conducting an SR is labor-intensive, as identifying relevant studies is a daunting process that entails multiple researchers screening thousands of articles for relevance. In this paper, we propose MMiDaS-AE, a Multi-modal Missing Data aware Stacked Autoencoder, for semi-automating screening for SRs. We use a multi-modal view that exploits three representations: 1) documents, 2) topics, and 3) citation networks. Documents that contain similar words will be nearby in the document embedding space. Models can also exploit the relationship between documents and the associated SR MeSH terms to capture article relevancy. Finally, related works will likely share the same citations, and thus closely related articles would, intuitively, be trained to be close to each other in the embedding space. However, using all three learned representations directly as features results in an unwieldy number of parameters. Thus, motivated by recent work on multi-modal autoencoders, we adopt a multi-modal stacked autoencoder that can learn a shared representation encoding all three representations in a compressed space. In practice, however, one or more of these modalities may be missing for an article (e.g., if we cannot recover citation information). Therefore, we propose to learn to impute the shared representation even when specific inputs are missing. We find that this new model significantly improves performance on a dataset consisting of 15 SRs compared to existing approaches.
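The idea of encoding three modalities into one compressed shared representation while tolerating a missing modality can be illustrated with a small sketch. What follows is not the authors' architecture: the abstract gives no layer sizes, training procedure, or imputation details, so the sketch swaps in a common stand-in (zero-filling a missing modality, tracking a presence mask, and scoring reconstruction only on observed features) with a single untrained encoder layer rather than a stacked one. The modality names, dimensions, and all function names are hypothetical.

```python
import math
import random

# Hypothetical modality names and sizes; the abstract does not give dimensions.
MODALITY_DIMS = {"document": 4, "topic": 3, "citation": 3}
SHARED_DIM = 2  # assumed size of the compressed shared representation


def init_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def build_input(article):
    """Concatenate modality vectors, zero-filling any that are missing.

    Returns the fused input and a per-feature mask (1.0 = observed).
    """
    fused, mask = [], []
    for name, dim in MODALITY_DIMS.items():
        vec = article.get(name)
        if vec is None:
            fused.extend([0.0] * dim)
            mask.extend([0.0] * dim)
        else:
            fused.extend(vec)
            mask.extend([1.0] * dim)
    return fused, mask


def encode(w_enc, fused):
    """Shared representation: one tanh layer over the fused input."""
    return [math.tanh(h) for h in matvec(w_enc, fused)]


def masked_reconstruction_loss(w_dec, shared, fused, mask):
    """Mean squared reconstruction error over observed features only."""
    recon = matvec(w_dec, shared)
    observed = sum(mask)
    if observed == 0:
        return 0.0
    return sum(m * (r - x) ** 2 for m, r, x in zip(mask, recon, fused)) / observed


rng = random.Random(0)
input_dim = sum(MODALITY_DIMS.values())
w_enc = init_matrix(SHARED_DIM, input_dim, rng)
w_dec = init_matrix(input_dim, SHARED_DIM, rng)

# An article whose citation embedding could not be recovered.
article = {
    "document": [0.2, -0.1, 0.4, 0.0],
    "topic": [0.3, 0.3, -0.2],
    "citation": None,
}
fused, mask = build_input(article)
shared = encode(w_enc, fused)
loss = masked_reconstruction_loss(w_dec, shared, fused, mask)
```

Masking the loss keeps the missing citation features from pulling the weights toward reconstructing zeros. A model along the lines the abstract describes would additionally be trained, and would learn to impute the shared representation rather than rely on zero-filling.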
