BioMDSE: A Multimodal Deep Learning-Based Search Engine Framework for Biofilm Documents Classifications

Pei-Chi Huang, Ejan Shakya, Myoungkyu Song, M. Subramaniam
{"title":"生物膜文档分类的多模态深度学习搜索引擎框架","authors":"Pei-Chi Huang, Ejan Shakya, Myoungkyu Song, M. Subramaniam","doi":"10.1109/BIBM55620.2022.9994867","DOIUrl":null,"url":null,"abstract":"As biofilms research grows rapidly, a corpus of bibliographic literature (i.e., documents) is increasing at an incredible rate. Many researchers often need to inspect these large document collections, including (1) text, (2) images, and (3) captions, to understand underlying biological mechanisms and make a critical decision. However, researchers have great difficulty in exploring such ever-growing large datasets in labor-intensive processes. Thus, automation of such tasks is urgently required for the automatic identification or classification of a large volume of document collections. To address this problem, we present a multimodal deep learning-based approach to automatically classify documents for a specialized information retrieval technique based on biofilm images, captions, and texts, which is a major source of information for the classification of documents. Images, captions, and texts from biofilm documents are represented in a large vector space. Then, they are fed into convolutional neural networks (CNNs), to improve similarity matching and relevance. Our extensive experiments and analysis will take captions, texts, or images as unimodal models as inputs and concatenate them all into multimodal models. The trained models for this classification approach in turn help a search engine to precisely identify relevant and domain-specific documents from a large volume of document collections for further research direction in biofilm development.","PeriodicalId":210337,"journal":{"name":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"BioMDSE: A Multimodal Deep Learning-Based Search Engine Framework for Biofilm Documents Classifications\",\"authors\":\"Pei-Chi Huang, Ejan Shakya, Myoungkyu Song, M. Subramaniam\",\"doi\":\"10.1109/BIBM55620.2022.9994867\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As biofilms research grows rapidly, a corpus of bibliographic literature (i.e., documents) is increasing at an incredible rate. Many researchers often need to inspect these large document collections, including (1) text, (2) images, and (3) captions, to understand underlying biological mechanisms and make a critical decision. However, researchers have great difficulty in exploring such ever-growing large datasets in labor-intensive processes. Thus, automation of such tasks is urgently required for the automatic identification or classification of a large volume of document collections. To address this problem, we present a multimodal deep learning-based approach to automatically classify documents for a specialized information retrieval technique based on biofilm images, captions, and texts, which is a major source of information for the classification of documents. Images, captions, and texts from biofilm documents are represented in a large vector space. Then, they are fed into convolutional neural networks (CNNs), to improve similarity matching and relevance. Our extensive experiments and analysis will take captions, texts, or images as unimodal models as inputs and concatenate them all into multimodal models. 
The trained models for this classification approach in turn help a search engine to precisely identify relevant and domain-specific documents from a large volume of document collections for further research direction in biofilm development.\",\"PeriodicalId\":210337,\"journal\":{\"name\":\"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/BIBM55620.2022.9994867\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BIBM55620.2022.9994867","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

As biofilm research grows rapidly, the corpus of bibliographic literature (i.e., documents) is increasing at an incredible rate. Researchers often need to inspect these large document collections, including (1) text, (2) images, and (3) captions, to understand underlying biological mechanisms and make critical decisions. However, exploring such ever-growing datasets through labor-intensive manual processes is extremely difficult, so automation is urgently required for the identification and classification of large document collections. To address this problem, we present a multimodal deep learning-based approach that automatically classifies documents for a specialized information retrieval technique based on biofilm images, captions, and texts, which are the major sources of information for document classification. Images, captions, and texts from biofilm documents are represented in a large vector space and then fed into convolutional neural networks (CNNs) to improve similarity matching and relevance. Our extensive experiments and analysis take captions, texts, or images as inputs to unimodal models and concatenate them into multimodal models. The trained models in turn help a search engine precisely identify relevant, domain-specific documents from large document collections, guiding further research directions in biofilm development.
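The abstract describes a late-fusion design: unimodal CNN encoders for text, captions, and images each produce a feature vector, and the vectors are concatenated into a multimodal representation for classification. The PyTorch sketch below illustrates that idea only; it is not the authors' implementation, and every layer size, encoder choice, and class name (TextCNN, ImageCNN, MultimodalClassifier) is an illustrative assumption.

```python
# Minimal late-fusion sketch (assumed architecture, not the paper's code):
# unimodal encoders for text, captions, and images produce fixed-length
# vectors that are concatenated and passed to a classification head.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """1-D CNN over token embeddings, a stand-in text/caption encoder."""
    def __init__(self, vocab_size=30000, embed_dim=128, out_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, out_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))
        return self.pool(x).squeeze(-1)            # (batch, out_dim)

class ImageCNN(nn.Module):
    """Small 2-D CNN, a stand-in image encoder for biofilm figures."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(32 * 4 * 4, out_dim)

    def forward(self, images):                     # (batch, 3, H, W)
        x = self.features(images).flatten(1)
        return self.fc(x)                          # (batch, out_dim)

class MultimodalClassifier(nn.Module):
    """Concatenate the three unimodal feature vectors and classify."""
    def __init__(self, num_classes=2, feat_dim=256):
        super().__init__()
        self.text_enc = TextCNN(out_dim=feat_dim)
        self.caption_enc = TextCNN(out_dim=feat_dim)
        self.image_enc = ImageCNN(out_dim=feat_dim)
        self.head = nn.Linear(3 * feat_dim, num_classes)

    def forward(self, text_ids, caption_ids, images):
        fused = torch.cat([
            self.text_enc(text_ids),
            self.caption_enc(caption_ids),
            self.image_enc(images),
        ], dim=1)                                  # (batch, 3 * feat_dim)
        return self.head(fused)                    # (batch, num_classes)

# Example forward pass with dummy inputs (batch of 4 documents):
# model = MultimodalClassifier()
# logits = model(torch.randint(0, 30000, (4, 200)),  # document text tokens
#                torch.randint(0, 30000, (4, 40)),   # caption tokens
#                torch.randn(4, 3, 64, 64))          # figure images
```

Dropping any one encoder from the concatenation recovers the unimodal baselines the abstract compares against; the fused vector is what gives the multimodal model its advantage in similarity matching.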