
Proceedings. IEEE International Symposium on Bioinformatics and Bioengineering: Latest Articles

Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry.
Pub Date : 2023-12-01 Epub Date: 2024-02-19 DOI: 10.1109/bibe60311.2023.00013
Shiva Ebrahimi, Xuan Guo

Tandem mass spectrometry (MS/MS) is the predominant high-throughput technique for comprehensively analyzing protein content in biological samples, and it is a cornerstone of proteomics. In recent years, substantial progress has been made in Data-Independent Acquisition (DIA) strategies, which fragment precursor ions in an unbiased, non-targeted manner. DIA-generated MS/MS spectra are difficult to interpret because they are inherently highly multiplexed: each spectrum contains fragment ions originating from multiple precursor peptides. This poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to handle multiplexing. In this paper, we introduce Casanovo-DIA, a deep-learning model based on the transformer architecture that deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. At the amino acid level, Casanovo-DIA improves precision by 15.14% to 34.8% and recall by 11.62% to 31.94%; at the peptide level, it improves precision by 59% to 81.36%. Combining DIA data with the Casanovo-DIA model holds considerable promise for uncovering novel peptides and for more comprehensive profiling of biological samples. Casanovo-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/Casanovo-DIA.
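The amino-acid-level and peptide-level metrics quoted above can be illustrated with a minimal sketch. It assumes a simplified position-wise match rule (published de novo benchmarks typically match residues by mass within a tolerance instead), and the function names and toy peptides are hypothetical, not taken from the Casanovo-DIA code base.

```python
def aa_precision_recall(predicted, reference):
    """Position-wise amino-acid precision and recall over paired peptides.

    Simplifying assumption: a predicted residue is correct only if it equals
    the reference residue at the same index.
    """
    matched = sum(
        sum(p == r for p, r in zip(pred, ref))
        for pred, ref in zip(predicted, reference)
    )
    n_pred = sum(len(p) for p in predicted)   # residues the model emitted
    n_ref = sum(len(r) for r in reference)    # residues in the ground truth
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_ref if n_ref else 0.0
    return precision, recall


def peptide_precision(predicted, reference):
    """Fraction of predicted peptides that match the reference exactly."""
    exact = sum(p == r for p, r in zip(predicted, reference))
    return exact / len(predicted) if predicted else 0.0


# Toy data (hypothetical): one exact hit, one prediction missing a residue.
preds = ["PEPTIDE", "ACDK"]
refs = ["PEPTIDE", "ACDEK"]
aa_p, aa_r = aa_precision_recall(preds, refs)
pep_p = peptide_precision(preds, refs)
```

In this toy case 10 of 11 predicted residues and 10 of 12 reference residues match, while only one of the two peptides is fully correct, which shows why peptide-level precision is the stricter measure.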

Citations: 0
Fusion Learning on Multiple-Tag RFID Measurements for Respiratory Rate Monitoring.
Pub Date : 2020-10-01 Epub Date: 2020-12-16 DOI: 10.1109/bibe50027.2020.00082
Stephen Hansen, Daniel Schwartz, Jesse Stover, Md Abu Saleh Tajin, William M Mongan, Kapil R Dandekar

Future advances in the medical Internet of Things (IoT) will require sensors that are unobtrusive and passively powered. With the use of wireless, wearable, and passive knitted smart garment sensors, we monitor infant respiratory activity. We improve the utility of multi-tag Radio Frequency Identification (RFID) measurements via fusion learning across various features from multiple tags to determine the magnitude and temporal information of the artifacts. In this paper, we develop an algorithm that classifies and separates respiratory activity via a Regime Hidden Markov Model compounded with higher-order features of Minkowski and Mahalanobis distances. Our algorithm improves respiratory rate detection by increasing the Signal to Noise Ratio (SNR) on average from 17.12 dB to 34.74 dB. The effectiveness of our algorithm in increasing SNR shows that higher-order features can improve signal strength detection in RFID systems. Our algorithm can be extended to include more feature sources and can be used in a variety of machine learning algorithms for respiratory data classification, and other applications. Further work on the algorithm will include accurate parameterization of the algorithm's window size.
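To put the reported SNR change in perspective, the sketch below converts the decibel gain to a linear ratio and shows the two distance measures named in the abstract. It assumes SNR is expressed as a power ratio (10·log10), which the abstract does not state, and the helper names and the restriction of the Mahalanobis distance to two dimensions are illustrative choices, not the authors' implementation.

```python
import math


def db_gain_to_power_ratio(snr_before_db, snr_after_db):
    """Linear power-ratio improvement implied by a change in SNR (dB)."""
    return 10 ** ((snr_after_db - snr_before_db) / 10.0)


def minkowski(x, y, p=2.0):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance for 2-D features with a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form diff^T * inv(cov) * diff, using the closed-form 2x2 inverse.
    q = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(q)


ratio = db_gain_to_power_ratio(17.12, 34.74)  # a 17.62 dB gain
```

Under the power-ratio assumption, the 17.62 dB gain corresponds to a signal-to-noise power ratio that is larger by a factor of roughly 58.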

Citations: 0
Semi-Supervised Classification of Noisy, Gigapixel Histology Images.
Pub Date : 2020-10-01 Epub Date: 2020-12-16 DOI: 10.1109/BIBE50027.2020.00097
J Vince Pulido, Shan Guleria, Lubaina Ehsan, Matthew Fasullo, Robert Lippman, Pritesh Mutha, Tilak Shah, Sana Syed, Donald E Brown

One of the greatest obstacles to the adoption of deep neural networks for new medical applications is that training these models typically requires a large number of manually labeled training samples. In this body of work, we investigate the semi-supervised scenario where one has access to large amounts of unlabeled data and only a few labeled samples. We study the performance of MixMatch and FixMatch, two popular semi-supervised learning methods, on a histology dataset. More specifically, we study these models' impact under a highly noisy and imbalanced setting. The findings here motivate the development of semi-supervised methods to ameliorate problems commonly encountered in medical data applications.
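As a rough illustration of how FixMatch uses unlabeled data, the sketch below computes its confidence-masked pseudo-label loss from already-computed class probabilities. The function name and toy probabilities are hypothetical, the 0.95 threshold is assumed here as a typical value, and the study's actual training code, augmentations, and settings are not shown in the abstract.

```python
import math


def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Masked cross-entropy on unlabeled samples, FixMatch style.

    weak_probs / strong_probs: per-sample class probabilities predicted for a
    weakly and a strongly augmented view of the same unlabeled image.
    Returns (mean loss over the whole batch, number of samples kept).
    """
    if not weak_probs:
        return 0.0, 0
    total, kept = 0.0, 0
    for weak, strong in zip(weak_probs, strong_probs):
        confidence = max(weak)
        if confidence >= threshold:
            # The confident weak-view prediction becomes a hard pseudo-label.
            pseudo_label = weak.index(confidence)
            total += -math.log(strong[pseudo_label])
            kept += 1
    # Low-confidence samples contribute zero but still count in the mean.
    return total / len(weak_probs), kept


weak = [[0.97, 0.03], [0.60, 0.40]]     # only the first sample is confident
strong = [[0.50, 0.50], [0.10, 0.90]]
loss, n_kept = fixmatch_unlabeled_loss(weak, strong)
```

With noisy or imbalanced data, the threshold matters: confident but wrong pseudo-labels on majority classes pass the mask, which is one reason such settings are worth studying separately.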

Citations: 0
Deep Multiview Learning to Identify Population Structure with Multimodal Imaging.
Pub Date : 2020-10-01 Epub Date: 2020-12-16 DOI: 10.1109/bibe50027.2020.00057
Yixue Feng, Kefei Liu, Mansu Kim, Qi Long, Xiaohui Yao, Li Shen

We present an effective deep multiview learning framework to identify population structure using multimodal imaging data. Our approach is based on canonical correlation analysis (CCA). We propose to use deep generalized CCA (DGCCA) to learn a shared, lower-dimensional latent representation of non-linearly mapped and maximally correlated components from multiple imaging modalities. In our empirical study, this representation is shown to capture more variance in the original data than conventional generalized CCA (GCCA), which applies only linear transformations to the multi-view data. Furthermore, subsequent cluster analysis on the new feature set learned from DGCCA is able to identify a promising population structure in an Alzheimer's disease (AD) cohort. Genetic association analyses of the clustering results demonstrate that the shared representation learned from DGCCA yields a population structure with a stronger genetic basis than several competing feature learning methods.

Citations: 3
Nanopore Guided Assembly of Segmental Duplications near Telomeres.
Pub Date : 2019-10-01 Epub Date: 2019-12-26 DOI: 10.1109/bibe.2019.00020
Eleni Adam, Tunazzina Islam, Desh Ranjan, Harold Riethman

Human subtelomere regions are highly enriched in large segmental duplications and structural variants, leading to many gaps and misassemblies in these regions. We develop a novel method, NPGREAT (NanoPore Guided REgional Assembly Tool), which combines Nanopore ultralong read datasets and short-read assemblies derived from 10x linked-reads to efficiently assemble these subtelomere regions into a single continuous sequence. We show that with the use of ultralong Nanopore reads as a guide, the highly accurate shorter linked-read sequence contigs are correctly oriented, ordered, spaced and extended. In the rare cases where a linked-read sequence contig contains inaccurately assembled segments, the use of Nanopore reads allows for detection and correction of this error. We tested NPGREAT on four representative subtelomeres of the NA12878 human genome (10p, 16p, 19q and 20p). The results demonstrate that the final computed assembly of each subtelomere is accurate and complete.

Citations: 0
Hybrid Modeling of Ebola Propagation.
Pub Date : 2019-10-01 Epub Date: 2019-12-26 DOI: 10.1109/bibe.2019.00044
Cyrus Tanade, Nathanael Pate, Elianna Paljug, Ryan A Hoffman, May D Wang

The Ebola virus disease (EVD) epidemic that occurred in West Africa between 2014 and 2016 resulted in over 28,000 cases and 11,000 deaths, making it one of the deadliest to date. A generalized model of the spatiotemporal progression of EVD in Liberia, Guinea, and Sierra Leone during 2014-2016 remains elusive. The literature also disagrees on which interventions are most effective in curbing disease progression. To address these two key issues, we designed a hybrid agent-based and compartmental model that switches from one paradigm to the other on a stochastic threshold. We modeled disease progression with promising accuracy using WHO datasets.
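The idea of switching between an agent-based and a compartmental regime can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' model: it uses a plain SIR structure, a Gaussian-jittered threshold on the infected count, and hypothetical parameter values, none of which are given in the abstract.

```python
import random


def hybrid_sir(n, i0, beta, gamma, steps, mean_threshold, seed=0):
    """Toy hybrid epidemic model: agent-based below a threshold, SIR above.

    beta, gamma: per-step infection and recovery rates (gamma in [0, 1]).
    mean_threshold: infected count around which the regime switches; the
    threshold is redrawn each step, which makes the switch stochastic.
    """
    rng = random.Random(seed)
    s, i, r = float(n - i0), float(i0), 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        threshold = rng.gauss(mean_threshold, 0.1 * mean_threshold)
        if i < threshold:
            # Agent-based regime: one Bernoulli draw per individual.
            p_inf = min(1.0, beta * i / n)
            new_inf = sum(rng.random() < p_inf for _ in range(int(round(s))))
            new_rec = sum(rng.random() < gamma for _ in range(int(round(i))))
        else:
            # Compartmental regime: deterministic discrete-time SIR update.
            new_inf = beta * s * i / n
            new_rec = gamma * i
        new_inf = min(new_inf, s)   # cannot infect more than are susceptible
        new_rec = min(new_rec, i)   # cannot recover more than are infected
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        history.append((s, i, r))
    return history


trajectory = hybrid_sir(n=10_000, i0=5, beta=0.4, gamma=0.1,
                        steps=60, mean_threshold=200.0)
```

The design rationale is that individual-level randomness dominates when few people are infected, while a cheaper aggregate update is adequate once the outbreak is large.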

Citations: 5
Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings
Pub Date : 2017-10-01 Epub Date: 2018-01-11 DOI: 10.1109/BIBE.2017.00-61
Akm Sabbir, Antonio Jimeno-Yepes, Ramakanth Kavuluru

Biomedical word sense disambiguation (WSD) is an important intermediate task in many natural language processing applications such as named entity recognition, syntactic parsing, and relation extraction. In this paper, we employ knowledge-based approaches that also exploit recent advances in neural word/concept embeddings to improve over the state of the art in biomedical WSD using the public MSH WSD dataset [1] as the test set. Our methods involve weak supervision: we do not use any hand-labeled examples for WSD to build our prediction models; however, we employ an existing concept mapping program, MetaMap, to obtain our concept vectors. Over the MSH WSD dataset, our linear time (in terms of numbers of senses and words in the test instance) method achieves an accuracy of 92.24%, which is a 3% improvement over the best known results [2] obtained via unsupervised means. A more expensive approach that we developed relies on a nearest neighbor framework and achieves an accuracy of 94.34%, essentially cutting the error rate in half. Employing dense vector representations learned from unlabeled free text has recently been shown to benefit many language processing tasks, and our efforts show that biomedical WSD is no exception to this trend. For a complex and rapidly evolving domain such as biomedicine, building labeled datasets for larger sets of ambiguous terms may be impractical. Here, we show that weak supervision that leverages recent advances in representation learning can rival supervised approaches in biomedical WSD. However, external knowledge bases (here sense inventories) play a key role in the improvements achieved.
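Two points in this abstract lend themselves to a small worked sketch: the "error rate in half" arithmetic, and the general idea of choosing a sense by comparing context and concept vectors. The code assumes the 3% improvement is in absolute percentage points, uses cosine similarity against an averaged context vector as a stand-in for the paper's linear-time method, and relies on made-up 2-D vectors and hypothetical sense identifiers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def disambiguate(context_vectors, sense_vectors):
    """Pick the sense whose concept vector is closest to the context centroid.

    Runs in time linear in the number of context words and candidate senses.
    """
    dim = len(context_vectors[0])
    centroid = [
        sum(vec[k] for vec in context_vectors) / len(context_vectors)
        for k in range(dim)
    ]
    return max(sense_vectors, key=lambda s: cosine(centroid, sense_vectors[s]))


# Error-rate arithmetic (assumes "3%" means absolute percentage points).
baseline_error = 100.0 - (92.24 - 3.0)     # prior best unsupervised result
knn_error = 100.0 - 94.34                  # nearest neighbor approach
error_ratio = knn_error / baseline_error   # about 0.53, i.e. roughly halved

# Toy embeddings (hypothetical values and sense IDs) for the word "cold".
context = [[1.0, 0.1], [0.8, 0.3]]
senses = {"SENSE_cold_temperature": [0.9, 0.2],
          "SENSE_common_cold": [0.1, 1.0]}
chosen = disambiguate(context, senses)
```

Under that reading, the error rate falls from about 10.76% to 5.66%, which matches the abstract's description of an approximately halved error rate relative to the earlier best result.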

Citations: 0
Delirium Prediction using Machine Learning Models on Preoperative Electronic Health Records Data.
Pub Date : 2017-10-01 Epub Date: 2018-01-11 DOI: 10.1109/BIBE.2017.00014
Anis Davoudi, Ashkan Ebadi, Parisa Rashidi, Tazcan Ozrazgat-Baslanti, Azra Bihorac, Alberto C Bursian

Electronic Health Records (EHR) are mainly designed to record relevant patient information during a hospital stay for administrative purposes. They additionally provide an efficient and inexpensive source of data for medical research, such as patient outcome prediction. In this study, we used preoperative Electronic Health Records to predict postoperative delirium. We compared the performance of seven machine learning models on delirium prediction: linear models, generalized additive models, random forests, support vector machines, neural networks, and extreme gradient boosting. Among the models evaluated in this study, random forests and generalized additive models outperformed the other models in terms of the overall performance metrics for prediction of delirium, particularly with respect to sensitivity. We found that age, alcohol or drug abuse, socioeconomic status, underlying medical issue, severity of medical problem, and attending surgeon can affect the risk of delirium.

Citations: 37
In silico assessment of the effects of material on stent deployment.
Pub Date : 2017-10-01 Epub Date: 2018-01-11 DOI: 10.1109/BIBE.2017.00-11
Georgia S Karanasiou, Nikolaos S Tachos, Antonios Sakellarios, Lampros K Michalis, Claire Conway, Elazer R Edelman, Dimitrios I Fotiadis

Coronary stents are expandable scaffolds that are used to widen occluded diseased arteries and restore blood flow. Because of the strain they are exposed to, the forces they must resist, and the importance of surface interactions, material properties are dominant. Indeed, a common differentiating factor amongst commercially available stents is their material. Several performance requirements relate to stent materials, including radial strength for adequate arterial support post-deployment. This study investigated the effect of the stent material in three finite element models using stents made of: (i) Cobalt-Chromium (CoCr), (ii) Stainless Steel (SS316L), and (iii) Platinum Chromium (PtCr). Deployment was investigated in a patient-specific arterial geometry, created from a fusion of angiographic data and intravascular ultrasound images. In silico results show that: (i) the maximum von Mises stress occurs for the CoCr stent; however, for all stents the curved areas of the stent links present higher stresses than the straight stent segments, (ii) more areas of high inner arterial stress exist in the case of the CoCr stent deployment, (iii) there is no significant difference in the percentage of arterial stress volume distribution among all models.
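For readers unfamiliar with the stress measure reported above, the following minimal sketch computes the von Mises equivalent stress from three principal stresses using the standard textbook formula. The function name and the sample values are illustrative only and are not results from the study.

```python
import math


def von_mises(s1, s2, s3):
    """Von Mises equivalent stress from principal stresses (same units)."""
    return math.sqrt(0.5 * ((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2))


uniaxial = von_mises(100.0, 0.0, 0.0)       # equals the applied stress
hydrostatic = von_mises(50.0, 50.0, 50.0)   # pure pressure gives zero
```

Because the measure depends only on differences between principal stresses, it captures distortion rather than uniform compression, which is why it is commonly compared against a material's yield strength.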

Citations: 1
Identification of signaling pathways related to drug efficacy in hepatocellular carcinoma via integration of phosphoproteomic, genomic and clinical data.
Pub Date : 2013-11-01 DOI: 10.1109/BIBE.2013.6701683
Ioannis N Melas, Douglas A Lauffenburger, Leonidas G Alexopoulos

Hepatocellular Carcinoma (HCC) is one of the leading causes of death worldwide, with only a handful of treatments effective in unresectable HCC. Most clinical trials for HCC using new-generation interventions (drug-targeted therapies) have shown poor efficacy, whereas just a few show promising clinical outcomes [1]. This is among the first studies in which the mode of action of some of the compounds extensively used in clinical trials is interrogated on the phosphoproteomic level, in an attempt to build predictive models for clinical efficacy. Signaling data are combined with previously published gene expression and clinical data within a consistent framework that identifies drug effects on the phosphoproteomic level and translates them to the gene expression level. The interrogated drugs are then correlated with genes differentially expressed in normal versus tumor tissue, and with genes predictive of patient survival. Although the number of clinical trial results considered is small, our approach shows potential for discerning signaling activities that may help predict drug efficacy for HCC.

Citations: 5