Transformer-based Multi-target Regression on Electronic Health Records for Primordial Prevention of Cardiovascular Disease (pp. 726-731)
Pub Date: 2021-12-01 | DOI: 10.1109/bibm52615.2021.9669441
Raphael Poulain, Mehak Gupta, Randi Foraker, Rahmatollah Beheshti
Machine learning algorithms have been widely used to capture the static and temporal patterns within electronic health records (EHRs). While many studies focus on the (primary) prevention of diseases, primordial prevention (preventing the factors that are known to increase the risk of a disease occurring) remains widely under-investigated. In this study, we propose a multi-target regression model that leverages transformers to learn bidirectional representations of EHR data and predict the future values of 11 major modifiable risk factors of cardiovascular disease (CVD). Inspired by the proven results of pre-training in natural language processing, we apply the same principles to EHR data, dividing the training of our model into two phases: pre-training and fine-tuning. We use the fine-tuned transformer model in a multi-target regression setting: we combine the 11 disjoint prediction tasks by adding shared and target-specific layers to the model and jointly train the entire model. We evaluate the performance of our proposed method on a large, publicly available EHR dataset. Through various experiments, we demonstrate that the proposed method obtains a significant improvement over the baselines (12.6% in MAE, on average across all 11 outputs).
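A minimal PyTorch sketch of the shared-plus-target-specific head described above; the encoder output dimension, the shared layer width, and the L1 objective are illustrative assumptions, and a random tensor stands in for the pre-trained EHR transformer's output.

```python
import torch
import torch.nn as nn

class MultiTargetHead(nn.Module):
    """A shared layer followed by one regression head per risk factor."""
    def __init__(self, hidden_dim=256, shared_dim=128, n_targets=11):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(hidden_dim, shared_dim), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(shared_dim, 1) for _ in range(n_targets))

    def forward(self, pooled):  # pooled: (batch, hidden_dim) from the transformer
        z = self.shared(pooled)
        return torch.cat([head(z) for head in self.heads], dim=1)  # (batch, n_targets)

# Joint training: one loss summed over all 11 targets.
pooled = torch.randn(8, 256)   # stand-in for the fine-tuned encoder output
targets = torch.randn(8, 11)   # future values of the 11 risk factors
loss = nn.L1Loss()(MultiTargetHead()(pooled), targets)
loss.backward()
```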
{"title":"Transformer-based Multi-target Regression on Electronic Health Records for Primordial Prevention of Cardiovascular Disease.","authors":"Raphael Poulain, Mehak Gupta, Randi Foraker, Rahmatollah Beheshti","doi":"10.1109/bibm52615.2021.9669441","DOIUrl":"https://doi.org/10.1109/bibm52615.2021.9669441","url":null,"abstract":"<p><p>Machine learning algorithms have been widely used to capture the static and temporal patterns within electronic health records (EHRs). While many studies focus on the (primary) prevention of diseases, primordial prevention (preventing the factors that are known to increase the risk of a disease occurring) is still widely under-investigated. In this study, we propose a multi-target regression model leveraging transformers to learn the bidirectional representations of EHR data and predict the future values of 11 major modifiable risk factors of cardiovascular disease (CVD). Inspired by the proven results of pre-training in natural language processing studies, we apply the same principles on EHR data, dividing the training of our model into two phases: pre-training and fine-tuning. We use the fine-tuned transformer model in a \"multi-target regression\" theme. Following this theme, we combine the 11 disjoint prediction tasks by adding shared and target-specific layers to the model and jointly train the entire model. We evaluate the performance of our proposed method on a large publicly available EHR dataset. Through various experiments, we demonstrate that the proposed method obtains a significant improvement (12.6% MAE on average across all 11 different outputs) over the baselines.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2021 ","pages":"726-731"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9859711/pdf/nihms-1865432.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9166302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tracing Filaments in Simulated 3D Cryo-Electron Tomography Maps Using a Fast Dynamic Programming Algorithm (pp. 2553-2559)
Pub Date: 2021-12-01 | DOI: 10.1109/bibm52615.2021.9669318
Salim Sazzed, Peter Scheible, Jing He, Willy Wriggers
We propose a fast, dynamic programming-based framework for tracing actin filaments in 3D maps of subcellular components in cryo-electron tomography. The approach can identify high-density filament segments in various orientations, but it takes advantage of the arrangement of actin filaments within cells into more or less tightly aligned bundles. Assuming that the tomogram can be rotated such that the filaments are oriented along a dominant direction (i.e., the X, Y, or Z axis), the proposed framework first identifies local seed points that form the origin of candidate filament segments (CFSs), which are then grown from the seeds using a fast dynamic programming algorithm. The CFS length l can be tuned to the nominal resolution of the tomogram or the separation of desired features, or it can be used to restrict the curvature of filaments that deviate from the overall bundle direction. In subsequent steps, the CFSs are filtered based on backward tracing and path density analysis. Finally, neighboring CFSs are fused based on a collinearity criterion to bridge any noise artifacts in the 3D map that would otherwise fractionalize the tracing. We validate our proposed framework on simulated tomograms that closely mimic the features and appearance of experimental maps.
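To make the dynamic programming idea concrete, here is a simplified NumPy sketch that grows a single maximum-density path through a 3D map, stepping one voxel at a time along the dominant (here Z) axis with a bounded lateral shift. The seed selection, filtering, and fusion stages of the actual framework are omitted, and all parameters are illustrative.

```python
import numpy as np

def trace_best_path(density, max_shift=1):
    """Dynamic program: maximize cumulative density along the Z axis,
    shifting at most max_shift voxels laterally per step."""
    Z, X, Y = density.shape
    score = np.full((Z, X, Y), -np.inf)
    prev = np.zeros((Z, X, Y, 2), dtype=int)
    score[0] = density[0]
    for z in range(1, Z):
        for x in range(X):
            for y in range(Y):
                x0, x1 = max(0, x - max_shift), min(X, x + max_shift + 1)
                y0, y1 = max(0, y - max_shift), min(Y, y + max_shift + 1)
                window = score[z - 1, x0:x1, y0:y1]
                i, j = np.unravel_index(np.argmax(window), window.shape)
                score[z, x, y] = density[z, x, y] + window[i, j]
                prev[z, x, y] = (x0 + i, y0 + j)
    # Backtrace from the best endpoint on the final slice.
    x, y = np.unravel_index(np.argmax(score[-1]), (X, Y))
    path = [(Z - 1, x, y)]
    for z in range(Z - 1, 0, -1):
        x, y = prev[z, x, y]
        path.append((z - 1, x, y))
    return path[::-1]

path = trace_best_path(np.random.rand(10, 16, 16))
```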
{"title":"Tracing Filaments in Simulated 3D Cryo-Electron Tomography Maps Using a Fast Dynamic Programming Algorithm.","authors":"Salim Sazzed, Peter Scheible, Jing He, Willy Wriggers","doi":"10.1109/bibm52615.2021.9669318","DOIUrl":"https://doi.org/10.1109/bibm52615.2021.9669318","url":null,"abstract":"<p><p>We propose a fast, dynamic programming-based framework for tracing actin filaments in 3D maps of subcellular components in cryo-electron tomography. The approach can identify high-density filament segments in various orientations, but it takes advantage of the arrangement of actin filaments within cells into more or less tightly aligned bundles. Assuming that the tomogram can be rotated such that the filaments can be oriented to be directed in a dominant direction (i.e., the <math><mi>X</mi></math>, <math><mi>Y</mi></math>, or <math><mi>Z</mi></math> axis), the proposed framework first identifies local seed points that form the origin of candidate filament segments (CFSs), which are then grown from the seeds using a fast dynamic programming algorithm. The CFS length <math><mrow><mi>l</mi></mrow></math> can be tuned to the nominal resolution of the tomogram or the separation of desired features, or it can be used to restrict the curvature of filaments that deviate from the overall bundle direction. In subsequent steps, the CFSs are filtered based on backward tracing and path density analysis. Finally, neighboring CFSs are fused based on a collinearity criterion to bridge any noise artifacts in the 3D map that would otherwise fractionalize the tracing. We validate our proposed framework on simulated tomograms that closely mimic the features and appearance of experimental maps.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2021 ","pages":"2553-2559"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10353374/pdf/nihms-1823578.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9852614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TomoSim: Simulation of Filamentous Cryo-Electron Tomograms (pp. 2560-2565)
Pub Date: 2021-12-01 | DOI: 10.1109/bibm52615.2021.9669370
Peter Scheible, Salim Sazzed, Jing He, Willy Wriggers
As automated filament tracing algorithms in cryo-electron tomography (cryo-ET) continue to improve, validating these approaches has become increasingly important. Having a known ground truth on which to base predictions is crucial for reliably testing predicted cytoskeletal filaments, because the detailed structure of the filaments in experimental tomograms is obscured by low resolution, noise, and missing-wedge artifacts in Fourier space. We present a software tool for the realistic simulation of tomographic maps (TomoSim) based on a known filament trace. The parameters of the simulated map are automatically matched to those of a corresponding experimental map. We describe the computational details of the first prototype of our approach, which includes wedge masking in Fourier space, noise color, and signal-to-noise matching. We also discuss current and potential future applications of the approach in the validation of concurrent filament tracing methods in cryo-ET.
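Of the three simulation ingredients mentioned (wedge masking, noise color, signal-to-noise matching), wedge masking is the easiest to sketch. The NumPy fragment below zeroes a missing wedge in Fourier space; the 30-degree half-angle (corresponding to a +/-60-degree tilt range), the choice of tilt and beam axes, and the function name are assumptions, not TomoSim's actual implementation.

```python
import numpy as np

def apply_missing_wedge(volume, half_angle_deg=30.0):
    """Zero Fourier components within half_angle_deg of the kz (beam) axis,
    mimicking the wedge left unmeasured by a limited tilt range about y."""
    kz = np.fft.fftfreq(volume.shape[0])[:, None, None]
    kx = np.fft.fftfreq(volume.shape[2])[None, None, :]
    wedge = np.arctan2(np.abs(kx), np.abs(kz)) < np.deg2rad(half_angle_deg)
    wedge &= ~((kx == 0) & (kz == 0))   # keep the measured tilt-axis line
    ft = np.fft.fftn(volume) * ~wedge   # mask broadcasts over the y axis
    return np.real(np.fft.ifftn(ft))

wedged = apply_missing_wedge(np.random.rand(32, 32, 32))
```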
{"title":"<i>TomoSim</i>: Simulation of Filamentous Cryo-Electron Tomograms.","authors":"Peter Scheible, Salim Sazzed, Jing He, Willy Wriggers","doi":"10.1109/bibm52615.2021.9669370","DOIUrl":"10.1109/bibm52615.2021.9669370","url":null,"abstract":"<p><p>As automated filament tracing algorithms in cryo-electron tomography (cryo-ET) continue to improve, the validation of these approaches has become more incumbent. Having a known ground truth on which to base predictions is crucial to reliably test predicted cytoskeletal filaments because the detailed structure of the filaments in experimental tomograms is obscured by a low resolution, as well as by noise and missing Fourier space wedge artifacts. We present a software tool for the realistic simulation of tomographic maps (<i>TomoSim</i>) based on a known filament trace. The parameters of the simulated map are automatically matched to those of a corresponding experimental map. We describe the computational details of the first prototype of our approach, which includes wedge masking in Fourier space, noise color, and signal-to-noise matching. We also discuss current and potential future applications of the approach in the validation of concurrent filament tracing methods in cryo-ET.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2021 ","pages":"2560-2565"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10338425/pdf/nihms-1823577.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10199020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extracting Disease-Relevant Features with Adversarial Regularization (pp. 3464-3471)
Pub Date: 2021-12-01 | DOI: 10.1109/bibm52615.2021.9669878
Junxiang Chen, Li Sun, Ke Yu, Kayhan Batmanghelich
Extracting hidden phenotypes is essential in medical data analysis because it facilitates disease subtyping, diagnosis, and understanding of disease etiology. Since the hidden phenotype is usually a low-dimensional representation that comprehensively describes the disease, we require a dimensionality-reduction method that captures as much disease-relevant information as possible. However, most unsupervised or self-supervised methods cannot achieve this goal because they learn a holistic representation containing both disease-relevant and disease-irrelevant information. Supervised methods can capture information that is predictive of only the target clinical variable, but the learned representation is usually not generalizable to the various aspects of the disease. Hence, we develop a dimensionality-reduction approach to extract Disease-Relevant Features (DRFs) based on information theory. We propose to use clinical variables that weakly define the disease as so-called anchors. We derive a formulation that makes the DRFs predictive of the anchors while forcing the remaining representation to be irrelevant to the anchors via adversarial regularization. We apply our method to a large-scale study of Chronic Obstructive Pulmonary Disease (COPD). Our experiments show: (1) The learned DRFs are as predictive of the anchors as the original representation, despite having a significantly lower dimension. (2) Compared to a supervised representation, the learned DRFs are more predictive of other relevant disease metrics that are not used during training. (3) The learned DRFs are related to non-imaging biological measurements such as gene expression, suggesting that the DRFs capture information related to the underlying biology of the disease.
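A toy PyTorch sketch of the adversarial objective: the representation is split into a DRF part trained to predict the anchors and a remainder that is adversarially discouraged from carrying anchor information. The gradient-reversal trick, the linear encoder, and all dimensions are simplifying assumptions; the paper derives its formulation from information theory rather than from this exact setup.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

encoder = nn.Linear(64, 32)     # toy encoder: first 16 dims = DRF, rest = remainder
anchor_head = nn.Linear(16, 3)  # the DRF should predict the anchors well...
adversary = nn.Linear(16, 3)    # ...the remainder should not

x, anchors = torch.randn(8, 64), torch.randn(8, 3)
h = encoder(x)
drf, rest = h[:, :16], h[:, 16:]
mse = nn.MSELoss()
loss = mse(anchor_head(drf), anchors) \
     + mse(adversary(GradReverse.apply(rest)), anchors)
loss.backward()  # adversary minimizes its error; the encoder maximizes it for `rest`
```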
{"title":"Extracting Disease-Relevant Features with Adversarial Regularization.","authors":"Junxiang Chen, Li Sun, Ke Yu, Kayhan Batmanghelich","doi":"10.1109/bibm52615.2021.9669878","DOIUrl":"https://doi.org/10.1109/bibm52615.2021.9669878","url":null,"abstract":"<p><p>Extracting hidden phenotypes is essential in medical data analysis because it facilitates disease subtyping, diagnosis, and understanding of disease etiology. Since the hidden phenotype is usually a low-dimensional representation that comprehensively describes the disease, we require a dimensionality-reduction method that captures as much disease-relevant information as possible. However, most unsupervised or self-supervised methods cannot achieve the goal because they learn a holistic representation containing both disease-relevant and disease-irrelevant information. Supervised methods can capture information that is predictive to the target clinical variable only, but the learned representation is usually not generalizable for the various aspects of the disease. Hence, we develop a dimensionality-reduction approach to extract Disease Relevant Features (DRFs) based on information theory. We propose to use clinical variables that weakly define the disease as so-called <i>anchors</i>. We derive a formulation that makes the DRF predictive of the anchors while forcing the remaining representation to be irrelevant to the anchors via adversarial regularization. We apply our method to a large-scale study of Chronic Obstructive Pulmonary Disease (COPD). Our experiment shows: (1) Learned DRFs are as predictive as the original representation in predicting the anchors, although it is in a significantly lower dimension. (2) Compared to supervised representation, the learned DRFs are more predictive to other relevant disease metrics that are <i>not</i> used during the training. (3) The learned DRFs are related to non-imaging biological measurements such as gene expressions, suggesting the DRFs include information related to the underlying biology of the disease.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":" ","pages":"3464-3471"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8863436/pdf/nihms-1778852.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39659267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Lexical-based Formal Concept Analysis Method to Identify Missing Concepts in the NCI Thesaurus
Pub Date: 2020-12-01 | Epub Date: 2021-01-13 | DOI: 10.1109/bibm49941.2020.9313186
Fengbo Zheng, Licong Cui
Biomedical terminologies have been increasingly used in modern biomedical research and applications to facilitate data management and ensure semantic interoperability. As part of the evolution process, new concepts are regularly added to biomedical terminologies in response to evolving domain knowledge and emerging applications. Most existing concept enrichment methods suggest new concepts by directly importing knowledge from external sources. In this paper, we introduce a lexical method based on formal concept analysis (FCA) to identify potentially missing concepts in a given terminology by leveraging its intrinsic knowledge: concept names. We first construct the FCA formal context based on the lexical features of concepts. Then we perform multistage intersection to formalize new concepts and detect potentially missing ones. We applied our method to the Disease or Disorder sub-hierarchy in the National Cancer Institute (NCI) Thesaurus (version 19.08d) and identified a total of 8,983 potentially missing concepts. As a preliminary evaluation to validate the potentially missing concepts, we checked whether they were included in any external source terminology in the Unified Medical Language System (UMLS); 592 of the 8,937 potentially missing concepts were found in the UMLS.
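A minimal sketch of one intersection stage of such a lexical FCA approach: concepts become sets of name tokens, and a shared token set that matches no existing concept suggests a candidate. The concept names here are illustrative stand-ins rather than actual NCI Thesaurus content, and the real method performs multiple intersection stages over a full formal context.

```python
from itertools import combinations

# Lexical formal context: each concept name maps to its set of tokens.
concepts = {
    "Recurrent Childhood Brain Neoplasm": {"recurrent", "childhood", "brain", "neoplasm"},
    "Recurrent Adult Brain Neoplasm": {"recurrent", "adult", "brain", "neoplasm"},
    "Childhood Brain Neoplasm": {"childhood", "brain", "neoplasm"},
    "Brain Neoplasm": {"brain", "neoplasm"},
}

existing = set(map(frozenset, concepts.values()))
candidates = {
    frozenset(a & b)
    for a, b in combinations(concepts.values(), 2)
    if (a & b) and frozenset(a & b) not in existing
}
print(candidates)  # {frozenset({'recurrent', 'brain', 'neoplasm'})} -> "Recurrent Brain Neoplasm"
```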
{"title":"A Lexical-based Formal Concept Analysis Method to Identify Missing Concepts in the NCI Thesaurus.","authors":"Fengbo Zheng, Licong Cui","doi":"10.1109/bibm49941.2020.9313186","DOIUrl":"https://doi.org/10.1109/bibm49941.2020.9313186","url":null,"abstract":"<p><p>Biomedical terminologies have been increasingly used in modern biomedical research and applications to facilitate data management and ensure semantic interoperability. As part of the evolution process, new concepts are regularly added to biomedical terminologies in response to the evolving domain knowledge and emerging applications. Most existing concept enrichment methods suggest new concepts via directly importing knowledge from external sources. In this paper, we introduced a lexical method based on formal concept analysis (FCA) to identify potentially missing concepts in a given terminology by leveraging its intrinsic knowledge - concept names. We first construct the FCA formal context based on the lexical features of concepts. Then we perform multistage intersection to formalize new concepts and detect potentially missing concepts. We applied our method to the <i>Disease or Disorder</i> sub-hierarchy in the National Cancer Institute (NCI) Thesaurus (19.08d version) and identified a total of 8,983 potentially missing concepts. As a preliminary evaluation of our method to validate the potentially missing concepts, we further checked whether they were included in any external source terminology in the Unified Medical Language System (UMLS). The result showed that 592 out of 8,937 potentially missing concepts were found in the UMLS.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2020 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/bibm49941.2020.9313186","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39579552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparison of Convolutional Neural Network Architectures and their Influence on Patient Classification Tasks Relating to Altered Mental Status (pp. 2752-2756)
Pub Date: 2020-12-01 | Epub Date: 2021-01-13 | DOI: 10.1109/bibm49941.2020.9313156
Kevin Gagnon, Tami L Crawford, Jihad Obeid
With the pervasiveness of Electronic Health Records in many hospital systems, the application of machine learning techniques to the field of health informatics has become much more feasible as large amounts of data become more accessible. In our experiment, we evaluated several convolutional neural network architectures that are typically used in text classification tasks and tested those models on 1,113 history of present illness (HPI) notes. The data were run through both sequential and multi-channel architectures, as well as an architecture that implemented attention methods meant to focus the model on learning the influential data points within the text. We found that the multi-channel model performed best with an accuracy of 92%, while the attention and sequential models performed worse, with accuracies of 90% and 89%, respectively.
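For reference, one common reading of a multi-channel text CNN applies parallel convolution branches with different kernel widths to the same embedded note and concatenates their pooled outputs. The sketch below follows that reading; the vocabulary size, embedding width, filter counts, and two-class output are chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class MultiChannelTextCNN(nn.Module):
    """Parallel convolution branches over token embeddings, max-pooled and concatenated."""
    def __init__(self, vocab=5000, emb=100, n_filters=64,
                 kernel_sizes=(3, 4, 5), n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.convs = nn.ModuleList(nn.Conv1d(emb, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)  # (batch, emb, seq_len)
        pooled = [torch.relu(conv(x)).amax(dim=2) for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

logits = MultiChannelTextCNN()(torch.randint(0, 5000, (4, 120)))
```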
{"title":"Comparison of Convolutional Neural Network Architectures and their Influence on Patient Classification Tasks Relating to Altered Mental Status.","authors":"Kevin Gagnon, Tami L Crawford, Jihad Obeid","doi":"10.1109/bibm49941.2020.9313156","DOIUrl":"https://doi.org/10.1109/bibm49941.2020.9313156","url":null,"abstract":"<p><p>With the pervasiveness of Electronic Health Records in many hospital systems, the application of machine learning techniques to the field of health informatics has become much more feasible as large amounts of data become more accessible. In our experiment, we evaluated several different convolutional neural network architectures that are typically used in text classification tasks. We then tested those models based on 1,113 histories of present illness. (HPI) notes. This data was run over both sequential and multi-channel architectures, as well as a structure that implemented attention methods meant to focus the model on learning the influential data points within the text. We found that the multi-channel model performed the best with an accuracy of 92%, while the attention and sequential models performed worse with an accuracy of 90% and 89% respectively.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":" ","pages":"2752-2756"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/bibm49941.2020.9313156","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40376872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes (pp. 1932-1937)
Pub Date: 2020-12-01 | DOI: 10.1109/bibm49941.2020.9313231
Nasibeh Zanjirani Farahani, Shivaram Poigai Arunachalam, Divaakar Siva Baala Sundaram, Kalyan Pasupathy, Moein Enayati, Adelaide M Arruda-Olson
Hypertrophic cardiomyopathy (HCM) is a genetic heart disease that is the leading cause of sudden cardiac death (SCD) in young adults. Despite well-known risk factors and existing clinical practice guidelines, HCM patients are underdiagnosed and sub-optimally managed. Developing machine learning models on electronic health record (EHR) data can help in the better diagnosis of HCM and thus improve the lives of hundreds of patients. Automated phenotyping using HCM billing codes has received limited attention in the literature, with a small number of prior publications. In this paper, we propose a novel predictive model that helps physicians make diagnostic decisions by means of information learned from the historical data of similar patients. We assembled a cohort of 11,562 patients with known or suspected HCM who visited Mayo Clinic between 1995 and 2019. All existing billing codes for these patients were extracted from the EHR data warehouse. Ground-truth labels for training the machine learning model were provided by HCM diagnoses confirmed with the gold-standard imaging tests for HCM: echocardiography (echo) or cardiac magnetic resonance (CMR) imaging. As a result, patients were labeled into the three categories "yes definite HCM", "no HCM phenotype", and "possible HCM" after a manual review of medical records and imaging tests. In this study, a random forest was adopted to investigate the predictive performance of billing codes for the identification of HCM patients, owing to its practical applicability and expected accuracy in a wide range of use cases. Our model performed well in finding patients with "yes definite", "possible", and "no" HCM, with an accuracy of 71%, a weighted recall of 70%, a precision of 75%, and a weighted F1 score of 72%. Furthermore, we provided visualizations based on multidimensional scaling and principal component analysis to aid clinicians' interpretation. This model can be used to identify HCM patients from their EHR data and to help clinicians in their diagnostic decision making.
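A compact sklearn sketch of the modeling setup: a random forest over a patient-by-billing-code matrix with three outcome classes. The data here are synthetic and every hyperparameter is illustrative; only the overall recipe (binary code features, three labels, weighted metrics) mirrors the description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(600, 200))   # synthetic patient x billing-code matrix
y = rng.integers(0, 3, size=600)          # 0 = no HCM, 1 = possible, 2 = definite

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te),
                            target_names=["no HCM", "possible", "definite"]))
```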
{"title":"Explanatory Analysis of a Machine Learning Model to Identify Hypertrophic Cardiomyopathy Patients from EHR Using Diagnostic Codes.","authors":"Nasibeh Zanjirani Farahani, Shivaram Poigai Arunachalam, Divaakar Siva Baala Sundaram, Kalyan Pasupathy, Moein Enayati, Adelaide M Arruda-Olson","doi":"10.1109/bibm49941.2020.9313231","DOIUrl":"https://doi.org/10.1109/bibm49941.2020.9313231","url":null,"abstract":"<p><p>Hypertrophic cardiomyopathy (HCM) is a genetic heart disease that is the leading cause of sudden cardiac death (SCD) in young adults. Despite the well-known risk factors and existing clinical practice guidelines, HCM patients are underdiagnosed and sub-optimally managed. Developing machine learning models on electronic health record (EHR) data can help in better diagnosis of HCM and thus improve hundreds of patient lives. Automated phenotyping using HCM billing codes has received limited attention in the literature with a small number of prior publications. In this paper, we propose a novel predictive model that helps physicians in making diagnostic decisions, by means of information learned from historical data of similar patients. We assembled a cohort of 11,562 patients with known or suspected HCM who have visited Mayo Clinic between the years 1995 to 2019. All existing billing codes of these patients were extracted from the EHR data warehouse. Target ground truth labeling for training the machine learning model was provided by confirmed HCM diagnosis using the gold standard imaging tests for HCM diagnosis echocardiography (echo), or cardiac magnetic resonance (CMR) imaging. As the result, patients were labeled into three categories of \"yes definite HCM\", \"no HCM phenotype\", and \"possible HCM\" after a manual review of medical records and imaging tests. In this study, a random forest was adopted to investigate the predictive performance of billing codes for the identification of HCM patients due to its practical application and expected accuracy in a wide range of use cases. Our model performed well in finding patients with \"yes definite\", \"possible\" and \"no\" HCM with an accuracy of 71%, weighted recall of 70%, the precision of 75%, and weighted F1 score of 72%. Furthermore, we provided visualizations based on multidimensional scaling and the principal component analysis to provide insights for clinicians' interpretation. This model can be used for the identification of HCM patients using their EHR data, and help clinicians in their diagnosis decision making.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2020 ","pages":"1932-1937"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/bibm49941.2020.9313231","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39227791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Attention Mechanism with BERT for Content Annotation and Categorization of Pregnancy-Related Questions on a Community Q&A Site (pp. 1077-1081)
Pub Date: 2020-12-01 | Epub Date: 2021-01-13 | DOI: 10.1109/bibm49941.2020.9313379
Xiao Luo, Haoran Ding, Matthew Tang, Priyanka Gandhi, Zhan Zhang, Zhe He
In recent years, the social web has been increasingly used for health information seeking, sharing, and subsequent health-related research. Women often use the Internet or social networking sites to seek information related to different stages of pregnancy. They may ask questions about birth control, trying to conceive, labor, or caring for a newborn or baby. Classifying different types of questions about pregnancy (e.g., before, during, and after pregnancy) can inform the design of social media and professional websites for pregnancy education and support. This research investigates attention mechanisms, either built into BERT or added on top of it, for classifying and annotating pregnancy-related questions posted on a community Q&A site. We evaluated two BERT-based models and compared them against traditional machine learning models for question classification. Most importantly, we investigated two attention mechanisms: the built-in self-attention mechanism of BERT and an additional attention layer on top of BERT for relevant term annotation. The BERT-based models classified questions better than the traditional models, and BERT with an additional attention layer achieved higher overall precision than the basic BERT model. The results also showed that the two attention mechanisms behave differently when annotating relevant content and could serve as feature selection methods for text mining in general.
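The "additional attention layer on top of BERT" can be sketched as additive attention pooling over the token-level hidden states, where the learned weights double as term-level annotations. The layer below is a generic version of that idea: the 768-dimensional hidden size matches BERT-base, and the random tensor stands in for BertModel(...).last_hidden_state; none of it is the paper's exact implementation.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Scores each token, returns the weighted sum plus the weights themselves."""
    def __init__(self, hidden=768):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(hidden, 128), nn.Tanh(), nn.Linear(128, 1))

    def forward(self, hidden_states):  # (batch, seq_len, hidden)
        weights = torch.softmax(self.score(hidden_states).squeeze(-1), dim=1)
        context = torch.einsum("bs,bsh->bh", weights, hidden_states)
        return context, weights        # high weights mark influential terms

context, weights = AttentionPooling()(torch.randn(2, 32, 768))
```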
{"title":"Attention Mechanism with BERT for Content Annotation and Categorization of Pregnancy-Related Questions on a Community Q&A Site.","authors":"Xiao Luo, Haoran Ding, Matthew Tang, Priyanka Gandhi, Zhan Zhang, Zhe He","doi":"10.1109/bibm49941.2020.9313379","DOIUrl":"https://doi.org/10.1109/bibm49941.2020.9313379","url":null,"abstract":"<p><p>In recent years, the social web has been increasingly used for health information seeking, sharing, and subsequent health-related research. Women often use the Internet or social networking sites to seek information related to pregnancy in different stages. They may ask questions about birth control, trying to conceive, labor, or taking care of a newborn or baby. Classifying different types of questions about pregnancy information (e.g., before, during, and after pregnancy) can inform the design of social media and professional websites for pregnancy education and support. This research aims to investigate the attention mechanism built-in or added on top of the BERT model in classifying and annotating the pregnancy-related questions posted on a community Q&A site. We evaluated two BERT-based models and compared them against the traditional machine learning models for question classification. Most importantly, we investigated two attention mechanisms: the built-in self-attention mechanism of BERT and the additional attention layer on top of BERT for relevant term annotation. The classification performance showed that the BERT-based models worked better than the traditional models, and BERT with an additional attention layer can achieve higher overall precision than the basic BERT model. The results also showed that both attention mechanisms work differently on annotating relevant content, and they could serve as feature selection methods for text mining in general.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2020 ","pages":"1077-1081"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/bibm49941.2020.9313379","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25431135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A health consumer ontology of fast food information (pp. 1714-1719)
Pub Date: 2020-12-01 | Epub Date: 2021-01-13 | DOI: 10.1109/bibm49941.2020.9313375
Muhammad Amith, Jing Wang, Grace Xiong, Kirk Roberts, Cui Tao
A variety of severe health issues can be attributed to poor nutrition and poor eating behaviors. Research has explored the impact of nutritional knowledge on an individual's inclination to purchase and consume certain foods. This paper introduces the Ontology of Fast Food Facts, a knowledge base that models consumer nutritional data from major fast food establishments. This artifact serves as an aggregate knowledge base that centralizes nutritional information for consumers. As a semantically-linked data source, the Ontology of Fast Food Facts could enable methods and tools that further research and positively influence consumers' diet and eating behavior, factors in many severe health outcomes. We describe the initial development of this ontology and the future directions we plan for this knowledge base.
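As a flavor of what semantically-linked nutritional data looks like in practice, the rdflib snippet below models one menu item as RDF triples. The namespace IRI, class and property names, and nutrient values are all hypothetical; the abstract does not specify the ontology's actual vocabulary.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, XSD

OFFF = Namespace("http://example.org/offf#")  # hypothetical namespace

g = Graph()
g.add((OFFF.MenuItem, RDF.type, RDFS.Class))
g.add((OFFF.cheeseburger1, RDF.type, OFFF.MenuItem))
g.add((OFFF.cheeseburger1, RDFS.label, Literal("Cheeseburger")))
g.add((OFFF.cheeseburger1, OFFF.calories, Literal(300, datatype=XSD.integer)))  # illustrative values
g.add((OFFF.cheeseburger1, OFFF.sodiumMg, Literal(720, datatype=XSD.integer)))
print(g.serialize(format="turtle"))
```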
{"title":"A health consumer ontology of fast food information.","authors":"Muhammad Amith, Jing Wang, Grace Xiong, Kirk Roberts, Cui Tao","doi":"10.1109/bibm49941.2020.9313375","DOIUrl":"https://doi.org/10.1109/bibm49941.2020.9313375","url":null,"abstract":"<p><p>A variety of severe health issues can be attributed to poor nutrition and poor eating behaviors. Research has explored the impact of nutritional knowledge on an individual's inclination to purchase and consume certain foods. This paper introduces the Ontology of Fast Food Facts, a knowledge base that models consumer nutritional data from major fast food establishments. This artifact serves as an aggregate knowledge base to centralize nutritional information for consumers. As a semantically-linked data source, the Ontology of Fast Food Facts could engender methods and tools to further the research and impact the health consumers' diet and behavior, which is a factor in many severe health outcomes. We describe the initial development of this ontology and future directions we plan with this knowledge base.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2020 ","pages":"1714-1719"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/bibm49941.2020.9313375","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39364566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NECo: A node embedding algorithm for multiplex heterogeneous networks (pp. 146-149)
Pub Date: 2020-12-01 | Epub Date: 2021-01-13 | DOI: 10.1109/bibm49941.2020.9313595
Cagatay Dursun, Jennifer R Smith, G Thomas Hayman, Anne E Kwitek, Serdar Bozdag
Complex diseases such as hypertension, cancer, and diabetes cause nearly 70% of the deaths in the U.S. and involve multiple genes and their interactions with environmental factors. Therefore, identifying genetic factors to understand and decrease the morbidity and mortality of complex diseases is an important and challenging task. With the generation of an unprecedented amount of multi-omics data, network-based methods have become popular for representing multilayered complex molecular interactions. In particular, node embeddings, low-dimensional representations of the nodes in a network, are utilized for gene function prediction. Integrated network analysis of multi-omics data alleviates the issues related to missing data and the lack of context-specific datasets. Most node embedding methods, however, are unable to integrate multiple types of datasets from genes and phenotypes. To address this limitation, we developed a node embedding algorithm called Node Embeddings of Complex networks (NECo) that can utilize multilayered heterogeneous networks of genes and phenotypes. We evaluated the performance of NECo using genotypic and phenotypic datasets from rat (Rattus norvegicus) disease models to classify hypertension disease-related genes. Our method significantly outperformed state-of-the-art node embedding methods, with an AUC of 94.97% compared to 85.98% for the second-best performer, and predicted genes not previously implicated in hypertension.
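To make the node-embedding idea concrete, here is a generic random-walk embedding over a tiny two-layer gene/phenotype graph using networkx and gensim. This illustrates the common walk-then-Word2Vec recipe behind many node-embedding methods, not NECo's specific algorithm; the graph, walk parameters, and library choices are all illustrative.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

g = nx.Graph()
g.add_edges_from([("geneA", "geneB"), ("geneB", "geneC")])                 # gene layer
g.add_edges_from([("hypertension", "geneB"), ("hypertension", "geneC")])   # phenotype links

def random_walk(graph, start, length=10, rng=random.Random(0)):
    path = [start]
    for _ in range(length):
        path.append(rng.choice(list(graph[path[-1]])))  # uniform neighbor hop
    return path

walks = [random_walk(g, node) for node in g.nodes for _ in range(20)]
model = Word2Vec(walks, vector_size=16, window=3, min_count=0, sg=1, seed=0)
print(model.wv["hypertension"][:4])   # learned embedding (first 4 dims)
```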
{"title":"NECo: A node embedding algorithm for multiplex heterogeneous networks.","authors":"Cagatay Dursun, Jennifer R Smith, G Thomas Hayman, Anne E Kwitek, Serdar Bozdag","doi":"10.1109/bibm49941.2020.9313595","DOIUrl":"10.1109/bibm49941.2020.9313595","url":null,"abstract":"<p><p>Complex diseases such as hypertension, cancer, and diabetes cause nearly 70% of the deaths in the U.S. and involve multiple genes and their interactions with environmental factors. Therefore, identification of genetic factors to understand and decrease the morbidity and mortality from complex diseases is an important and challenging task. With the generation of an unprecedented amount of multi-omics datasets, network-based methods have become popular to represent the multilayered complex molecular interactions. Particularly node embeddings, the low-dimensional representations of nodes in a network are utilized for gene function prediction. Integrated network analysis of multi-omics data alleviates the issues related to missing data and lack of context-specific datasets. Most of the node embedding methods, however, are unable to integrate multiple types of datasets from genes and phenotypes. To address this limitation, we developed a node embedding algorithm called Node Embeddings of Complex networks (NECo) that can utilize multilayered heterogeneous networks of genes and phenotypes. We evaluated the performance of NECo using genotypic and phenotypic datasets from rat (<i>Rattus norvegicus</i>) disease models to classify hypertension disease-related genes. Our method significantly outperformed the state-of-the-art node embedding methods, with AUC of 94.97% compared 85.98% in the second-best performer, and predicted genes not previously implicated in hypertension.</p>","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2020 ","pages":"146-149"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8466723/pdf/nihms-1741786.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39468722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}