
JMIR Bioinformatics and Biotechnology: Latest Articles

Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization.
Pub Date : 2022-09-15 DOI: 10.2196/37701
Yu-Shan Huang, Ching Hsu, Yu-Chang Chune, I-Cheng Liao, Hsin Wang, Yi-Lin Lin, Wuh-Liang Hwu, Ni-Chung Lee, Feipei Lai

Background: In recent years, thanks to the rapid development of next-generation sequencing (NGS) technology, an entire human genome can be sequenced in a short period. As a result, NGS is now being widely introduced into clinical diagnostic practice, especially for the diagnosis of hereditary disorders. Although exome data on single-nucleotide variants (SNVs) can be generated using these approaches, processing a patient's DNA sequence data requires multiple tools and complex bioinformatics pipelines.

Objective: This study aims to help physicians automatically and quickly interpret the genetic variation information generated by NGS. Currently, to determine the true causal variants in a patient with a genetic disease, physicians often need to manually review numerous features of every variant and search the literature in different databases to understand the effect of each variant.

Methods: We constructed a machine learning model for predicting disease-causing variants in exome data. We collected sequencing data from whole-exome sequencing (WES) and gene panels as the training set and then integrated variant annotations from multiple genetic databases for model training. The resulting model ranked SNVs and output the most likely disease-causing candidates. For model testing, we collected WES data from 108 patients with rare genetic disorders at National Taiwan University Hospital. We fed the sequencing data, together with phenotypic information automatically extracted from patients' electronic medical records by a keyword extraction tool, into our machine learning model.

Results: We located 92.5% (124/134) of the causative variants within the top 10 of the ranking list, among an average of 741 candidate variants per person after filtering. AI Variant Prioritizer assigned the target gene to the top rank for 61.1% (66/108) of the patients, followed by Variant Prioritizer, which did so for 44.4% (48/108) of the patients. The cumulative rank results showed that AI Variant Prioritizer had the highest accuracy at ranks 1, 5, 10, and 20 and outperformed the other tools. After adopting Human Phenotype Ontology (HPO) terms by looking up the databases, the top 10 rate increased to 93.5% (101/108).
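
The cumulative rank figures above are top-k hit rates: the share of cases whose causal variant (or target gene) is ranked at position k or better. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name and the example ranks are hypothetical, and `None` is assumed to mark a case where the causal variant was not ranked at all.

```python
def top_k_hit_rate(ranks, k):
    """Fraction of cases whose causal variant is ranked at position k or better.

    ranks: one entry per case; an integer rank (1 = best) or None if the
    causal variant did not appear in the ranked list (assumed convention).
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for rank in ranks if rank is not None and rank <= k)
    return hits / len(ranks)


# Hypothetical ranks for eight cases, for illustration only.
example_ranks = [1, 3, 12, 1, 7, None, 2, 25]

for k in (1, 5, 10, 20):
    print(f"top-{k} hit rate: {top_k_hit_rate(example_ranks, k):.3f}")

# The percentages reported in the abstract follow from the stated counts.
print(round(100 * 124 / 134, 1), round(100 * 66 / 108, 1), round(100 * 101 / 108, 1))
```

Reporting the rate at several cutoffs (1, 5, 10, 20) shows how quickly a reviewer working down the list would reach the causal variant.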

Conclusions: We successfully applied WES sequencing data and free-text phenotypic information about each patient's disease, automatically extracted by the keyword extraction tool, for model training and testing. By interpreting our model, we identified which variant features are important. We also achieved a satisfactory result in finding the target variant in our testing data set. After adopting HPO terms by looking up the databases, the top 10 rate increased to 93.5% (101/108). The performance of the model is similar to that of manual analysis, and the model has been used to support genetic diagnosis at National Taiwan University Hospital.

PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11168239/pdf/
Citations: 0
Reducing Crowding in Emergency Departments With Early Prediction of Hospital Admission of Adult Patients Using Biomarkers Collected at Triage: Retrospective Cohort Study.
Pub Date : 2022-09-13 DOI: 10.2196/38845
Ann Corneille Monahan, Sue S Feldman, Tony P Fitzgerald

Background: Emergency department crowding continues to threaten patient safety and cause poor patient outcomes. Prior models designed to predict hospital admission have had biases. Predictive models that successfully estimate the probability of patient hospital admission would be useful in reducing or preventing emergency department "boarding" and hospital "exit block" and would reduce emergency department crowding by initiating earlier hospital admission and avoiding protracted bed procurement processes.

Objective: To develop a model to predict imminent adult patient hospital admission from the emergency department early in the patient visit by utilizing existing clinical descriptors (ie, patient biomarkers) that are routinely collected at triage and captured in the hospital's electronic medical records. Biomarkers are advantageous for modeling due to their early and routine collection at triage; instantaneous availability; standardized definition, measurement, and interpretation; and their freedom from the confines of patient histories (ie, they are not affected by inaccurate patient reports on medical history, unavailable reports, or delayed report retrieval).

Methods: This retrospective cohort study evaluated 1 year of consecutive data events among adult patients admitted to the emergency department and developed an algorithm that predicted which patients would require imminent hospital admission. Eight predictor variables were evaluated for their roles in the outcome of the patient emergency department visit. Logistic regression was used to model the study data.

Results: The 8-predictor model included the following biomarkers: age, systolic blood pressure, diastolic blood pressure, heart rate, respiration rate, temperature, gender, and acuity level. The model used these biomarkers to identify emergency department patients who required hospital admission. Our model performed well, with good agreement between observed and predicted admissions, indicating a well-fitting and well-calibrated model that showed good ability to discriminate between patients who would and would not be admitted.
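
A logistic regression model of this kind turns a weighted sum of the eight predictors into an admission probability through the logistic function. The sketch below illustrates only that mechanism. The coefficients and intercept are placeholders invented for illustration (the abstract does not report the fitted values), and the numeric coding of gender and acuity level is an assumption.

```python
import math

# Placeholder coefficients for illustration only. They are NOT the fitted
# values from the study, which the abstract does not report.
PLACEHOLDER_COEFFICIENTS = {
    "age": 0.02,
    "systolic_bp": -0.005,
    "diastolic_bp": -0.005,
    "heart_rate": 0.01,
    "respiration_rate": 0.03,
    "temperature": 0.1,
    "gender": 0.1,         # assumed coding: 0 or 1
    "acuity_level": -0.5,  # assumed coding: integer triage level
}
PLACEHOLDER_INTERCEPT = -3.0


def admission_probability(patient, coefficients=PLACEHOLDER_COEFFICIENTS,
                          intercept=PLACEHOLDER_INTERCEPT):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(coef * value))))."""
    linear_score = intercept + sum(
        coefficients[name] * patient[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_score))


# Hypothetical triage record.
example_patient = {
    "age": 67,
    "systolic_bp": 128,
    "diastolic_bp": 82,
    "heart_rate": 96,
    "respiration_rate": 20,
    "temperature": 37.8,
    "gender": 1,
    "acuity_level": 2,
}

print(round(admission_probability(example_patient), 3))
```

Because every input is available at triage, a score like this could in principle be computed before any laboratory or imaging results arrive.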

Conclusions: This prediction model, based on primary data, identified emergency department patients with an increased risk of hospital admission. This actionable information can be used to improve patient care and hospital operations. In particular, predicting at triage which patients are likely to be admitted allows the complex admission and bed assignment processes to start much earlier in the care continuum, which can reduce emergency department crowding.

PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11135233/pdf/
Citations: 0
Seasonality of Hashimoto Thyroiditis: Infodemiology Study of Google Trends Data.
Pub Date : 2022-09-01 DOI: 10.2196/38976
Robert Marcec, Josip Stjepanovic, Robert Likic

Background: Hashimoto thyroiditis (HT) is an autoimmune thyroid disease and the leading cause of hypothyroidism in areas with sufficient iodine intake. The quality-of-life impact and financial burden of hypothyroidism and HT highlight the need for additional research investigating the disease etiology with the aim of revealing potential modifiable risk factors.

Objective: Implementation of measures against such risk factors, once identified, has the potential to lessen the financial burden while also improving the quality of life of many individuals. Therefore, we aimed to examine the potential seasonality of HT in Europe using the Google Trends data to explore whether there is a seasonal characteristic of Google searches regarding HT, examine the potential impact of the countries' geographic location on the potential seasonality, and identify potential modifiable risk factors for HT, thereby inspiring future research on the topic.

Methods: Monthly Google Trends data on the search topic "Hashimoto thyroiditis" were retrieved in a 17-year time frame from January 2004 to December 2020 for 36 European countries. A cosinor model analysis was conducted to evaluate potential seasonality. Simple linear regression was used to estimate the potential effect of latitude and longitude on seasonal amplitude and phase of the model outputs.
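
A single-component cosinor model describes a series as y(t) = M + A cos(2π(t − t_peak)/period), where M is the mean level (mesor), A the amplitude, and t_peak the time of the peak. The sketch below fits such a model to monthly data. It is an illustration, not the authors' analysis: it relies on the shortcut that, for equally spaced data covering a whole number of periods, the least-squares cosine and sine coefficients reduce to simple sums, and it omits the significance test a full cosinor analysis would include. The function name and the synthetic series are hypothetical.

```python
import math


def fit_cosinor(values, period=12):
    """Fit y = M + A*cos(2*pi*(t - t_peak)/period) to equally spaced data.

    Assumes t = 0, 1, 2, ... and a whole number of complete periods, so the
    cosine and sine regressors are orthogonal and least squares has a closed form.
    Returns (mesor, amplitude, peak_time).
    """
    n = len(values)
    if n == 0 or n % period != 0:
        raise ValueError("this shortcut needs a whole number of complete periods")
    omega = 2 * math.pi / period
    mesor = sum(values) / n
    beta = 2.0 / n * sum(y * math.cos(omega * t) for t, y in enumerate(values))
    gamma = 2.0 / n * sum(y * math.sin(omega * t) for t, y in enumerate(values))
    amplitude = math.hypot(beta, gamma)
    peak_time = (math.atan2(gamma, beta) / omega) % period
    return mesor, amplitude, peak_time


# Synthetic 17-year monthly series (204 points) peaking at month index 3.
synthetic = [50 + 10 * math.cos(2 * math.pi * (t - 3) / 12) for t in range(17 * 12)]
mesor, amplitude, peak_time = fit_cosinor(synthetic)
print(round(mesor, 6), round(amplitude, 6), round(peak_time, 6))
```

The fitted amplitude and peak time per country are the quantities that would then be regressed on latitude and longitude.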

Results: Of the 36 included European countries, significant seasonality was observed in 30 (83%). Most phase peaks occurred in spring (14/30, 46.7%) and winter (8/30, 26.7%). Geographical latitude had a statistically significant effect on cosinor model amplitude (y = -3.23 + 0.13x; R²=0.29; P=.002). Seasonal increases in HT search volume may therefore be a consequence of an increased incidence or higher disease activity. It is particularly interesting that in most countries, a seasonal peak occurred in spring and winter months; when viewed in the context of the statistically significant impact of geographical latitude on seasonality amplitude, this may indicate the potential role of vitamin D levels in the seasonality of HT.
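
To make the reported regression line concrete, the sketch below evaluates y = -3.23 + 0.13x at two latitudes. It assumes that x is latitude in degrees north and that y is the cosinor amplitude in Google Trends relative search volume units; the abstract states neither explicitly, and the function name and the two latitudes are hypothetical.

```python
import math


def predicted_amplitude(latitude_deg):
    """Reported regression line: amplitude = -3.23 + 0.13 * latitude."""
    return -3.23 + 0.13 * latitude_deg


for latitude in (45, 60):
    print(latitude, round(predicted_amplitude(latitude), 2))

# With R^2 = 0.29 and a positive slope, the implied correlation is sqrt(0.29).
print(round(math.sqrt(0.29), 2))
```

An R² of 0.29 means latitude accounts for roughly 29% of the between-country variance in amplitude, so the line is better read as a trend than as a country-level prediction.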

Conclusions: Significant seasonality of HT Google Trends search volume was observed in our study, with seasonal peaks in most countries occurring in spring and winter and with a significant impact of latitude on seasonality amplitude. Further studies on the topic of seasonality in HT and factors impacting it are required.

PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11135219/pdf/
Citations: 0
The Application of Machine Learning in Predicting Mortality Risk in Patients With Severe Femoral Neck Fractures: Prediction Model Development Study.
Pub Date : 2022-08-19 DOI: 10.2196/38226
Lingxiao Xu, Jun Liu, Chunxia Han, Zisheng Ai

Background: Femoral neck fracture (FNF) accounts for approximately 3.58% of all fractures in the entire body, exhibiting an increasing trend each year. According to a survey, in 1990, the total number of hip fractures in men and women worldwide was approximately 338,000 and 917,000, respectively. In China, FNFs account for 48.22% of hip fractures. Currently, many studies have been conducted on postdischarge mortality and mortality risk in patients with FNF. However, there have been no definitive studies on in-hospital mortality or its influencing factors in patients with severe FNF admitted to the intensive care unit.

Objective: In this paper, 3 machine learning methods were used to construct a nosocomial death prediction model for patients admitted to intensive care units to assist clinicians in early clinical decision-making.

Methods: A retrospective analysis was conducted using data on patients with FNF from the Medical Information Mart for Intensive Care III (MIMIC-III). After balancing the data set using the Synthetic Minority Oversampling Technique (SMOTE) algorithm, patients were randomly separated into a 70% training set and a 30% testing set for the development and validation, respectively, of the prediction model. Random forest, extreme gradient boosting, and backpropagation neural network prediction models were constructed with nosocomial death as the outcome. Model performance was assessed using the area under the receiver operating characteristic curve, accuracy, precision, sensitivity, and specificity. The predictive value of the models was verified by comparison with a traditional logistic regression model.
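
The performance measures named above can all be computed from true labels and predicted scores. The sketch below is illustrative only and is not the authors' code: the area under the ROC curve is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half), and the 0.5 decision threshold, function names, and example data are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, precision, sensitivity, and specificity at one threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Hypothetical labels (1 = in-hospital death) and model scores.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]

print(round(roc_auc(example_labels, example_scores), 3))
print(threshold_metrics(example_labels, example_scores))
```

The AUC does not depend on a threshold, whereas the other four measures change as the threshold moves, which is why both kinds of measure are usually reported together.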

Results: A total of 366 patients with FNFs were selected, including 48 cases (13.1%) of in-hospital death. After balancing the in-hospital death and survival groups to a 1:1 ratio, the data set comprised 636 records. The 3 machine learning models exhibited high predictive accuracy: the areas under the receiver operating characteristic curve of the random forest, extreme gradient boosting, and backpropagation neural network models were 0.98, 0.97, and 0.95, respectively, all higher than that of the traditional logistic regression model. When the feature variables were ranked by importance, the top 10 variables for predicting the risk of in-hospital death were the Simplified Acute Physiology Score II, lactate, creatinine, gender, vitamin D, calcium, creatine kinase, creatine kinase isoenzyme, white blood cell count, and age.
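
The counts above are consistent with oversampling the minority (death) class until it matches the survival class. The sketch below checks that arithmetic and shows the interpolation step at the core of SMOTE, in which a synthetic sample is placed on the line segment between a minority sample and one of its minority-class neighbors. This reading of the counts is an inference from the abstract, and the function name and the example feature vectors are hypothetical.

```python
total_patients = 366
deaths = 48
survivors = total_patients - deaths      # majority class size
death_rate = deaths / total_patients     # share of in-hospital deaths
balanced_size = 2 * survivors            # 1:1 ratio after oversampling
synthetic_minority = survivors - deaths  # synthetic death-class records needed


def smote_point(sample, neighbor, gap):
    """One SMOTE step: sample + gap * (neighbor - sample), with gap in [0, 1]."""
    if not 0.0 <= gap <= 1.0:
        raise ValueError("gap must lie in [0, 1]")
    return [x + gap * (n - x) for x, n in zip(sample, neighbor)]


print(survivors, round(100 * death_rate, 1), balanced_size, synthetic_minority)
print(smote_point([1.0, 4.0], [3.0, 8.0], 0.5))
```

In SMOTE proper, the neighbor is drawn from the k nearest minority-class neighbors and the gap is drawn uniformly at random; both are passed in explicitly here to keep the example deterministic.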

Conclusions: Death risk assessment models constructed using machine learning have positive significance for predicting the in-hospital mortality of patients with severe disease and provide a valid basis for reducing in-hospital mortality and improving patient prognosis.

PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11135225/pdf/
Citations: 0
Monitoring Physical Behavior in Rehabilitation Using a Machine Learning-Based Algorithm for Thigh-Mounted Accelerometers: Development and Validation Study.
Pub Date : 2022-07-26 DOI: 10.2196/38512
Frederik Skovbjerg, Helene Honoré, Inger Mechlenburg, Matthijs Lipperts, Rikke Gade, Erhard Trillingsgaard Næss-Schmidt

Background: Physical activity is emerging as an outcome measure. Accelerometers have become an important tool for monitoring physical behavior, and newer analytical approaches to behavior recognition increase the level of detail. Many studies have achieved high performance in the classification of physical behaviors through the use of multiple wearable sensors; however, multiple wearables can be impractical and can lower compliance.

Objective: The aim of this study was to develop and validate an algorithm for classifying several daily physical behaviors using a single thigh-mounted accelerometer and a supervised machine-learning scheme.

Methods: We collected training data to add the behavior classes of running, cycling, stair climbing, wheelchair ambulation, and vehicle driving to an existing algorithm with the classes of sitting, lying, standing, walking, and transitioning. After combining the training data, we used a random forest learning scheme for model development. We validated the algorithm through a simulated free-living procedure, using chest-mounted cameras to establish the ground truth. Furthermore, we adjusted our algorithm and compared its performance with that of an existing algorithm based on vector thresholds.

Results: We developed an algorithm to classify 11 physical behaviors relevant for rehabilitation. In the simulated free-living validation, the algorithm's average F-measure across the 11 classes decreased to 57%. After the classes were merged into sedentary behavior, standing, walking, running, and cycling, the algorithm showed high performance in comparison with both the ground truth and the existing algorithm.
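
The effect of merging classes on an averaged F-measure can be illustrated with a small example. The sketch below computes a per-class F-measure (2·TP / (2·TP + FP + FN)) and its unweighted mean before and after merging sitting and lying into one sedentary class. Whether the study averaged the classes without weighting is an assumption, and the labels, mapping, and function names are hypothetical.

```python
def f_measure_per_class(true_labels, predicted_labels):
    """Per-class F-measure: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for cls in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        scores[cls] = 2 * tp / (2 * tp + fp + fn)
    return scores


def macro_f_measure(true_labels, predicted_labels):
    """Unweighted mean of the per-class F-measures."""
    scores = f_measure_per_class(true_labels, predicted_labels)
    return sum(scores.values()) / len(scores)


def merge_classes(labels, mapping):
    """Relabel classes according to mapping; unmapped classes stay as they are."""
    return [mapping.get(label, label) for label in labels]


# Hypothetical ground truth (camera annotation) and algorithm output.
truth = ["sit", "lie", "stand", "walk", "walk", "walk"]
predicted = ["lie", "lie", "stand", "walk", "walk", "sit"]

to_sedentary = {"sit": "sedentary", "lie": "sedentary"}
print(round(macro_f_measure(truth, predicted), 3))
print(round(macro_f_measure(merge_classes(truth, to_sedentary),
                            merge_classes(predicted, to_sedentary)), 3))
```

Confusions between classes that end up in the same merged class no longer count as errors, which is one reason a coarser class set tends to score higher.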

Conclusions: Using a single thigh-mounted accelerometer, we obtained high classification levels within specific behaviors. The behaviors classified with high levels of performance mostly occur in populations with higher levels of functioning. Further development should aim at describing behaviors within populations with lower levels of functioning.
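
The Results above report performance as an F-measure averaged over the behavior classes. As a rough illustration of how such a figure is typically computed, the sketch below assumes an unweighted (macro) average of per-class F-measures; the abstract does not state the exact averaging scheme, and the labels in the example are hypothetical, not study data.

```python
from collections import Counter


def per_class_f_measure(y_true, y_pred):
    """Return {label: F-measure} from paired true and predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F-measure = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of the per-class F-measures (assumed averaging scheme)."""
    scores = per_class_f_measure(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical labels for illustration only
y_true = ["sitting", "sitting", "walking", "walking", "cycling", "cycling"]
y_pred = ["sitting", "sitting", "walking", "cycling", "cycling", "cycling"]
per_class = per_class_f_measure(y_true, y_pred)
macro = macro_f_measure(y_true, y_pred)
```

A macro average weights every class equally, so rarely observed behaviors influence the overall figure as much as frequent ones.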

Citations: 0
Development of a Multiepitope Vaccine Against SARS-CoV-2: Immunoinformatics Study.
Pub Date : 2022-07-19 eCollection Date: 2022-01-01 DOI: 10.2196/36100
Fatemeh Ghafouri, Reza Ahangari Cohan, Hilda Samimi, Ali Hosseini Rad S M, Mahmood Naderi, Farshid Noorbakhsh, Vahid Haghpanah

Background: Since SARS-CoV-2 first appeared in China in December 2019, the world has witnessed the emergence of a SARS-CoV-2 outbreak. Due to the high transmissibility of the virus, there is an urgent need to design and develop vaccines against SARS-CoV-2 to prevent further cases.

Objective: A computational approach is proposed for vaccine design against the SARS-CoV-2 spike (S) protein, the key target for neutralizing antibodies, and the envelope (E) protein, which contains a conserved sequence feature.

Methods: We used previously reported, experimentally detected epitopes of the S protein and further identified a collection of predicted B-cell and major histocompatibility complex (MHC) class II–restricted T-cell epitopes derived from E proteins with an identical match to the SARS-CoV-2 E protein.

Results: The in silico design of our candidate vaccine against the S and E proteins of SARS-CoV-2 demonstrated a high affinity to MHC class II molecules and effective results in immune response simulations.

Conclusions: Based on the results of this study, the multiepitope vaccine designed against the S and E proteins of SARS-CoV-2 may be considered a new, safe, and efficient approach to combating the COVID-19 pandemic.

Citations: 4
Digital Phenotyping in Health Using Machine Learning Approaches: Scoping Review.
Pub Date : 2022-07-18 DOI: 10.2196/39618
Schenelle Dayna Dlima, Santosh Shevade, Sonia Rebecca Menezes, Aakash Ganju

Background: Digital phenotyping is the real-time collection of individual-level active and passive data from users in naturalistic and free-living settings via personal digital devices, such as mobile phones and wearable devices. Given the novelty of research in this field, there is heterogeneity in the clinical use cases, types of data collected, modes of data collection, data analysis methods, and outcomes measured.

Objective: The primary aim of this scoping review was to map the published research on digital phenotyping and to outline study characteristics, data collection and analysis methods, machine learning approaches, and future implications.

Methods: We utilized an a priori approach for the literature search and data extraction and charting process, guided by the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-analyses Extension for Scoping Reviews). We identified relevant studies published in 2020, 2021, and 2022 on PubMed and Google Scholar using search terms related to digital phenotyping. The titles, abstracts, and keywords were screened during the first stage of the screening process, and the second stage involved screening the full texts of the shortlisted articles. We extracted and charted the descriptive characteristics of the final studies, which were countries of origin, study design, clinical areas, active and/or passive data collected, modes of data collection, data analysis approaches, and limitations.

Results: A total of 454 articles on PubMed and Google Scholar were identified through search terms associated with digital phenotyping, and 46 articles were deemed eligible for inclusion in this scoping review. Most studies evaluated wearable data and originated from North America. The most dominant study design was observational, followed by randomized trials, and most studies focused on psychiatric disorders, mental health disorders, and neurological diseases. A total of 7 studies used machine learning approaches for data analysis, with random forest, logistic regression, and support vector machines being the most common.

Conclusions: Our review provides foundational as well as application-oriented approaches toward digital phenotyping in health. Future work should focus on more prospective, longitudinal studies that include larger data sets from diverse populations, address privacy and ethical concerns around data collection from consumer technologies, and build "digital phenotypes" to personalize digital health interventions and treatment plans.

Citations: 0
An Analysis of Different Distance-Linkage Methods for Clustering Gene Expression Data and Observing Pleiotropy: Empirical Study.
Pub Date : 2022-06-17 DOI: 10.2196/30890
Joydhriti Choudhury, Faisal Bin Ashraf

Background: Large amounts of biological data have been generated over the last few decades, encouraging scientists to look for connections between genes that cause various diseases. Clustering illustrates such relationships between numerous species and genes. Finding an appropriate distance-linkage metric to construct clusters from diverse biological data sets has thus become critical. Pleiotropy, in which a single gene influences multiple traits, also matters because it allows a gene's expression to produce varied consequences in living things. Finding the pleiotropy of genes responsible for various diseases has become a major research challenge.

Objective: Our goal was to establish the optimal distance-linkage strategy for creating reliable clusters from diverse data sets and to identify the common genes that cause various tumors in order to observe genes with pleiotropic effects.

Methods: We considered 4 linkage methods (single, complete, average, and Ward) and 3 distance metrics (Euclidean, maximum, and Manhattan distance). To assess the quality of different sets of clusters, we used a fitness function that combines silhouette width and within-cluster distance.

Results: According to our findings, the maximum distance measure produces the highest-quality clusters. Moreover, the average linkage method works best for medium-sized data sets, and the Ward linkage method works best for large data sets. The outcome is not improved by using ensemble clustering. We also discovered genes that cause 3 different cancers and used gene enrichment to confirm our findings.

Conclusions: Accuracy is crucial in clustering, and we investigated the accuracy of numerous clustering techniques in our research. Other studies may aid related works if the data set is similar to ours.
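
To make the Methods above concrete, the following sketch shows the 3 distance metrics and 3 of the 4 linkage rules named in the abstract, written in plain Python. It is an illustration under stated assumptions rather than the authors' implementation: the function names and example points are hypothetical, and Ward linkage is omitted because it merges clusters by variance increase rather than by pairwise distance.

```python
import math
from itertools import product


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a, b):
    """Largest absolute coordinate difference (Chebyshev distance)."""
    return max(abs(x - y) for x, y in zip(a, b))


def linkage_distance(cluster_a, cluster_b, metric, method):
    """Distance between two clusters under a pairwise linkage rule."""
    pairwise = [metric(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":      # closest pair of points
        return min(pairwise)
    if method == "complete":    # farthest pair of points
        return max(pairwise)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unsupported linkage method: {method}")


# Hypothetical 2D points standing in for expression profiles
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 3), (5, 3)]
avg_manhattan = linkage_distance(cluster_a, cluster_b, manhattan, "average")
single_maximum = linkage_distance(cluster_a, cluster_b, maximum, "single")
complete_euclidean = linkage_distance(cluster_a, cluster_b, euclidean, "complete")
```

Agglomerative clustering repeatedly merges the two clusters with the smallest linkage distance, so the choice of metric and linkage rule jointly determines the resulting tree.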

Citations: 0
Differential Expression of Long Noncoding RNAs in Murine Myoblasts After Short Hairpin RNA-Mediated Dysferlin Silencing In Vitro: Microarray Profiling.
Pub Date : 2022-06-17 DOI: 10.2196/33186
Richa Singhal, Rachel Lukose, Gwenyth Carr, Afsoon Moktar, Ana Lucia Gonzales-Urday, Eric C Rouchka, Bathri N Vajravelu

Background: Long noncoding RNAs (lncRNAs) are noncoding RNA transcripts greater than 200 nucleotides in length and are known to play a role in regulating the transcription of genes involved in vital cellular functions. We hypothesized the disease process in dysferlinopathy is linked to an aberrant expression of lncRNAs and messenger RNAs (mRNAs).

Objective: In this study, we compared the lncRNA and mRNA expression profiles between wild-type and dysferlin-deficient murine myoblasts (C2C12 cells).

Methods: LncRNA and mRNA expression profiling were performed using a microarray. Several lncRNAs with differential expression were validated using quantitative real-time polymerase chain reaction. Gene Ontology (GO) analysis was performed to understand the functional role of the differentially expressed mRNAs. Further bioinformatics analysis was used to explore the potential function, lncRNA-mRNA correlation, and potential targets of the differentially expressed lncRNAs.

Results: We found 3195 lncRNAs and 1966 mRNAs that were differentially expressed. The chromosomal distribution of the differentially expressed lncRNAs and mRNAs was unequal, with chromosome 2 having the highest number of differentially expressed lncRNAs and chromosome 7 having the highest number of differentially expressed mRNAs. Pathway analysis of the differentially expressed genes indicated the involvement of several signaling pathways, including PI3K-Akt, Hippo, and pathways regulating the pluripotency of stem cells. The differentially expressed genes were also enriched for the GO terms "developmental process" and "muscle system process." Network analysis identified 8 statistically significant (P<.05) network objects from the upregulated lncRNAs and 3 statistically significant network objects from the downregulated lncRNAs.

Conclusions: Our results thus far imply that dysferlinopathy is associated with an aberrant expression of multiple lncRNAs, many of which may have a specific function in the disease process. GO terms and network analysis suggest a muscle-specific role for these lncRNAs. To elucidate the specific roles of these abnormally expressed noncoding RNAs, further studies engineering their expression are required.

Citations: 0
Exploring the Applicability of Using Natural Language Processing to Support Nationwide Venous Thromboembolism Surveillance: Model Evaluation Study.
Pub Date : 2022-05-08 DOI: 10.2196/36877
Aaron Wendelboe, Ibrahim Saber, Justin Dvorak, Alys Adamski, Natalie Feland, Nimia Reyes, Karon Abe, Thomas Ortel, Gary Raskob

Background: Venous thromboembolism (VTE) is a preventable, common vascular disease that has been estimated to affect up to 900,000 people per year. It has been associated with risk factors such as recent surgery, cancer, and hospitalization. VTE surveillance for patient management and safety can be improved via natural language processing (NLP). NLP tools have the ability to access electronic medical records, identify patients that meet the VTE case definition, and subsequently enter the relevant information into a database for hospital review.

Objective: We aimed to evaluate the performance of a VTE identification model of IDEAL-X (Information and Data Extraction Using Adaptive Learning; Emory University), an NLP tool, in automatically classifying cases of VTE by "reading" unstructured text from diagnostic imaging records collected from 2012 to 2014.

Methods: After accessing imaging records from pilot surveillance systems for VTE from Duke University and the University of Oklahoma Health Sciences Center (OUHSC), we used a VTE identification model of IDEAL-X to classify cases of VTE that had previously been manually classified. Experts reviewed the technicians' comments in each record to determine if a VTE event occurred. The performance measures calculated (with 95% CIs) were accuracy, sensitivity, specificity, and positive and negative predictive values. Chi-square tests of homogeneity were conducted to evaluate differences in performance measures by site, using a significance level of .05.

Results: The VTE model of IDEAL-X "read" 1591 records from Duke University and 1487 records from the OUHSC, for a total of 3078 records. The combined performance measures were 93.7% accuracy (95% CI 93.7%-93.8%), 96.3% sensitivity (95% CI 96.2%-96.4%), 92% specificity (95% CI 91.9%-92%), an 89.1% positive predictive value (95% CI 89%-89.2%), and a 97.3% negative predictive value (95% CI 97.3%-97.4%). The sensitivity was higher at Duke University (97.9%, 95% CI 97.8%-98%) than at the OUHSC (93.3%, 95% CI 93.1%-93.4%; P<.001), but the specificity was higher at the OUHSC (95.9%, 95% CI 95.8%-96%) than at Duke University (86.5%, 95% CI 86.4%-86.7%; P<.001).

Conclusions: The VTE model of IDEAL-X accurately classified cases of VTE from the pilot surveillance systems of two separate health systems in Durham, North Carolina, and Oklahoma City, Oklahoma. NLP is a promising tool for the design and implementation of an automated, cost-effective national surveillance system for VTE. Conducting public health surveillance at a national scale is important for measuring disease burden and the impact of prevention measures. We recommend additional studies to identify how integrating IDEAL-X in a medical record system could further automate the surveillance process.
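
The performance measures named in the Methods (accuracy, sensitivity, specificity, and positive and negative predictive values) all derive from the same 2x2 confusion table of model output against manual classification. The sketch below shows the standard definitions; the function name and the counts are hypothetical, since the abstract does not report the underlying table.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard classification measures from 2x2 confusion counts.

    tp/fn: manually confirmed VTE records the model flagged / missed.
    tn/fp: records without VTE the model cleared / wrongly flagged.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of true cases detected
        "specificity": tn / (tn + fp),   # share of non-cases cleared
        "ppv": tp / (tp + fp),           # flagged records that are true cases
        "npv": tn / (tn + fn),           # cleared records that are true non-cases
    }


# Hypothetical counts for illustration only, not the study's data
metrics = diagnostic_metrics(tp=90, fp=10, tn=180, fn=20)
```

Sensitivity and specificity describe the model itself, whereas the predictive values also depend on how common VTE is among the records reviewed, which is one reason results can differ between sites.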

Background: Venous thromboembolism (VTE) is a preventable, common vascular disease that has been estimated to affect up to 900,000 people per year. It has been associated with risk factors such as recent surgery, cancer, and hospitalization. VTE surveillance for patient management and safety can be improved via natural language processing (NLP). NLP tools can access electronic medical records, identify patients who meet the VTE case definition, and then enter the relevant information into a database for hospital review.

Objective: We aimed to evaluate the performance of a VTE identification model of IDEAL-X (Information and Data Extraction Using Adaptive Learning; Emory University), an NLP tool, in automatically classifying cases of VTE by "reading" unstructured text from diagnostic imaging records collected from 2012 to 2014.

Methods: After accessing imaging records from pilot surveillance systems for VTE at Duke University and the University of Oklahoma Health Sciences Center (OUHSC), we used a VTE identification model of IDEAL-X to classify cases of VTE that had previously been manually classified. Experts reviewed the technicians' comments in each record to determine whether a VTE event had occurred. The performance measures calculated (with 95% CIs) were accuracy, sensitivity, specificity, and positive and negative predictive values. Chi-square tests of homogeneity were conducted to evaluate differences in performance measures by site, using a significance level of .05.

Results: The VTE model of IDEAL-X "read" 1591 records from Duke University and 1487 records from the OUHSC, for a total of 3078 records. The combined performance measures were 93.7% accuracy (95% CI 93.7%-93.8%), 96.3% sensitivity (95% CI 96.2%-96.4%), 92% specificity (95% CI 91.9%-92%), an 89.1% positive predictive value (95% CI 89%-89.2%), and a 97.3% negative predictive value (95% CI 97.3%-97.4%). The sensitivity was higher at Duke University (97.9%, 95% CI 97.8%-98%) than at the OUHSC (93.3%, 95% CI 93.1%-93.4%; P<.001), but the specificity was higher at the OUHSC (95.9%, 95% CI 95.8%-96%) than at Duke University (86.5%, 95% CI 86.4%-86.7%; P<.001).

Conclusions: The VTE model of IDEAL-X accurately classified cases of VTE from the pilot surveillance systems of two separate health systems in Durham, North Carolina, and Oklahoma City, Oklahoma. NLP is a promising tool for the design and implementation of an automated, cost-effective national surveillance system for VTE. Conducting public health surveillance at a national scale is important for measuring disease burden and the impact of prevention measures. We recommend additional studies to identify how integrating IDEAL-X into a medical record system could further automate the surveillance process.

Exploring the Applicability of Using Natural Language Processing to Support Nationwide Venous Thromboembolism Surveillance: Model Evaluation Study.
Pub Date : 2022-05-08 DOI: 10.2196/36877
Aaron Wendelboe, Ibrahim Saber, Justin Dvorak, Alys Adamski, Natalie Feland, Nimia Reyes, Karon Abe, Thomas Ortel, Gary Raskob
JMIR bioinformatics and biotechnology, vol. 3, no. 1, e36877. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10193259/pdf/
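As a rough illustration of how the performance measures named in the abstract are defined, the sketch below computes accuracy, sensitivity, specificity, and positive and negative predictive values from confusion-matrix counts, and runs a Pearson chi-square test of homogeneity on a 2x2 table. The function names and all counts are hypothetical and are not taken from the study. The abstract does not state how the 95% CIs were obtained, so the sketch omits them.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Return the five performance measures as fractions in [0, 1]."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),   # true positives among actual cases
        "specificity": tn / (tn + fp),   # true negatives among actual non-cases
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


def chi_square_homogeneity_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the p-value
    equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not study data).
    print(classification_metrics(tp=90, fp=15, fn=10, tn=85))
    # Compare sensitivity between two hypothetical sites:
    # rows = sites, columns = (detected cases, missed cases).
    print(chi_square_homogeneity_2x2(90, 10, 80, 20))
```

A p-value below the chosen significance level (.05 in the abstract) would suggest that the measure differs between the two sites.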
Citations: 0