Detecting outliers in case-control cohorts for improving deep learning networks on Schizophrenia prediction.

Journal of Integrative Bioinformatics (IF 1.5; JCR Q3, Mathematical & Computational Biology). Pub Date: 2024-07-15. eCollection Date: 2024-06-01. DOI: 10.1515/jib-2023-0042
Daniel Martins, Maryam Abbasi, Conceição Egas, Joel P Arrais
Citation count: 0

Abstract

This study examines the genetic and clinical aspects of schizophrenia (SCZ), a complex mental disorder with uncertain etiology. Deep learning (DL) holds promise for analyzing large genomic datasets to uncover new risk factors. However, given reports of non-negligible misdiagnosis rates for SCZ, case-control cohorts may contain outlying genetic profiles that limit the performance of classification models. The research used a case-control dataset drawn from the Swedish population. A gene-annotation-based DL architecture was developed and applied in two stages. First, the model was trained on the entire dataset to highlight differences between cases and controls. Then, samples likely to be misclassified were excluded, and the model was retrained on the refined dataset for performance evaluation. The results indicate that SCZ prevalence and misdiagnosis rates can affect case-control cohorts, potentially compromising future studies that rely on such datasets. However, by detecting and filtering outliers, the study demonstrates the feasibility of adapting DL methodologies to large-scale biological problems, producing results more aligned with existing heritability estimates for SCZ. This approach advances understanding of the genetic background of SCZ and opens the way to adapting DL techniques for complex precision-medicine research in mental health.
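The two-stage procedure described in the abstract (train on the full cohort, drop samples likely to be misclassified, retrain on the refined cohort) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic two-feature cohort, the 10% label-flip rate standing in for misdiagnosis, the plain logistic regression standing in for the gene-annotation-based DL architecture, and the rule that a sample counts as an outlier when the stage-1 prediction disagrees with its label. The abstract does not specify the authors' actual filtering criterion or model details.

```python
import math
import random


def predict_proba(w, b, x):
    """Logistic probability of class 1 for one sample (z is clamped for stability)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=200, lr=0.5):
    """Batch gradient descent on logistic loss; a stand-in for the paper's DL model."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(w, b, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(X)


def make_cohort(n=200, flip_rate=0.1, seed=0):
    """Hypothetical cohort: two Gaussian clusters, with some labels flipped
    to mimic misdiagnosed cases and undiagnosed controls."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 1.5 if label else -1.5
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(1 - label if rng.random() < flip_rate else label)
    return X, y


# Stage 1: train on the entire cohort.
X, y = make_cohort()
w1, b1 = train_logistic(X, y)
acc_full = accuracy(X, y, w1, b1)

# Filter: keep samples whose stage-1 prediction agrees with their label
# (assumed criterion, not taken from the paper).
keep = [i for i, (xi, yi) in enumerate(zip(X, y))
        if (predict_proba(w1, b1, xi) >= 0.5) == bool(yi)]
X_ref, y_ref = [X[i] for i in keep], [y[i] for i in keep]

# Stage 2: retrain and evaluate on the refined cohort.
w2, b2 = train_logistic(X_ref, y_ref)
acc_refined = accuracy(X_ref, y_ref, w2, b2)

print(f"kept {len(keep)} of {len(X)} samples")
print(f"accuracy on full cohort: {acc_full:.3f}, on refined cohort: {acc_refined:.3f}")
```

In this sketch the refined cohort is filtered by the same model family that is later evaluated on it, so the second accuracy is optimistic by construction. It shows the mechanics of the filtering step, not an unbiased estimate of performance.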

Source journal: Journal of Integrative Bioinformatics
Subject area: Medicine (all)
CiteScore: 3.10
Self-citation rate: 5.30%
Articles published: 27
Review time: 12 weeks