JMIR bioinformatics and biotechnology: Latest Articles

Lung Cancer Diagnosis From Computed Tomography Images Using Deep Learning Algorithms With Random Pixel Swap Data Augmentation: Algorithm Development and Validation Study.
Pub Date : 2025-09-03 DOI: 10.2196/68848
Ayomide Adeyemi Abe, Mpumelelo Nyathi

Background: Deep learning (DL) shows promise for automated lung cancer diagnosis, but limited clinical data can restrict performance. While data augmentation (DA) helps, existing methods struggle with chest computed tomography (CT) scans across diverse DL architectures.

Objective: This study proposes Random Pixel Swap (RPS), a novel DA technique, to enhance diagnostic performance in both convolutional neural networks and transformers for lung cancer diagnosis from CT scan images.

Methods: RPS generates augmented data by randomly swapping pixels within patient CT scan images. We evaluated it on ResNet, MobileNet, Vision Transformer, and Swin Transformer models, using 2 public CT datasets (Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases [IQ-OTH/NCCD] dataset and chest CT scan images dataset), and measured accuracy and area under the receiver operating characteristic curve (AUROC). Statistical significance was assessed via paired t tests.
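The abstract describes RPS only as randomly swapping pixels within a CT image. The following is a minimal NumPy sketch of that idea under stated assumptions: a single-channel 2D slice, swaps between individual pixel pairs, and a hypothetical `swap_fraction` parameter. The published method may swap regions or use a different swap scheme, so this illustrates the concept rather than the authors' implementation.

```python
import numpy as np


def random_pixel_swap(image, swap_fraction=0.05, rng=None):
    """Return a copy of a 2D image with randomly chosen pixel pairs swapped.

    Hypothetical sketch: `swap_fraction` (share of pixels that take part in a
    swap) is an illustrative parameter, not one taken from the paper.
    """
    rng = np.random.default_rng(rng)
    augmented = image.copy()
    flat = augmented.reshape(-1)  # view into the contiguous copy, so edits hit `augmented`
    n_pairs = int(flat.size * swap_fraction) // 2
    # Draw 2 * n_pairs distinct positions and pair the first half with the second half.
    positions = rng.choice(flat.size, size=2 * n_pairs, replace=False)
    src, dst = positions[:n_pairs], positions[n_pairs:]
    flat[src], flat[dst] = flat[dst].copy(), flat[src].copy()
    return augmented


# Stand-in for a CT slice with all-distinct intensities.
ct_slice = np.arange(64, dtype=np.float32).reshape(8, 8)
aug = random_pixel_swap(ct_slice, swap_fraction=0.25, rng=0)  # exactly 16 pixels differ from the input
```

Because the swaps only permute existing values, the intensity histogram of the slice is unchanged; only the spatial arrangement is perturbed.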

Results: RPS outperformed state-of-the-art DA methods (Cutout, Random Erasing, MixUp, and CutMix), achieving 97.56% accuracy and 98.61% AUROC on the IQ-OTH/NCCD dataset and 97.78% accuracy and 99.46% AUROC on the chest CT scan images dataset. Traditional augmentation approaches (flipping and rotation) remained effective, and RPS complemented them; the resulting performance surpassed that reported in prior studies, demonstrating the potential of artificial intelligence for early lung cancer detection.
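Statistical significance in this study was assessed with paired t tests. As a reminder of what that test computes, here is a minimal standard-library sketch of the paired t statistic, t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom, where d are the per-pair score differences. How the study paired its runs (for example, by random seed or data split) is not stated in the abstract, and the accuracies below are hypothetical. For a p-value, `scipy.stats.ttest_rel` computes the same statistic.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired-samples t statistic: mean difference divided by its standard error."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error, len(diffs) - 1  # (t, degrees of freedom)


# Hypothetical per-run accuracies (not from the paper); index i of each list is the same run.
rps_acc = [0.97, 0.96, 0.98]
baseline_acc = [0.95, 0.95, 0.96]
t_stat, dof = paired_t_statistic(rps_acc, baseline_acc)  # t = 5.0 up to rounding, dof = 2
```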

Conclusions: The RPS technique enhances convolutional neural network and transformer models, enabling more accurate automated lung cancer detection from CT scan images.

JMIR bioinformatics and biotechnology, volume 6, e68848. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12407498/pdf/
Citations: 0
Systemic Anticancer Therapy Timelines Extraction From Electronic Medical Records Text: Algorithm Development and Validation.
Pub Date : 2025-09-03 DOI: 10.2196/67801
Jiarui Yao, Eli Goldner, Harry Hochheiser, Sean Finan, John Levander, David Harris, Piet C de Groen, Elizabeth Buchbinder, Danielle Bitterman, Jeremy L Warner, Guergana Savova

Background: Systemic treatment of cancer typically requires multiple anticancer agents given in combination or sequentially. Clinical narrative texts often describe the temporal sequencing of systemic anticancer therapy (SACT) in detail, which makes automated extraction of SACT timelines an important and potentially tractable task.

Objective: We aimed to explore automatic methods for extracting patient-level SACT timelines from clinical narratives in the electronic medical records (EMRs).

Methods: We used two datasets from two institutions: (1) a colorectal cancer (CRC) dataset comprising the entire EMR of the 199 patients in the THYME (Temporal Histories of Your Medical Event) dataset and (2) the 2024 ChemoTimelines shared task dataset, comprising 149 patients with ovarian cancer, breast cancer, and melanoma. We explored finetuning smaller language models trained to attend to events and time expressions, as well as few-shot prompting of large language models (LLMs). Evaluation followed the 2024 ChemoTimelines shared task configuration: Subtask1 involves constructing SACT timelines from the patient's notes plus manually annotated SACT event and time expression mentions provided as input, whereas Subtask2 requires extracting SACT timelines directly from the patient's notes.

Results: Our task-specific finetuned EntityBERT model achieved an F1-score of 93% in Subtask1, outperforming the best result of the 2024 ChemoTimelines shared task (90%), and ranked second in Subtask2. LLM performance (LLaMA2, LLaMA3.1, and Mixtral) lagged behind that of the task-specific finetuned model on both the THYME and shared task datasets. On the shared task datasets, the best LLM achieved a macro F1-score of 77%, 16 percentage points lower than the task-specific finetuned system (Subtask1).
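Macro-averaging gives every unit equal weight: F1 is computed separately for each unit and the scores are then averaged. The sketch below assumes the unit is the patient and that a timeline is a set of (SACT agent, temporal relation, normalized date) tuples compared by exact match. The relation names and data are illustrative, scoring a patient with no gold and no predicted tuples as 1.0 is a convention chosen for this sketch, and this is not the official ChemoTimelines scorer, which may use different matching rules.

```python
def f1(predicted, gold):
    """Exact-match F1 between two sets of timeline tuples."""
    if not predicted and not gold:
        return 1.0  # sketch convention: nothing to find and nothing predicted
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(predictions, gold_standard):
    """Unweighted mean of per-patient F1 over every patient in the gold standard."""
    scores = [f1(set(predictions.get(pid, ())), set(tuples)) for pid, tuples in gold_standard.items()]
    return sum(scores) / len(scores)


# Hypothetical timelines: (SACT agent, temporal relation, normalized date).
gold = {
    "patient_1": {("carboplatin", "BEGINS-ON", "2013-03-01"), ("carboplatin", "ENDS-ON", "2013-06-01")},
    "patient_2": {("pembrolizumab", "CONTAINS", "2019-01-15")},
}
pred = {
    "patient_1": {("carboplatin", "BEGINS-ON", "2013-03-01")},
    "patient_2": {("pembrolizumab", "CONTAINS", "2019-01-15")},
}
score = macro_f1(pred, gold)  # (2/3 + 1) / 2 = 0.833...
```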

Conclusions: Using the SACT timeline extraction task, we explored approaches for patient-level timeline extraction. Our results and analysis add to knowledge of how to extract treatment timelines from EMR clinical narratives with language modeling methods.

JMIR bioinformatics and biotechnology, volume 6, e67801. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12408058/pdf/
Citations: 0
In Silico Analysis and Validation of A Disintegrin and Metalloprotease (ADAM) 17 Gene Missense Variants: Structural Bioinformatics Study.
Pub Date : 2025-08-25 DOI: 10.2196/72133
Abdelilah Mechnine, Asmae Saih, Lahcen Wakrim, Ahmed Aarab

Background: ADAM17 (a disintegrin and metalloprotease domain-containing protein 17), also called tumor necrosis factor alpha-converting enzyme, is mainly responsible for cleaving the specific sequence Pro-Leu-Ala-Gln-Ala-/-Val-Arg-Ser-Ser-Ser in the membrane-bound precursor of tumor necrosis factor alpha. This cleavage has significant implications for inflammatory and immune responses, and recent research indicates that genetic variants of ADAM17 may influence susceptibility to, and severity of, SARS-CoV-2 infection.

Objective: The aim of the study is to identify the most deleterious missense variants of ADAM17 that impact protein stability, structure, and function and to assess specific variants potentially involved in SARS-CoV-2 infection.

Methods: A bioinformatics approach was applied to 12,042 single-nucleotide polymorphisms with tools including SIFT (Sorting Intolerant From Tolerant), PolyPhen2.0, PROVEAN (Protein Variation Effect Analyzer), PANTHER (Protein Analysis Through Evolutionary Relationships), SNP&GO (Single Nucleotide Polymorphisms and Gene Ontology), PhD-SNP (Predictor of Human Deleterious Single Nucleotide Polymorphisms), Mutation Assessor, SNAP2 (Screening for Non-Acceptable Polymorphisms 2), MUpro, I-Mutant, iStable, InterPro, Sparks-x, PROCHECK (Programs to Check the Stereochemical Quality of Protein Structures), PyMol, Project HOPE (Have (y)Our Protein Explained), ConSurf, and SWISS-MODEL. Missense variants of ADAM17 were collected from the Ensembl database for analysis.

Results: In total, 7 nonsynonymous single-nucleotide polymorphisms (P556L, G550D, V483A, G479E, G349E, T339P, and D232E) were classified as high-risk pathogenic by all prediction tools. These variants were predicted to have deleterious effects on the stability, structure, and function of the ADAM17 protein, potentially disrupting the entire cleavage process. Additionally, 4 missense variants (Q658H, D657G, D654N, and F652L) at positions related to SARS-CoV-2 infection exhibited high conservation scores and were predicted to be deleterious, suggesting that they play an important role in SARS-CoV-2 infection.
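The 7 high-risk variants were those flagged by all prediction tools, which amounts to a unanimous-consensus filter. A minimal Python sketch of that filter follows, together with parsing of the single-letter substitution notation used in the abstract (for example, P556L: proline at position 556 replaced by leucine). The tool outputs and variant labels are hypothetical placeholders; real tools report scores with tool-specific thresholds that would first have to be converted into deleterious/neutral calls, and stop-gain notation is not handled here. The 652 to 658 window is taken from the positions of the four SARS-CoV-2-related variants listed above.

```python
import re

# Single-residue substitution such as "P556L": reference residue, position, alternate residue.
VARIANT_PATTERN = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_variant(label):
    match = VARIANT_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    ref, position, alt = match.groups()
    return ref, int(position), alt


def consensus_deleterious(calls):
    """Keep variants that every tool calls deleterious, sorted by residue position.

    `calls` maps variant label -> {tool name: True if predicted deleterious}.
    """
    return sorted(
        (v for v, per_tool in calls.items() if per_tool and all(per_tool.values())),
        key=lambda v: parse_variant(v)[1],
    )


# Hypothetical calls for illustration only (not the study's data).
calls = {
    "G300R": {"SIFT": True, "PolyPhen-2": True, "PROVEAN": True},
    "L120P": {"SIFT": True, "PolyPhen-2": True, "PROVEAN": True},
    "A400T": {"SIFT": True, "PolyPhen-2": False, "PROVEAN": True},
    "K655E": {"SIFT": True, "PolyPhen-2": True, "PROVEAN": True},
}
high_risk = consensus_deleterious(calls)
in_window = [v for v in high_risk if 652 <= parse_variant(v)[1] <= 658]
```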

Conclusions: Specific missense variants of ADAM17 are predicted to be highly pathogenic, potentially affecting protein stability and function and contributing to SARS-CoV-2 pathogenesis. These findings provide a basis for understanding their clinical relevance, aiding in early diagnosis, risk assessment, and therapeutic development.

JMIR bioinformatics and biotechnology, volume 6, e72133. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12377791/pdf/
Citations: 0
Stacked Deep Learning Ensemble for Multiomics Cancer Type Classification: Development and Validation Study.
Pub Date : 2025-08-12 DOI: 10.2196/70709
Amani Ameen, Nofe Alganmi, Nada Bajnaid

Background: Cancer is one of the leading causes of disease burden globally, and early and accurate diagnosis is crucial for effective treatment. This study presents a deep learning-based model designed to classify 5 common types of cancer in Saudi Arabia: breast, colorectal, thyroid, non-Hodgkin lymphoma, and corpus uteri.

Objective: This study aimed to evaluate whether integrating RNA sequencing, somatic mutation, and DNA methylation profiles within a stacking deep learning ensemble improves cancer type classification accuracy relative to the current state-of-the-art multiomics models.

Methods: Using a stacking ensemble learning approach, our model integrates 5 well-established methods as base learners: support vector machine, k-nearest neighbors, artificial neural network, convolutional neural network, and random forest. The methodology involves 2 main stages: data preprocessing (including normalization and feature extraction), which prepares the data before the stacking model is applied, and ensemble stacking classification.
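A stacking ensemble trains a level-1 (meta) learner on out-of-fold predictions from several level-0 learners. The scikit-learn sketch below mirrors four of the five base learners named above: the convolutional neural network is omitted because scikit-learn has no CNN estimator, and `MLPClassifier` stands in for the artificial neural network. The logistic regression meta-learner, the hyperparameters, and the early (column-wise) concatenation of omics blocks are assumptions, since the abstract does not specify them, and synthetic data replaces the real RNA sequencing, somatic mutation, and methylation matrices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_model(seed=0):
    # Level-0 learners; scaling matters for SVM, KNN, and the MLP but not for the forest.
    base_learners = [
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("ann", make_pipeline(StandardScaler(),
                              MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=seed))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=seed)),
    ]
    # Level-1 learner is fit on 5-fold out-of-fold predictions of the level-0 learners.
    return StackingClassifier(estimators=base_learners,
                              final_estimator=LogisticRegression(max_iter=1000), cv=5)


# Synthetic stand-in: 5 classes; column ranges play the role of three hypothetical omics blocks.
X, y = make_classification(n_samples=300, n_features=60, n_informative=20, n_classes=5, random_state=0)
rna_seq, mutation, methylation = X[:, :30], X[:, 30:40], X[:, 40:]
X_multiomics = np.hstack([rna_seq, mutation, methylation])  # early (feature-level) integration

X_train, X_test, y_train, y_test = train_test_split(
    X_multiomics, y, test_size=0.25, stratify=y, random_state=0)
model = build_stacking_model().fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
```

Late integration (one set of base learners per omics type, stacked together) is an equally plausible design; the abstract does not say which one the authors used.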

Results: The stacking ensemble model achieved 98% accuracy with multiomics data, compared with 96% using RNA sequencing or methylation data individually and 81% using somatic mutation data alone, suggesting that multiomics data can be used for diagnosis in primary care settings. The base models in the ensemble are among the most widely used in cancer classification research; their prevalent use in previous studies underscores their effectiveness and flexibility, which enhances the performance of multiomics data integration.

Conclusions: This study highlights the importance of advanced machine learning techniques in improving cancer detection and prognosis, contributing valuable insights by applying ensemble learning to integrate multiomics data for more effective cancer classification.

JMIR bioinformatics and biotechnology, volume 6, e70709. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12342693/pdf/
Citations: 0
Genetic Diversity and Mutation Frequency Databases in Ethnic Populations: Systematic Review.
Pub Date : 2025-08-11 DOI: 10.2196/69454
Shumaila Khan, Mahmood Alam, Iqbal Qasim, Shahnaz Khan, Wahab Khan, Orken Mamyrbayev, Ainur Akhmediyarova, Nurzhan Mukazhanov, Zhibek Alibiyeva

Background: National and ethnic mutation frequency databases (NEMDBs) play a crucial role in documenting gene variations across populations, offering invaluable insights for gene mutation research and the advancement of precision medicine. These databases provide an essential resource for understanding genetic diversity and its implications for health and disease across different ethnic groups.

Objective: The aim of this study is to systematically evaluate 42 NEMDBs to (1) quantify gaps in standardization (70% nonstandard formats, 50% outdated data), (2) propose artificial intelligence/linked open data solutions for interoperability, and (3) highlight clinical implications for precision medicine across NEMDBs.

Methods: A systematic approach was used to assess the databases based on several criteria, including data collection methods, system design, and querying mechanisms. We analyzed the accessibility and user-centric features of each database, noting their ability to integrate with other systems and their role in advancing genetic disorder research. The review also addressed standardization and data quality challenges prevalent in current NEMDBs.

Results: The analysis of 42 NEMDBs revealed significant issues: 70% (29/42) lacked standardized data formats, 60% (25/42) had notable gaps in the cross-comparison of genetic variations, and 50% (21/42) contained incomplete or outdated data, limiting their clinical utility. However, databases developed on open-source platforms, such as LOVD, showed a 40% increase in usability for researchers, highlighting the benefits of flexible, open-access systems.

Conclusions: We propose cloud-based platforms, linked open data frameworks, and artificial intelligence-driven models to address critical gaps in standardization (70% of databases) and outdated data (50%) and to improve interoperability. These solutions prioritize user-centric design to effectively serve clinicians, researchers, and public stakeholders.

JMIR bioinformatics and biotechnology, volume 6, e69454. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12338852/pdf/
Citations: 0
Optimizing Feature Selection and Machine Learning Algorithms for Early Detection of Prediabetes Risk: Comparative Study.
Pub Date : 2025-07-31 DOI: 10.2196/70621
Mahmoud B Almadhoun, M A Burhanuddin

Background: Prediabetes is an intermediate stage between normal glucose metabolism and diabetes and is associated with increased risk of complications like cardiovascular disease and kidney failure.

Objective: Recognizing individuals with prediabetes early is crucial so that timely interventions can slow or prevent the development of diabetes. This study aims to compare the effectiveness of machine learning (ML) algorithms in predicting prediabetes and identifying its key clinical predictors.

Methods: This study evaluated multiple ML models, including random forest, extreme gradient boosting (XGBoost), support vector machine (SVM), and k-nearest neighbors (KNN), on a dataset of 4743 individuals. To improve performance and interpretability, key clinical features were selected using LASSO (Least Absolute Shrinkage and Selection Operator) regression and principal component analysis (PCA). To optimize model accuracy and reduce overfitting, we tuned hyperparameters with RandomizedSearchCV for XGBoost and random forest and with GridSearchCV for SVM and KNN. SHAP (Shapley Additive Explanations) was used to assess model-agnostic feature importance. To address class imbalance, SMOTE (Synthetic Minority Oversampling Technique) was applied to ensure reliable classifications.
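SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbors. The NumPy sketch below illustrates only that core step on toy data; it is not the study's code, and in practice the `SMOTE` class from the imbalanced-learn package (`imblearn.over_sampling`) is the usual choice.

```python
import numpy as np


def smote(minority, n_new, k=5, rng=None):
    """Generate `n_new` synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbors.
    """
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least 2 minority samples")
    # Pairwise Euclidean distances; a point must not count as its own neighbor.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]

    base_idx = rng.integers(0, len(minority), size=n_new)
    neighbor_idx = neighbors[base_idx, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return minority[base_idx] + gap * (minority[neighbor_idx] - minority[base_idx])


# Toy minority class: corners of the unit square; with k=2 each corner's neighbors are the adjacent corners.
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(square, n_new=10, k=2, rng=0)
```

Oversampling should be applied only to training folds (for example, inside an imbalanced-learn `Pipeline` during cross-validation), so that synthetic points derived from validation samples do not inflate cross-validated scores.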

Results: Among the models tested, random forest achieved the highest cross-validated ROC-AUC (receiver operating characteristic area under the curve) score, 0.9117, indicating robust generalization across cross-validation folds. XGBoost followed closely, providing balanced accuracy in distinguishing between normal and prediabetic cases. Although SVM and KNN performed adequately as baseline models, they showed limited sensitivity. SHAP analysis identified BMI, age, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol as the key predictors across models. Hyperparameter tuning substantially improved performance; for example, the ROC-AUC for SVM increased from 0.813 (default) to 0.863 (tuned). PCA retained 12 components, which together explained 95% of the variance in the dataset.
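A common rule for choosing how many principal components to keep is to take the smallest k whose cumulative share of total variance reaches the target: sum of the k largest eigenvalues divided by the sum of all eigenvalues >= 0.95. The sketch below applies that rule to a list of eigenvalues (component variances). The eigenvalues in the example are made up; the study's 12-component result cannot be reproduced without its data. scikit-learn's `PCA` accepts a fraction such as `n_components=0.95` for much the same purpose.

```python
def n_components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `threshold` (sketch)."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    values = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(values)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    for k, v in enumerate(values, start=1):
        cumulative += v
        # Small tolerance guards against floating-point round-off at the cutoff.
        if cumulative / total >= threshold - 1e-12:
            return k
    return len(values)


# Hypothetical eigenvalues: cumulative shares are 0.50, 0.75, 0.90, 0.96, 1.00.
print(n_components_for_variance([5.0, 2.5, 1.5, 0.6, 0.4]))  # -> 4
```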

Conclusions: This study demonstrates that optimized ML models, especially random forest and XGBoost, are effective tools for assessing early prediabetes risk. Combining SHAP analysis with LASSO and PCA enhances transparency, supporting the models' integration into real-time clinical decision support systems. Future work includes validating these models in diverse clinical settings and incorporating additional biomarkers to improve prediction accuracy, offering a promising avenue for early intervention and personalized treatment in preventive health care.

Source: JMIR bioinformatics and biotechnology, vol 6, e70621. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12314567/pdf/
Citations: 0
Framework for Race-Specific Prostate Cancer Detection Using Machine Learning Through Gene Expression Data: Feature Selection Optimization Approach.
Pub Date : 2025-07-31 DOI: 10.2196/72423
David Agustriawan, Adithama Mulia, Marlinda Vasty Overbeek, Vincent Kurniawan, Jheno Syechlo, Moeljono Widjaja, Muhammad Imran Ahmad

Background: Previous machine learning approaches for prostate cancer detection using gene expression data have reported high classification accuracy. However, prior studies have overlooked the influence of racial diversity within the population and the importance of selecting outlier genes based on expression profiles.

Objective: This study aimed to develop a classification method for diagnosing prostate cancer from gene expression data within specific racial populations.

Methods: This study used differentially expressed gene analysis, receiver operating characteristic analysis, and MSigDB (Molecular Signatures Database) verification as a feature selection framework to identify genes for building support vector machine models.
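The abstract names three filtering steps but gives no thresholds. The sketch below illustrates only the receiver operating characteristic step: each gene is scored by the area under the curve (AUC) obtained when its raw expression alone is used to separate tumour (label 1) from normal (label 0) samples, and genes above a cutoff are kept. The gene names, the 0.9 cutoff and the direction-agnostic score max(AUC, 1 - AUC) are illustrative assumptions. For a race-specific framework one would presumably run such a filter on each group's samples separately; the abstract does not say.

```python
def auc(scores_pos, scores_neg):
    """ROC-AUC via the Mann-Whitney statistic: the probability that a random
    positive sample scores higher than a random negative one (ties count 1/2)."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one sample")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def select_genes_by_auc(expression, labels, min_auc=0.9):
    """Rank genes whose expression alone separates tumour (1) from normal (0).

    expression: dict mapping a gene name to one value per sample.
    A gene expressed lower in tumours is also discriminative, so the score
    is max(AUC, 1 - AUC).
    """
    selected = []
    for gene, values in expression.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        a = auc(pos, neg)
        strength = max(a, 1.0 - a)
        if strength >= min_auc:
            selected.append((gene, strength))
    # Strongest separators first; ties keep their original order.
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical expression values for three tumour and three normal samples.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "GENE_A": [9, 8, 7, 1, 2, 3],  # higher in tumours
    "GENE_B": [1, 2, 1, 8, 9, 7],  # lower in tumours
    "GENE_C": [5, 1, 6, 4, 2, 5],  # weak separator, AUC = 5.5 / 9
}
print(select_genes_by_auc(expression, labels))  # -> [('GENE_A', 1.0), ('GENE_B', 1.0)]
```

The pairwise loop is quadratic in sample count, which is fine for a sketch; a rank-based formula is the usual choice for larger cohorts.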

Results: Among the models evaluated, the highest accuracy was achieved with 139 gene features and no oversampling: 98% for White patients and 97% for African American patients, based on 388 training samples and 92 testing samples. Notably, another model performed similarly well using only 9 gene features, with 97% accuracy for White patients and 95% for African American patients; it was trained on 374 samples and tested on 138 samples.
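The Results report accuracy separately for White and African American patients. Stratified evaluation of this kind computes the metric within each subgroup rather than over the pooled test set. The sketch below does so for accuracy; the function name and the labels, predictions and group assignments in the example are hypothetical and do not reproduce the study's figures.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each group label (sketch)."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)  # 1 if the prediction matches the label
    return {g: correct[g] / total[g] for g in total}


# Hypothetical predictions for six test samples (1 = cancer, 0 = normal).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
groups = ["White"] * 3 + ["African American"] * 3
print(accuracy_by_group(y_true, y_pred, groups))
```

With test sets of the size reported here, each subgroup holds relatively few samples, so a single misclassification shifts that subgroup's accuracy noticeably.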

Conclusions: These findings support a race-specific diagnostic method for prostate cancer detection that combines enhanced feature selection with machine learning. This approach highlights the potential for developing unbiased diagnostic tools for specific populations.

Source: JMIR bioinformatics and biotechnology, vol 6, e72423. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12314727/pdf/
Citations: 0
Using Natural Language Processing to Identify Symptomatic Adverse Events in Pediatric Oncology: Tutorial for Clinician Researchers.
Pub Date : 2025-07-24 DOI: 10.2196/70751
Clifton P Thornton, Maryam Daniali, Lei Wang, Spandana Makeneni, Allison Barz Leahy

Unlabelled: Artificial intelligence (AI) is poised to become an integral component of health care research and delivery, promising to address complex challenges with unprecedented efficiency and precision. However, many clinicians lack training and experience with AI, and for those who wish to incorporate AI into research and practice, the path forward remains unclear. Technical barriers, institutional constraints, and unfamiliarity with computer and data science frequently stall progress. In this tutorial, we present a transparent account of our experiences as a newly established interdisciplinary team of clinical oncology researchers and data scientists developing a natural language processing model to identify symptomatic adverse events during pediatric cancer therapy. We outline key steps for clinicians to consider as they explore the utility of AI in their research and practice, including building a digital laboratory, curating a large clinical dataset, and developing early-stage AI models. We emphasize the invaluable role of institutional support, including financial and logistical resources, and of dedicated, innovative computer and data scientists as equal partners in the research team. Our account highlights the facilitators and barriers we encountered, spanning financial support, the learning curves inherent in interdisciplinary collaboration, and constraints of time and personnel. Through this narrative tutorial, we aim to demystify the process of AI research and equip clinicians with actionable steps for initiating new ventures in oncology research. As AI continues to reshape research and practice, sharing insights from past successes and challenges will be essential to charting a clear path forward.

Source: JMIR bioinformatics and biotechnology, vol 6, e70751. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288697/pdf/
Citations: 0
Harnessing AI and Quantum Computing for Revolutionizing Drug Discovery and Approval Processes: Case Example for Collagen Toxicity.
Pub Date : 2025-07-22 DOI: 10.2196/69800
David Melvin Braga, Bharat Rawal

Unlabelled: Artificial intelligence (AI) and quantum computing will change the course of new drug discovery and approval. By generating computational data, predicting the efficacy of pharmaceuticals, and assessing their safety, AI and quantum computing can accelerate and optimize the identification of potential drug candidates. In this viewpoint, we demonstrate how computational models obtained from digital computers, AI, and quantum computing can reduce the number of laboratory and animal experiments; computer-aided drug development can thus help provide safe and effective combinations while minimizing the cost and time of drug development. To support this argument, we reviewed 83 academic publications, interviewed pharmaceutical manufacturers, and used AI to generate computational data on the toxicity of collagen as a case example. Research to date has mainly focused on creating in silico data for comparison with actual laboratory data and on using these data to discover new drugs or support their approval. In this context, "in silico" describes scientific studies that use computer algorithms, simulations, or digital models to analyze biological, chemical, or physical processes without laboratory (in vitro) or live (in vivo) experiments. Digital computers, AI, and quantum computing offer unique capabilities for tackling complex problems in drug discovery, a critical challenge in pharmaceutical research. Regulatory agencies will need to adapt to these new technologies. Regulatory processes may become more streamlined through adaptive clinical trials, accelerated pathways, and better integration of digital data, reducing the time and cost of bringing new drugs to market.
Computational data methods could reduce the cost and time of experimental drug discovery, allowing researchers to simulate biological interactions and screen large compound libraries more efficiently. Creating in silico data for drug discovery involves several stages, each using specific methods such as simulations, synthetic data generation, and data augmentation, along with tools to generate, collect, and affect human interaction to identify and develop new drugs.

Source: JMIR bioinformatics and biotechnology, vol 6, e69800. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12306909/pdf/
Citations: 0
Designing a Finite Element Model to Determine the Different Fixation Positions of Tracheal Catheters in the Oral Cavity for Minimizing the Risk of Oral Mucosal Pressure Injury: Comparison Study.
Pub Date : 2025-07-11 DOI: 10.2196/69298
Zhiwei Wang, Zhenghui Dong, Xiaoyan He, ZhenZhen Tao, Jinfang Qi, Yatian Zhang, Xian Ma

Background: Although the tracheal tube is an important life-saving device that ensures smooth breathing in critically ill patients, it damages the patient's oral mucosa during use, increasing not only pain but also the risk of infection.

Objective: This study aimed to establish finite element models for different fixation positions of tracheal catheters in the oral cavity to identify the optimal fixation position that minimizes the risk of oral mucosal pressure injury.

Methods: Computed tomography data of the head and face from healthy male subjects were selected, and a 3D finite element model was created using Mimics 21 and Geomagic Wrap 2021. A pressure sensor was used to measure the actual contact pressure between the tracheal catheter and the oral soft tissue at the upper lip, lower lip, and left and right mouth corners. The model was imported into Ansys Workbench 22.0, where material properties were assigned and boundary conditions were established. Vertical loads of 2.6 N and 3.43 N were applied to the upper and lower lips, respectively, and horizontal loads of 1.76 N and 1.82 N to the left and right mouth corners, respectively, to examine the stress distribution in the skin, mucosa, and muscle tissue at the four fixation positions.

Results: The mean (SD) equivalent stress and shear stress of the skin and mucosal tissues were lowest at the left mouth corner (28.42 [0.65] kPa and 6.58 [0.16] kPa, respectively) and increased progressively at the right mouth corner (30.72 [0.98] kPa and 7.05 [0.32] kPa, respectively), upper lip (35.20 [0.99] kPa and 7.70 [0.17] kPa, respectively), and lower lip (41.79 [0.48] kPa and 10.02 [0.44] kPa, respectively; P<.001 for both stresses). The equivalent stress and shear stress of the muscle tissue were lowest at the right mouth corner (34.35 [0.52] kPa and 5.69 [0.29] kPa, respectively) and increased progressively at the left mouth corner (35.64 [1.18] kPa and 5.74 [0.30] kPa, respectively), upper lip (43.17 [0.58] kPa and 8.91 [0.55] kPa, respectively), and lower lip (43.17 [0.58] kPa and 11.96 [0.50] kPa, respectively; P<.001 for both stresses). At all four fixation positions, the equivalent and shear stresses of the muscle tissue were significantly greater than those of the skin and mucosal tissues (P<.05).
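In Ansys Workbench, "equivalent stress" denotes the von Mises stress, a single scalar built from the three principal stresses s1, s2 and s3: sqrt(((s1 - s2)^2 + (s2 - s3)^2 + (s3 - s1)^2) / 2). The abstract does not say which shear-stress measure was reported (a single component or the maximum shear stress), so the sketch below shows the maximum (Tresca) shear stress, (largest principal stress - smallest principal stress) / 2, as one common choice. The function names are hypothetical, and units pass through unchanged, so kPa in gives kPa out.

```python
import math


def von_mises(s1, s2, s3):
    """Equivalent (von Mises) stress from the three principal stresses."""
    return math.sqrt(((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2) / 2.0)


def max_shear(s1, s2, s3):
    """Maximum (Tresca) shear stress: half the spread of the principal stresses."""
    ordered = sorted((s1, s2, s3))
    return (ordered[2] - ordered[0]) / 2.0


# Uniaxial stress of 30 kPa: equivalent stress equals the applied stress.
print(von_mises(30.0, 0.0, 0.0))  # -> 30.0
print(max_shear(30.0, 0.0, 0.0))  # -> 15.0
# Equal stress in all directions (hydrostatic) produces no equivalent stress.
print(von_mises(10.0, 10.0, 10.0))  # -> 0.0
```

Because the von Mises measure depends only on differences between principal stresses, it ignores uniform (hydrostatic) pressure and reflects the distortional part of the stress state.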

Conclusions: Fixing the tracheal catheter at the left or right oral corner resulted in the lowest equivalent and shear stresses, whereas fixation at the lower lip produced the highest. We recommend minimizing the contact time and area at the lower lip during tracheal catheter fixation and alternating the contact area between the left and right oral corners to prevent oral mucosal pressure injuries.

Source: JMIR bioinformatics and biotechnology, vol 6, e69298. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12274052/pdf/
Citations: 0