An evaluation of machine learning approaches for early diagnosis of autism spectrum disorder

Healthcare analytics (New York, N.Y.) | Pub Date: 2024-01-04 | DOI: 10.1016/j.health.2023.100293

Rownak Ara Rasul, Promy Saha, Diponkor Bala, S.M. Rakib Ul Karim, Md. Ibrahim Abdullah, Bishwajit Saha

Volume 5, Article 100293. Full text: https://www.sciencedirect.com/science/article/pii/S2772442523001600

Citation count: 0

Abstract

Autism spectrum disorder (ASD) is a neurological disease characterized by difficulties with social interaction and communication and by repetitive activities. Although its primary origin is genetic, early detection is crucial, and machine learning offers a promising route to faster and more cost-effective diagnosis. This study applies a range of machine learning methods to identify key ASD traits, with the aim of improving and automating the diagnostic process. We study eight state-of-the-art classification models to determine their effectiveness in ASD detection, evaluating them with accuracy, precision, recall, specificity, F1-score, area under the curve (AUC), kappa, and log loss to find the best classifier for these binary datasets. On the children dataset, the support vector machine (SVM) and logistic regression (LR) models achieve the highest accuracy, 100%; on the adult dataset, the LR model achieves the highest accuracy, 97.14%. On the new combined dataset, our proposed artificial neural network (ANN) model achieves the highest accuracy, 94.24%, when hyperparameters are precisely tuned for each model. Because almost all of the classification models, which rely on true labels, achieve high accuracy, we also examine five popular clustering algorithms to understand model behavior when true labels are unavailable. We use Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and Silhouette Coefficient (SC) to select the best clustering models. Spectral clustering outperforms all the other benchmarked clustering models on NMI and ARI, while its SC is comparable to the best SC, which is achieved by k-means. The implementation code is available on GitHub.
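The classification metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the helper name `binary_metrics`, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration, and none of the numbers come from the study's datasets.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Hypothetical helper: compute binary-classification metrics from
    true labels (0/1) and predicted probabilities of the positive class."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(y_true)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Log loss, with probabilities clipped to avoid log(0).
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, clipped)
    ) / n

    # AUC as the fraction of (positive, negative) pairs ranked correctly;
    # ties count as half. Assumes both classes are present.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))

    return {
        "accuracy": accuracy, "precision": precision, "recall": recall,
        "specificity": specificity, "f1": f1, "kappa": kappa,
        "log_loss": log_loss, "auc": auc,
    }


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
m = binary_metrics(y_true, y_prob)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Reporting kappa and log loss alongside accuracy is useful on datasets like these because accuracy alone can look high when classes are imbalanced or when predicted probabilities are poorly calibrated.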
