Pub Date : 2024-09-18 DOI: 10.1186/s40537-024-00985-8
Optimizing poultry audio signal classification with deep learning and burn layer fusion
Esraa Hassan, Samar Elbedwehy, Mahmoud Y. Shams, Tarek Abd El-Hafeez, Nora El-Rashidy
This study introduces a novel deep learning-based approach for classifying poultry audio signals, incorporating a custom Burn Layer to enhance model robustness. The methodology integrates digital audio signal processing, convolutional neural networks (CNNs), and the Burn Layer, which injects controlled random noise during training to reinforce the model's resilience to variations in the input signal. The proposed architecture is streamlined, comprising convolutional blocks, densely connected layers, dropout, and the additional Burn Layer. The model is efficient, reducing trainable parameters to 191,235, compared with over 1.7 million in traditional architectures. It uses a Burn Layer whose burn intensity is a tunable parameter and the Adamax optimizer to mitigate overfitting. Thorough evaluation using standard classification metrics shows strong performance: sensitivity of 96.77%, specificity of 100.00%, precision of 100.00%, negative predictive value (NPV) of 95.00%, accuracy of 98.55%, F1 score of 98.36%, and Matthews correlation coefficient (MCC) of 95.88%. This research contributes insights to audio signal processing, animal health monitoring, and robust deep-learning classification systems. The work presents a systematic approach for developing and evaluating a deep learning-based poultry audio classification system: it processes raw audio data and labels into digital representations, uses the Burn Layer to add training variability, and constructs a CNN with convolutional blocks, pooling, and dense layers. The model is optimized with the Adamax algorithm and trained with data augmentation and early stopping. Rigorous assessment on a test dataset using standard metrics demonstrates the model's robustness and efficiency, with the potential to advance animal health monitoring and disease detection through audio signal analysis.
{"title":"Optimizing poultry audio signal classification with deep learning and burn layer fusion","authors":"Esraa Hassan, Samar Elbedwehy, Mahmoud Y. Shams, Tarek Abd El-Hafeez, Nora El-Rashidy","doi":"10.1186/s40537-024-00985-8","DOIUrl":"https://doi.org/10.1186/s40537-024-00985-8","url":null,"abstract":"<p>This study introduces a novel deep learning-based approach for classifying poultry audio signals, incorporating a custom Burn Layer to enhance model robustness. The methodology integrates digital audio signal processing, convolutional neural networks (CNNs), and the innovative Burn Layer, which injects controlled random noise during training to reinforce the model's resilience to input signal variations. The proposed architecture is streamlined, with convolutional blocks, densely connected layers, dropout, and an additional Burn Layer to fortify robustness. The model demonstrates efficiency by reducing trainable parameters to 191,235, compared to traditional architectures with over 1.7 million parameters. The proposed model utilizes a Burn Layer with burn intensity as a parameter and an Adamax optimizer to optimize and address the overfitting problem. Thorough evaluation using six standard classification metrics showcases the model's superior performance, achieving exceptional sensitivity (96.77%), specificity (100.00%), precision (100.00%), negative predictive value (NPV) (95.00%), accuracy (98.55%), F1 score (98.36%), and Matthew’s correlation coefficient (MCC) (95.88%). This research contributes valuable insights into the fields of audio signal processing, animal health monitoring, and robust deep-learning classification systems. The proposed model presents a systematic approach for developing and evaluating a deep learning-based poultry audio classification system. It processes raw audio data and labels to generate digital representations, utilizes a Burn Layer for training variability, and constructs a CNN model with convolutional blocks, pooling, and dense layers. The model is optimized using the Adamax algorithm and trained with data augmentation and early-stopping techniques. Rigorous assessment on a test dataset using standard metrics demonstrates the model's robustness and efficiency, with the potential to significantly advance animal health monitoring and disease detection through audio signal analysis.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"23 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-18 DOI: 10.1186/s40537-024-00991-w
Machine learning and deep learning models based grid search cross validation for short-term solar irradiance forecasting
Doaa El-Shahat, Ahmed Tolba, Mohamed Abouhawwash, Mohamed Abdel-Basset
In late 2023, the United Nations conference on climate change (COP28), held in Dubai, encouraged a quick move from fossil fuels to renewable energy. Solar energy is one of the most promising forms of energy that is both sustainable and renewable. Generally, photovoltaic systems transform solar irradiance into electricity. Unfortunately, instability and intermittency in solar radiation can lead to interruptions in electricity production. Accurate forecasting of solar irradiance supports sustainable power production even when solar irradiance is absent, since batteries can store solar energy for use during such periods. In addition, deterministic models depend on the technical specifications of PV systems and may not be accurate at low solar irradiance. This paper presents a comparative study of the most common Deep Learning (DL) and Machine Learning (ML) algorithms employed for short-term solar irradiance forecasting. The dataset was gathered in Islamabad over a five-year period, from 2015 to 2019, at hourly intervals using accurate meteorological sensors. Furthermore, five-fold Grid Search Cross Validation (GSCV) is applied to the ML and DL models to optimize their hyperparameters. Several performance metrics are used to assess the algorithms, such as the Adjusted R2 score, Normalized Root Mean Square Error (NRMSE), Mean Absolute Deviation (MAD), Mean Absolute Error (MAE), and Mean Square Error (MSE). The statistical analysis shows that CNN-LSTM outperforms nine well-known DL counterparts, with an Adjusted R2 score of 0.984. Among the ML algorithms, gradient boosting regression is an effective forecasting method, with an Adjusted R2 score of 0.962, outperforming six rival ML models. Finally, SHAP and LIME, two explainable Artificial Intelligence (XAI) techniques, are used to understand the reasons behind the obtained results.
{"title":"Machine learning and deep learning models based grid search cross validation for short-term solar irradiance forecasting","authors":"Doaa El-Shahat, Ahmed Tolba, Mohamed Abouhawwash, Mohamed Abdel-Basset","doi":"10.1186/s40537-024-00991-w","DOIUrl":"https://doi.org/10.1186/s40537-024-00991-w","url":null,"abstract":"<p>In late 2023, the United Nations conference on climate change (COP28), which was held in Dubai, encouraged a quick move from fossil fuels to renewable energy. Solar energy is one of the most promising forms of energy that is both sustainable and renewable. Generally, photovoltaic systems transform solar irradiance into electricity. Unfortunately, instability and intermittency in solar radiation can lead to interruptions in electricity production. The accurate forecasting of solar irradiance guarantees sustainable power production even when solar irradiance is not present. Batteries can store solar energy to be used during periods of solar absence. Additionally, deterministic models take into account the specification of technical PV systems and may be not accurate for low solar irradiance. This paper presents a comparative study for the most common Deep Learning (DL) and Machine Learning (ML) algorithms employed for short-term solar irradiance forecasting. The dataset was gathered in Islamabad during a five-year period, from 2015 to 2019, at hourly intervals with accurate meteorological sensors. Furthermore, the Grid Search Cross Validation (GSCV) with five folds is introduced to ML and DL models for optimizing the hyperparameters of these models. Several performance metrics are used to assess the algorithms, such as the <i>Adjusted R</i><sup><i>2</i></sup><i> score</i>, <i>Normalized Root Mean Square Error</i> (NRMSE), <i>Mean Absolute Deviation</i> (MAD), <i>Mean Absolute Error</i> (MAE) and <i>Mean Square Error</i> (MSE). The statistical analysis shows that CNN-LSTM outperforms its counterparts of nine well-known DL models with <i>Adjusted R</i><sup><i>2</i></sup><i> score</i> value of 0.984. For ML algorithms, gradient boosting regression is an effective forecasting method with <i>Adjusted R</i><sup><i>2</i></sup><i> score</i> value of 0.962, beating its rivals of six ML models. Furthermore, SHAP and LIME are examples of explainable Artificial Intelligence (XAI) utilized for understanding the reasons behind the obtained results.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"13 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-18 DOI: 10.1186/s40537-024-00994-7
Shielding networks: enhancing intrusion detection with hybrid feature selection and stack ensemble learning
Ali Mohammed Alsaffar, Mostafa Nouri-Baygi, Hamed M. Zolbanin
The widespread use of computer networks and the Internet has left them vulnerable to numerous attacks, highlighting the critical need to enhance the precision of security mechanisms. One of the most essential measures to safeguard networking resources and infrastructures is an intrusion detection system (IDS). IDSs are widely used to detect, identify, and track malicious threats. Although various machine learning algorithms have been used successfully in IDSs, they still suffer from low prediction performance. One reason behind the low accuracy of IDSs is that existing network traffic datasets are computationally complex, largely because of redundant, incomplete, and irrelevant features. Furthermore, standalone classifiers exhibit restricted classification performance and typically fail to produce satisfactory outcomes when dealing with imbalanced, multi-category traffic data. To address these issues, we propose an efficient intrusion detection model based on hybrid feature selection and stack ensemble learning. Our hybrid feature selection method, called MI-Boruta, combines mutual information (MI) as a filter method and the Boruta algorithm as a wrapper method to determine optimal features from our datasets. Then, we apply stacked ensemble learning using random forest (RF), CatBoost, and XGBoost algorithms as base learners with a multilayer perceptron (MLP) as the meta-learner. We test our intrusion detection model on two widely recognized benchmark datasets, namely UNSW-NB15 and CICIDS2017. We show that our proposed IDS outperforms existing IDSs in almost all performance criteria, including accuracy, recall, precision, F1-score, false positive rate, true positive rate, and error rate.
{"title":"Shielding networks: enhancing intrusion detection with hybrid feature selection and stack ensemble learning","authors":"Ali Mohammed Alsaffar, Mostafa Nouri-Baygi, Hamed M. Zolbanin","doi":"10.1186/s40537-024-00994-7","DOIUrl":"https://doi.org/10.1186/s40537-024-00994-7","url":null,"abstract":"<p>The frequent usage of computer networks and the Internet has made computer networks vulnerable to numerous attacks, highlighting the critical need to enhance the precision of security mechanisms. One of the most essential measures to safeguard networking resources and infrastructures is an intrusion detection system (IDS). IDSs are widely used to detect, identify, and track malicious threats. Although various machine learning algorithms have been used successfully in IDSs, they are still suffering from low prediction performances. One reason behind the low accuracy of IDSs is that existing network traffic datasets have high computational complexities that are mainly caused by redundant, incomplete, and irrelevant features. Furthermore, standalone classifiers exhibit restricted classification performance and typically fail to produce satisfactory outcomes when dealing with imbalanced, multi-category traffic data. To address these issues, we propose an efficient intrusion detection model, which is based on hybrid feature selection and stack ensemble learning. Our hybrid feature selection method, called MI-Boruta, combines mutual information (MI) as a filter method and the Boruta algorithm as a wrapper method to determine optimal features from our datasets. Then, we apply stacked ensemble learning by using random forest (RF), Catboost, and XGBoost algorithms as base learners with multilayer perceptron (MLP) as meta-learner. We test our intrusion detection model on two widely recognized benchmark datasets, namely UNSW-NB15 and CICIDS2017. We show that our proposed IDS outperforms existing IDSs in almost all performance criteria, including accuracy, recall, precision, F1-Score, false positive rate, true positive rate, and error rate.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"19 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-17 DOI: 10.1186/s40537-024-00992-9
Integrating microarray-based spatial transcriptomics and RNA-seq reveals tissue architecture in colorectal cancer
Zheng Li, Xiaojie Zhang, Chongyuan Sun, Zefeng Li, He Fei, Dongbing Zhao
Background
The tumor microenvironment (TME) provides a region for intricate interactions within or between immune and non-immune cells. We aimed to reveal the tissue architecture and comprehensive landscape of cells within the TME of colorectal cancer (CRC).
Methods
Fresh frozen tissue from an invasive adenocarcinoma of the large intestine, available in the 10× Genomics Datasets, was obtained from BioIVT Asterand. The integration of microarray-based spatial transcriptomics (ST) and RNA sequencing (RNA-seq) was applied to characterize gene expression and the cell landscape within the TME of the CRC tissue architecture. Multiple R packages and deconvolution algorithms, including MCPcounter, XCELL, EPIC, and ESTIMATE, were applied for further analysis of immune cell distribution.
Results
The subpopulations of immune and non-immune cells within the TME of the CRC tissue architecture were appropriately annotated. According to ST and RNA-seq analyses, a heterogeneous spatial atlas of gene distribution and cell landscape was comprehensively characterized. We distinguished between the cancer and stromal regions of CRC tissues. As expected, epithelial cells were located in the cancerous region, whereas fibroblasts were mainly located in the stroma. In addition, the fibroblasts were further subdivided into two subgroups (F1 and F2) according to the differentially expressed genes (DEGs), which were mainly enriched in pathways including hallmark-oxidative-phosphorylation, hallmark-e2f-targets and hallmark-unfolded-protein-response. Furthermore, the top 5 DEGs, SPP1, CXCL10, APOE, APOC1, and LYZ, were found to be closely related to immunoregulation of the TME, methylation, and survival of CRC patients.
Conclusions
This study characterized the heterogeneous spatial landscape of various cell subtypes within the TME of the tissue architecture. The TME-related roles of the fibroblast subsets point to potential crosstalk among diverse cell types.
{"title":"Integrating microarray-based spatial transcriptomics and RNA-seq reveals tissue architecture in colorectal cancer","authors":"Zheng Li, Xiaojie Zhang, Chongyuan Sun, Zefeng Li, He Fei, Dongbing Zhao","doi":"10.1186/s40537-024-00992-9","DOIUrl":"https://doi.org/10.1186/s40537-024-00992-9","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Background</h3><p>The tumor microenvironment (TME) provides a region for intricate interactions within or between immune and non-immune cells. We aimed to reveal the tissue architecture and comprehensive landscape of cells within the TME of colorectal cancer (CRC).</p><h3 data-test=\"abstract-sub-heading\">Methods</h3><p>Fresh frozen invasive adenocarcinoma of the large intestine tissue from 10× Genomics Datasets was obtained from BioIVT Asterand. The integration of microarray-based spatial transcriptomics (ST) and RNA sequencing (RNA-seq) was applied to characterize gene expression and cell landscape within the TME of CRC tissue architecture. Multiple R packages and deconvolution algorithms including MCPcounter, XCELL, EPIC, and ESTIMATE methods were performed for further immune distribution analysis.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>The subpopulations of immune and non-immune cells within the TME of the CRC tissue architecture were appropriately annotated. According to ST and RNA-seq analyses, a heterogeneous spatial atlas of gene distribution and cell landscape was comprehensively characterized. We distinguished between the cancer and stromal regions of CRC tissues. As expected, epithelial cells were located in the cancerous region, whereas fibroblasts were mainly located in the stroma. In addition, the fibroblasts were further subdivided into two subgroups (F1 and F2) according to the differentially expressed genes (DEGs), which were mainly enriched in pathways including hallmark-oxidative-phosphorylation, hallmark-e2f-targets and hallmark-unfolded-protein-response. Furthermore, the top 5 DEGs, SPP1, CXCL10, APOE, APOC1, and LYZ, were found to be closely related to immunoregulation of the TME, methylation, and survival of CRC patients.</p><h3 data-test=\"abstract-sub-heading\">Conclusions</h3><p>This study characterized the heterogeneous spatial landscape of various cell subtypes within the TME of the tissue architecture. The TME-related roles of fibroblast subsets addressed the potential crosstalk among diverse cells.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"26 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-12 DOI: 10.1186/s40537-024-00968-9
Development and evaluation of a deep learning model for automatic segmentation of non-perfusion area in fundus fluorescein angiography
Wei Feng, Bingjie Wang, Dan Song, Mengda Li, Anming Chen, Jing Wang, Siyong Lin, Yiran Zhao, Bin Wang, Zongyuan Ge, Shuyi Xu, Yuntao Hu
Diabetic retinopathy (DR) is the most prevalent cause of preventable vision loss worldwide, imposing a significant economic and medical burden on society; early identification is a cornerstone of its management. The diagnosis and severity grading of DR rely on scales based on clinically visualized features but lack detailed quantitative parameters. Retinal non-perfusion area (NPA) is a pathogenic characteristic of DR that reflects retinal hypoxia and has been found to be closely associated with disease progression, prognosis, and management. However, the practical value of NPA is constrained because it appears on fundus fluorescein angiography (FFA) as scattered, irregularly shaped, darker plaques that are challenging to measure manually. In this study, we propose a deep learning-based method, NPA-Net, for accurate and automatic segmentation of NPAs from FFA images acquired in clinical practice. NPA-Net uses the U-Net encoder-decoder structure as its backbone. To enhance recognition of NPAs, we adaptively incorporate multi-scale features and contextual information during feature learning through three modules: an Adaptive Encoder Feature Fusion (AEFF) module, a multilayer deep supervised loss, and an Atrous Spatial Pyramid Pooling (ASPP) module, which together improve recognition of NPAs of different sizes from different perspectives. We conducted extensive experiments on a clinical dataset of 163 eyes with NPAs manually annotated by ophthalmologists; NPA-Net achieved better segmentation performance than existing methods, with an area under the receiver operating characteristic curve (AUC) of 0.9752, accuracy of 0.9431, sensitivity of 0.8794, specificity of 0.9459, IoU of 0.3876, and Dice of 0.5686. This new automatic segmentation model is useful for identifying NPA in clinical practice, generating quantitative parameters that can support further research as well as guide DR detection, severity grading, treatment planning, and prognosis.
{"title":"Development and evaluation of a deep learning model for automatic segmentation of non-perfusion area in fundus fluorescein angiography","authors":"Wei Feng, Bingjie Wang, Dan Song, Mengda Li, Anming Chen, Jing Wang, Siyong Lin, Yiran Zhao, Bin Wang, Zongyuan Ge, Shuyi Xu, Yuntao Hu","doi":"10.1186/s40537-024-00968-9","DOIUrl":"https://doi.org/10.1186/s40537-024-00968-9","url":null,"abstract":"<p>Diabetic retinopathy (DR) is the most prevalent cause of preventable vision loss worldwide, imposing a significant economic and medical burden on society today, of which early identification is the cornerstones of the management. The diagnosis and severity grading of DR rely on scales based on clinical visualized features, but lack detailed quantitative parameters. Retinal non-perfusion area (NPA) is a pathogenic characteristic of DR that symbolizes retinal hypoxia conditions, and was found to be intimately associated with disease progression, prognosis, and management. However, the practical value of NPA is constrained since it appears on fundus fluorescein angiography (FFA) as distributed, irregularly shaped, darker plaques that are challenging to measure manually. In this study, we propose a deep learning-based method, NPA-Net, for accurate and automatic segmentation of NPAs from FFA images acquired in clinical practice. NPA-Net uses the U-net structure as the basic backbone, which has an encoder-decoder model structure. To enhance the recognition performance of the model for NPA, we adaptively incorporate multi-scale features and contextual information in feature learning and design three modules: Adaptive Encoder Feature Fusion (AEFF) module, Multilayer Deep Supervised Loss, and Atrous Spatial Pyramid Pooling (ASPP) module, which enhance the recognition ability of the model for NPAs of different sizes from different perspectives. We conducted extensive experiments on a clinical dataset with 163 eyes with NPAs manually annotated by ophthalmologists, and NPA-Net achieved better segmentation performance compared to other existing methods with an area under the receiver operating characteristic curve (AUC) of 0.9752, accuracy of 0.9431, sensitivity of 0.8794, specificity of 0.9459, IOU of 0.3876 and Dice of 0.5686. This new automatic segmentation model is useful for identifying NPA in clinical practice, generating quantitative parameters that can be useful for further research as well as guiding DR detection, grading severity, treatment planning, and prognosis.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"37 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-12 DOI: 10.1186/s40537-024-00997-4
Leveraging large-scale genetic data to assess the causal impact of COVID-19 on multisystemic diseases
Xiangyang Zhang, Zhaohui Jiang, Jiayao Ma, Yaru Qi, Yin Li, Yan Zhang, Yihan Liu, Chaochao Wei, Yihong Chen, Ping Liu, Yinghui Peng, Jun Tan, Ying Han, Shan Zeng, Changjing Cai, Hong Shen
Background
The long-term impacts of COVID-19 on human health are a major concern, yet comprehensive evaluations of its effects on various health conditions are lacking.
Methods
This study aims to evaluate the role of various diseases in relation to COVID-19 by analyzing genetic data from a large-scale population of over 2,000,000 individuals. A bidirectional two-sample Mendelian randomization approach was used, with exposures including COVID-19 susceptibility, hospitalization, and severity, and outcomes encompassing 86 different diseases or traits. A reverse Mendelian randomization analysis was performed to assess the impact of these diseases on COVID-19.
Results
Our analysis identified causal relationships between COVID-19 susceptibility and several conditions, including breast cancer (OR = 1.0073, 95% CI = 1.0032–1.0114, p = 5 × 10⁻⁴), ER+ breast cancer (OR = 0.5252, 95% CI = 0.3589–0.7685, p = 9 × 10⁻⁴), and heart failure (OR = 1.0026, 95% CI = 1.001–1.0042, p = 0.002). COVID-19 hospitalization was causally linked to heart failure (OR = 1.0017, 95% CI = 1.0006–1.0028, p = 0.002) and Alzheimer’s disease (OR = 1.5092, 95% CI = 1.1942–1.9072, p = 0.0006). COVID-19 severity had causal effects on primary biliary cirrhosis (OR = 2.6333, 95% CI = 1.8274–3.7948, p = 2.059 × 10⁻⁷), celiac disease (OR = 0.0708, 95% CI = 0.0538–0.0932, p = 9.438 × 10⁻⁸⁰), and Alzheimer’s disease (OR = 1.5092, 95% CI = 1.1942–1.9072, p = 0.0006). Reverse MR analysis indicated that rheumatoid arthritis, diabetic nephropathy, multiple sclerosis, and total testosterone (female) influence COVID-19 outcomes. We assessed heterogeneity and horizontal pleiotropy to ensure result reliability and employed the Steiger directionality test to confirm the direction of causality.
Conclusions
This study provides a comprehensive analysis of the causal relationships between COVID-19 and diverse health conditions. Our findings highlight the long-term impacts of COVID-19 on human health, emphasizing the need for continuous monitoring and targeted interventions for affected individuals. Future research should explore these relationships to develop comprehensive healthcare strategies.
{"title":"Leveraging large-scale genetic data to assess the causal impact of COVID-19 on multisystemic diseases","authors":"Xiangyang Zhang, Zhaohui Jiang, Jiayao Ma, Yaru Qi, Yin Li, Yan Zhang, Yihan Liu, Chaochao Wei, Yihong Chen, Ping Liu, Yinghui Peng, Jun Tan, Ying Han, Shan Zeng, Changjing Cai, Hong Shen","doi":"10.1186/s40537-024-00997-4","DOIUrl":"https://doi.org/10.1186/s40537-024-00997-4","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Background</h3><p>The long-term impacts of COVID-19 on human health are a major concern, yet comprehensive evaluations of its effects on various health conditions are lacking.</p><h3 data-test=\"abstract-sub-heading\">Methods</h3><p>This study aims to evaluate the role of various diseases in relation to COVID-19 by analyzing genetic data from a large-scale population over 2,000,000 individuals. A bidirectional two-sample Mendelian randomization approach was used, with exposures including COVID-19 susceptibility, hospitalization, and severity, and outcomes encompassing 86 different diseases or traits. A reverse Mendelian randomization analysis was performed to assess the impact of these diseases on COVID-19.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>Our analysis identified causal relationships between COVID-19 susceptibility and several conditions, including breast cancer (OR = 1.0073, 95% CI = 1.0032–1.0114, <i>p</i> = 5 × 10 − 4), ER + breast cancer (OR = 0.5252, 95% CI = 0.3589–0.7685, <i>p</i> = 9 × 10 − 4), and heart failure (OR = 1.0026, 95% CI = 1.001–1.0042, <i>p</i> = 0.002). COVID-19 hospitalization was causally linked to heart failure (OR = 1.0017, 95% CI = 1.0006–1.0028, <i>p</i> = 0.002) and Alzheimer’s disease (OR = 1.5092, 95% CI = 1.1942–1.9072, <i>p</i> = 0.0006). COVID-19 severity had causal effects on primary biliary cirrhosis (OR = 2.6333, 95% CI = 1.8274–3.7948, <i>p</i> = 2.059 × 10 − 7), celiac disease (OR = 0.0708, 95% CI = 0.0538–0.0932, <i>p</i> = 9.438 × 10–80), and Alzheimer’s disease (OR = 1.5092, 95% CI = 1.1942–1.9072, <i>p</i> = 0.0006). Reverse MR analysis indicated that rheumatoid arthritis, diabetic nephropathy, multiple sclerosis, and total testosterone (female) influence COVID-19 outcomes. We assessed heterogeneity and horizontal pleiotropy to ensure result reliability and employed the Steiger directionality test to confirm the direction of causality.</p><h3 data-test=\"abstract-sub-heading\">Conclusions</h3><p>This study provides a comprehensive analysis of the causal relationships between COVID-19 and diverse health conditions. Our findings highlight the long-term impacts of COVID-19 on human health, emphasizing the need for continuous monitoring and targeted interventions for affected individuals. Future research should explore these relationships to develop comprehensive healthcare strategies.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"1 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-12 DOI: 10.1186/s40537-024-00988-5
Evolutionary computation-based self-supervised learning for image processing: a big data-driven approach to feature extraction and fusion for multispectral object detection
Xiaoyang Shen, Haibin Li, Achyut Shankar, Wattana Viriyasitavat, Vinay Chamola
Image object recognition and detection technologies are widely used in many scenarios. In recent years, big data has become increasingly abundant, and big data-driven artificial intelligence models have attracted growing attention. Evolutionary computation has also provided a powerful driving force for the optimization and improvement of deep learning models. In this paper, we propose an image object detection method based on self-supervised and data-driven learning. Unlike other methods, our approach stands out for its innovative use of multispectral data fusion and evolutionary computation for model optimization. Specifically, our method combines visible light images and infrared images to detect and identify image targets. First, we use a self-supervised learning method and the AutoEncoder model to perform high-dimensional feature extraction on the two types of images. Second, we fuse the features extracted from the visible light and infrared images to detect and identify objects. Third, we introduce a model parameter optimization method based on evolutionary learning algorithms to enhance model performance. Validation on public datasets shows that our method achieves comparable or superior performance to existing methods.
{"title":"Evolutionary computation-based self-supervised learning for image processing: a big data-driven approach to feature extraction and fusion for multispectral object detection","authors":"Xiaoyang Shen, Haibin Li, Achyut Shankar, Wattana Viriyasitavat, Vinay Chamola","doi":"10.1186/s40537-024-00988-5","DOIUrl":"https://doi.org/10.1186/s40537-024-00988-5","url":null,"abstract":"<p>The image object recognition and detection technology are widely used in many scenarios. In recent years, big data has become increasingly abundant, and big data-driven artificial intelligence models have attracted more and more attention. Evolutionary computation has also provided a powerful driving force for the optimization and improvement of deep learning models. In this paper, we propose an image object detection method based on self-supervised and data-driven learning. Differ from other methods, our approach stands out due to its innovative use of multispectral data fusion and evolutionary computation for model optimization. Specifically, our method uniquely combines visible light images and infrared images to detect and identify image targets. Firstly, we utilize a self-supervised learning method and the AutoEncoder model to perform high-dimensional feature extraction on the two types of images. Secondly, we fuse the extracted features from the visible light and infrared images to detect and identify objects. Thirdly, we introduce a model parameter optimization method using evolutionary learning algorithms to enhance model performance. Validation on public datasets shows that our method achieves comparable or superior performance to existing methods.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"6 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-12 DOI: 10.1186/s40537-024-00965-y
Asefeh Asemi, Adeleh Asemi, Andrea Ko
This article presents an investment recommender system based on an Adaptive Neuro-Fuzzy Inference System (ANFIS) and pre-trained weights from a Multimodal Neural Network (MNN). The model is designed to support customers' investment process and considers seven factors, implemented on a dataset of customers and potential investors. The system takes input from a web-based questionnaire that collects data on investors' preferences and investment goals. The data is then preprocessed and clustered using ETL tools, JMP, MATLAB, and Python. The ANFIS-based recommender system is designed with three inputs and one output and is trained using a hybrid approach over three epochs with 188 data pairs and 18 fuzzy rules. The system's performance is evaluated using metrics such as RMSE, accuracy, precision, recall, and F1-score. The system also incorporates expert feedback and opinions from investors to customize and improve investment recommendations. The article concludes that the proposed ANFIS-based investment recommender system is effective and accurate in generating investment recommendations that meet investors' preferences and goals.
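The abstract does not specify the membership functions or rule base; the sketch below shows the generic forward pass of a first-order Sugeno fuzzy system of the kind ANFIS tunes, with made-up membership parameters, two rules, and three inputs standing in for the investor factors:

```python
import numpy as np

def gauss(x, c, s):
    """Gaussian membership value of x for a fuzzy set with center c and width s."""
    return np.exp(-((x - c) ** 2) / (2 * s ** 2))

def sugeno_forward(x, mf_params, conseq):
    """First-order Sugeno forward pass: memberships -> firing strengths -> weighted consequents."""
    # Layers 1-2: firing strength of each rule = product of its inputs' membership values.
    firing = np.array([np.prod([gauss(xi, c, s) for xi, (c, s) in zip(x, rule)])
                       for rule in mf_params])
    # Layer 3: normalize firing strengths.
    w = firing / firing.sum()
    # Layers 4-5: each rule's linear consequent p.x + r, combined by the normalized weights.
    rule_outputs = np.array([np.dot(p, x) + r for p, r in conseq])
    return float(np.dot(w, rule_outputs))

# Two illustrative rules over three inputs (e.g. risk tolerance, horizon, income — placeholders).
mf_params = [[(0.2, 0.3), (0.3, 0.3), (0.4, 0.3)],   # rule 1: (center, width) per input
             [(0.8, 0.3), (0.7, 0.3), (0.6, 0.3)]]   # rule 2
conseq = [(np.array([0.5, 0.2, 0.1]), 0.1),          # rule 1: (p, r)
          (np.array([0.1, 0.6, 0.3]), 0.2)]          # rule 2

print(sugeno_forward(np.array([0.6, 0.5, 0.7]), mf_params, conseq))
```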