Pub Date : 2024-09-18 DOI: 10.1186/s40537-024-00985-8
Optimizing poultry audio signal classification with deep learning and burn layer fusion
Esraa Hassan, Samar Elbedwehy, Mahmoud Y. Shams, Tarek Abd El-Hafeez, Nora El-Rashidy
This study introduces a novel deep learning-based approach for classifying poultry audio signals, incorporating a custom Burn Layer to enhance model robustness. The methodology integrates digital audio signal processing, convolutional neural networks (CNNs), and the Burn Layer, which injects controlled random noise during training to reinforce the model's resilience to variations in the input signal. The proposed architecture is streamlined, comprising convolutional blocks, densely connected layers, dropout, and the additional Burn Layer. The model is efficient, reducing trainable parameters to 191,235, compared with over 1.7 million in traditional architectures. It uses a Burn Layer whose burn intensity is a tunable parameter and the Adamax optimizer to mitigate overfitting. Thorough evaluation using standard classification metrics shows strong performance: sensitivity of 96.77%, specificity of 100.00%, precision of 100.00%, negative predictive value (NPV) of 95.00%, accuracy of 98.55%, F1 score of 98.36%, and Matthews correlation coefficient (MCC) of 95.88%. This research contributes insights to audio signal processing, animal health monitoring, and robust deep-learning classification systems. The work presents a systematic approach for developing and evaluating a deep learning-based poultry audio classification system: it processes raw audio data and labels into digital representations, uses the Burn Layer to add training variability, and constructs a CNN with convolutional blocks, pooling, and dense layers. The model is optimized with the Adamax algorithm and trained with data augmentation and early stopping. Rigorous assessment on a test dataset using standard metrics demonstrates the model's robustness and efficiency, with the potential to advance animal health monitoring and disease detection through audio signal analysis.
{"title":"Optimizing poultry audio signal classification with deep learning and burn layer fusion","authors":"Esraa Hassan, Samar Elbedwehy, Mahmoud Y. Shams, Tarek Abd El-Hafeez, Nora El-Rashidy","doi":"10.1186/s40537-024-00985-8","DOIUrl":"https://doi.org/10.1186/s40537-024-00985-8","url":null,"abstract":"<p>This study introduces a novel deep learning-based approach for classifying poultry audio signals, incorporating a custom Burn Layer to enhance model robustness. The methodology integrates digital audio signal processing, convolutional neural networks (CNNs), and the innovative Burn Layer, which injects controlled random noise during training to reinforce the model's resilience to input signal variations. The proposed architecture is streamlined, with convolutional blocks, densely connected layers, dropout, and an additional Burn Layer to fortify robustness. The model demonstrates efficiency by reducing trainable parameters to 191,235, compared to traditional architectures with over 1.7 million parameters. The proposed model utilizes a Burn Layer with burn intensity as a parameter and an Adamax optimizer to optimize and address the overfitting problem. Thorough evaluation using six standard classification metrics showcases the model's superior performance, achieving exceptional sensitivity (96.77%), specificity (100.00%), precision (100.00%), negative predictive value (NPV) (95.00%), accuracy (98.55%), F1 score (98.36%), and Matthew’s correlation coefficient (MCC) (95.88%). This research contributes valuable insights into the fields of audio signal processing, animal health monitoring, and robust deep-learning classification systems. The proposed model presents a systematic approach for developing and evaluating a deep learning-based poultry audio classification system. It processes raw audio data and labels to generate digital representations, utilizes a Burn Layer for training variability, and constructs a CNN model with convolutional blocks, pooling, and dense layers. The model is optimized using the Adamax algorithm and trained with data augmentation and early-stopping techniques. Rigorous assessment on a test dataset using standard metrics demonstrates the model's robustness and efficiency, with the potential to significantly advance animal health monitoring and disease detection through audio signal analysis.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"23 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-18 DOI: 10.1186/s40537-024-00991-w
Machine learning and deep learning models based grid search cross validation for short-term solar irradiance forecasting
Doaa El-Shahat, Ahmed Tolba, Mohamed Abouhawwash, Mohamed Abdel-Basset
In late 2023, the United Nations conference on climate change (COP28), held in Dubai, encouraged a quick move from fossil fuels to renewable energy. Solar energy is one of the most promising forms of energy that is both sustainable and renewable. Generally, photovoltaic systems transform solar irradiance into electricity. Unfortunately, instability and intermittency in solar radiation can lead to interruptions in electricity production. Accurate forecasting of solar irradiance supports sustainable power production even when solar irradiance is absent, since batteries can store solar energy for use during such periods. In addition, deterministic models depend on the technical specifications of PV systems and may not be accurate at low solar irradiance. This paper presents a comparative study of the most common Deep Learning (DL) and Machine Learning (ML) algorithms employed for short-term solar irradiance forecasting. The dataset was gathered in Islamabad over a five-year period, from 2015 to 2019, at hourly intervals using accurate meteorological sensors. Furthermore, five-fold Grid Search Cross Validation (GSCV) is applied to the ML and DL models to optimize their hyperparameters. Several performance metrics are used to assess the algorithms, such as the Adjusted R2 score, Normalized Root Mean Square Error (NRMSE), Mean Absolute Deviation (MAD), Mean Absolute Error (MAE), and Mean Square Error (MSE). The statistical analysis shows that CNN-LSTM outperforms nine well-known DL counterparts, with an Adjusted R2 score of 0.984. Among the ML algorithms, gradient boosting regression is an effective forecasting method, with an Adjusted R2 score of 0.962, outperforming six rival ML models. Finally, SHAP and LIME, two explainable Artificial Intelligence (XAI) techniques, are used to understand the reasons behind the obtained results.
{"title":"Machine learning and deep learning models based grid search cross validation for short-term solar irradiance forecasting","authors":"Doaa El-Shahat, Ahmed Tolba, Mohamed Abouhawwash, Mohamed Abdel-Basset","doi":"10.1186/s40537-024-00991-w","DOIUrl":"https://doi.org/10.1186/s40537-024-00991-w","url":null,"abstract":"<p>In late 2023, the United Nations conference on climate change (COP28), which was held in Dubai, encouraged a quick move from fossil fuels to renewable energy. Solar energy is one of the most promising forms of energy that is both sustainable and renewable. Generally, photovoltaic systems transform solar irradiance into electricity. Unfortunately, instability and intermittency in solar radiation can lead to interruptions in electricity production. The accurate forecasting of solar irradiance guarantees sustainable power production even when solar irradiance is not present. Batteries can store solar energy to be used during periods of solar absence. Additionally, deterministic models take into account the specification of technical PV systems and may be not accurate for low solar irradiance. This paper presents a comparative study for the most common Deep Learning (DL) and Machine Learning (ML) algorithms employed for short-term solar irradiance forecasting. The dataset was gathered in Islamabad during a five-year period, from 2015 to 2019, at hourly intervals with accurate meteorological sensors. Furthermore, the Grid Search Cross Validation (GSCV) with five folds is introduced to ML and DL models for optimizing the hyperparameters of these models. Several performance metrics are used to assess the algorithms, such as the <i>Adjusted R</i><sup><i>2</i></sup><i> score</i>, <i>Normalized Root Mean Square Error</i> (NRMSE), <i>Mean Absolute Deviation</i> (MAD), <i>Mean Absolute Error</i> (MAE) and <i>Mean Square Error</i> (MSE). The statistical analysis shows that CNN-LSTM outperforms its counterparts of nine well-known DL models with <i>Adjusted R</i><sup><i>2</i></sup><i> score</i> value of 0.984. For ML algorithms, gradient boosting regression is an effective forecasting method with <i>Adjusted R</i><sup><i>2</i></sup><i> score</i> value of 0.962, beating its rivals of six ML models. Furthermore, SHAP and LIME are examples of explainable Artificial Intelligence (XAI) utilized for understanding the reasons behind the obtained results.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"13 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-18 DOI: 10.1186/s40537-024-00994-7
Shielding networks: enhancing intrusion detection with hybrid feature selection and stack ensemble learning
Ali Mohammed Alsaffar, Mostafa Nouri-Baygi, Hamed M. Zolbanin
The widespread use of computer networks and the Internet has left them vulnerable to numerous attacks, highlighting the critical need to enhance the precision of security mechanisms. One of the most essential measures to safeguard networking resources and infrastructures is an intrusion detection system (IDS). IDSs are widely used to detect, identify, and track malicious threats. Although various machine learning algorithms have been used successfully in IDSs, they still suffer from low prediction performance. One reason behind the low accuracy of IDSs is that existing network traffic datasets are computationally complex, largely because of redundant, incomplete, and irrelevant features. Furthermore, standalone classifiers exhibit restricted classification performance and typically fail to produce satisfactory outcomes when dealing with imbalanced, multi-category traffic data. To address these issues, we propose an efficient intrusion detection model based on hybrid feature selection and stack ensemble learning. Our hybrid feature selection method, called MI-Boruta, combines mutual information (MI) as a filter method and the Boruta algorithm as a wrapper method to determine optimal features from our datasets. Then, we apply stacked ensemble learning using random forest (RF), CatBoost, and XGBoost algorithms as base learners with a multilayer perceptron (MLP) as the meta-learner. We test our intrusion detection model on two widely recognized benchmark datasets, namely UNSW-NB15 and CICIDS2017. We show that our proposed IDS outperforms existing IDSs in almost all performance criteria, including accuracy, recall, precision, F1-score, false positive rate, true positive rate, and error rate.
{"title":"Shielding networks: enhancing intrusion detection with hybrid feature selection and stack ensemble learning","authors":"Ali Mohammed Alsaffar, Mostafa Nouri-Baygi, Hamed M. Zolbanin","doi":"10.1186/s40537-024-00994-7","DOIUrl":"https://doi.org/10.1186/s40537-024-00994-7","url":null,"abstract":"<p>The frequent usage of computer networks and the Internet has made computer networks vulnerable to numerous attacks, highlighting the critical need to enhance the precision of security mechanisms. One of the most essential measures to safeguard networking resources and infrastructures is an intrusion detection system (IDS). IDSs are widely used to detect, identify, and track malicious threats. Although various machine learning algorithms have been used successfully in IDSs, they are still suffering from low prediction performances. One reason behind the low accuracy of IDSs is that existing network traffic datasets have high computational complexities that are mainly caused by redundant, incomplete, and irrelevant features. Furthermore, standalone classifiers exhibit restricted classification performance and typically fail to produce satisfactory outcomes when dealing with imbalanced, multi-category traffic data. To address these issues, we propose an efficient intrusion detection model, which is based on hybrid feature selection and stack ensemble learning. Our hybrid feature selection method, called MI-Boruta, combines mutual information (MI) as a filter method and the Boruta algorithm as a wrapper method to determine optimal features from our datasets. Then, we apply stacked ensemble learning by using random forest (RF), Catboost, and XGBoost algorithms as base learners with multilayer perceptron (MLP) as meta-learner. We test our intrusion detection model on two widely recognized benchmark datasets, namely UNSW-NB15 and CICIDS2017. We show that our proposed IDS outperforms existing IDSs in almost all performance criteria, including accuracy, recall, precision, F1-Score, false positive rate, true positive rate, and error rate.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"19 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-17 DOI: 10.1186/s40537-024-00992-9
Integrating microarray-based spatial transcriptomics and RNA-seq reveals tissue architecture in colorectal cancer
Zheng Li, Xiaojie Zhang, Chongyuan Sun, Zefeng Li, He Fei, Dongbing Zhao
Background
The tumor microenvironment (TME) provides a region for intricate interactions within or between immune and non-immune cells. We aimed to reveal the tissue architecture and comprehensive landscape of cells within the TME of colorectal cancer (CRC).
Methods
Fresh frozen tissue from an invasive adenocarcinoma of the large intestine, available in the 10× Genomics Datasets, was obtained from BioIVT Asterand. The integration of microarray-based spatial transcriptomics (ST) and RNA sequencing (RNA-seq) was applied to characterize gene expression and the cell landscape within the TME of the CRC tissue architecture. Multiple R packages and deconvolution algorithms, including MCPcounter, XCELL, EPIC, and ESTIMATE, were applied for further analysis of immune cell distribution.
Results
The subpopulations of immune and non-immune cells within the TME of the CRC tissue architecture were appropriately annotated. According to ST and RNA-seq analyses, a heterogeneous spatial atlas of gene distribution and cell landscape was comprehensively characterized. We distinguished between the cancer and stromal regions of CRC tissues. As expected, epithelial cells were located in the cancerous region, whereas fibroblasts were mainly located in the stroma. In addition, the fibroblasts were further subdivided into two subgroups (F1 and F2) according to the differentially expressed genes (DEGs), which were mainly enriched in pathways including hallmark-oxidative-phosphorylation, hallmark-e2f-targets and hallmark-unfolded-protein-response. Furthermore, the top 5 DEGs, SPP1, CXCL10, APOE, APOC1, and LYZ, were found to be closely related to immunoregulation of the TME, methylation, and survival of CRC patients.
Conclusions
This study characterized the heterogeneous spatial landscape of various cell subtypes within the TME of the tissue architecture. The TME-related roles of the fibroblast subsets point to potential crosstalk among diverse cell types.
{"title":"Integrating microarray-based spatial transcriptomics and RNA-seq reveals tissue architecture in colorectal cancer","authors":"Zheng Li, Xiaojie Zhang, Chongyuan Sun, Zefeng Li, He Fei, Dongbing Zhao","doi":"10.1186/s40537-024-00992-9","DOIUrl":"https://doi.org/10.1186/s40537-024-00992-9","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Background</h3><p>The tumor microenvironment (TME) provides a region for intricate interactions within or between immune and non-immune cells. We aimed to reveal the tissue architecture and comprehensive landscape of cells within the TME of colorectal cancer (CRC).</p><h3 data-test=\"abstract-sub-heading\">Methods</h3><p>Fresh frozen invasive adenocarcinoma of the large intestine tissue from 10× Genomics Datasets was obtained from BioIVT Asterand. The integration of microarray-based spatial transcriptomics (ST) and RNA sequencing (RNA-seq) was applied to characterize gene expression and cell landscape within the TME of CRC tissue architecture. Multiple R packages and deconvolution algorithms including MCPcounter, XCELL, EPIC, and ESTIMATE methods were performed for further immune distribution analysis.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>The subpopulations of immune and non-immune cells within the TME of the CRC tissue architecture were appropriately annotated. According to ST and RNA-seq analyses, a heterogeneous spatial atlas of gene distribution and cell landscape was comprehensively characterized. We distinguished between the cancer and stromal regions of CRC tissues. As expected, epithelial cells were located in the cancerous region, whereas fibroblasts were mainly located in the stroma. In addition, the fibroblasts were further subdivided into two subgroups (F1 and F2) according to the differentially expressed genes (DEGs), which were mainly enriched in pathways including hallmark-oxidative-phosphorylation, hallmark-e2f-targets and hallmark-unfolded-protein-response. Furthermore, the top 5 DEGs, SPP1, CXCL10, APOE, APOC1, and LYZ, were found to be closely related to immunoregulation of the TME, methylation, and survival of CRC patients.</p><h3 data-test=\"abstract-sub-heading\">Conclusions</h3><p>This study characterized the heterogeneous spatial landscape of various cell subtypes within the TME of the tissue architecture. The TME-related roles of fibroblast subsets addressed the potential crosstalk among diverse cells.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"26 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-12 DOI: 10.1186/s40537-024-00968-9
Development and evaluation of a deep learning model for automatic segmentation of non-perfusion area in fundus fluorescein angiography
Wei Feng, Bingjie Wang, Dan Song, Mengda Li, Anming Chen, Jing Wang, Siyong Lin, Yiran Zhao, Bin Wang, Zongyuan Ge, Shuyi Xu, Yuntao Hu
Diabetic retinopathy (DR) is the most prevalent cause of preventable vision loss worldwide, imposing a significant economic and medical burden on society; early identification is a cornerstone of its management. The diagnosis and severity grading of DR rely on scales based on clinically visualized features but lack detailed quantitative parameters. Retinal non-perfusion area (NPA) is a pathogenic characteristic of DR that reflects retinal hypoxia and has been found to be closely associated with disease progression, prognosis, and management. However, the practical value of NPA is constrained because it appears on fundus fluorescein angiography (FFA) as scattered, irregularly shaped, darker plaques that are challenging to measure manually. In this study, we propose a deep learning-based method, NPA-Net, for accurate and automatic segmentation of NPAs from FFA images acquired in clinical practice. NPA-Net uses the U-Net encoder-decoder structure as its backbone. To enhance recognition of NPAs, we adaptively incorporate multi-scale features and contextual information during feature learning through three modules: an Adaptive Encoder Feature Fusion (AEFF) module, a multilayer deep supervised loss, and an Atrous Spatial Pyramid Pooling (ASPP) module, which together improve recognition of NPAs of different sizes from different perspectives. We conducted extensive experiments on a clinical dataset of 163 eyes with NPAs manually annotated by ophthalmologists; NPA-Net achieved better segmentation performance than existing methods, with an area under the receiver operating characteristic curve (AUC) of 0.9752, accuracy of 0.9431, sensitivity of 0.8794, specificity of 0.9459, IoU of 0.3876, and Dice of 0.5686. This new automatic segmentation model is useful for identifying NPA in clinical practice, generating quantitative parameters that can support further research as well as guide DR detection, severity grading, treatment planning, and prognosis.
{"title":"Development and evaluation of a deep learning model for automatic segmentation of non-perfusion area in fundus fluorescein angiography","authors":"Wei Feng, Bingjie Wang, Dan Song, Mengda Li, Anming Chen, Jing Wang, Siyong Lin, Yiran Zhao, Bin Wang, Zongyuan Ge, Shuyi Xu, Yuntao Hu","doi":"10.1186/s40537-024-00968-9","DOIUrl":"https://doi.org/10.1186/s40537-024-00968-9","url":null,"abstract":"<p>Diabetic retinopathy (DR) is the most prevalent cause of preventable vision loss worldwide, imposing a significant economic and medical burden on society today, of which early identification is the cornerstones of the management. The diagnosis and severity grading of DR rely on scales based on clinical visualized features, but lack detailed quantitative parameters. Retinal non-perfusion area (NPA) is a pathogenic characteristic of DR that symbolizes retinal hypoxia conditions, and was found to be intimately associated with disease progression, prognosis, and management. However, the practical value of NPA is constrained since it appears on fundus fluorescein angiography (FFA) as distributed, irregularly shaped, darker plaques that are challenging to measure manually. In this study, we propose a deep learning-based method, NPA-Net, for accurate and automatic segmentation of NPAs from FFA images acquired in clinical practice. NPA-Net uses the U-net structure as the basic backbone, which has an encoder-decoder model structure. To enhance the recognition performance of the model for NPA, we adaptively incorporate multi-scale features and contextual information in feature learning and design three modules: Adaptive Encoder Feature Fusion (AEFF) module, Multilayer Deep Supervised Loss, and Atrous Spatial Pyramid Pooling (ASPP) module, which enhance the recognition ability of the model for NPAs of different sizes from different perspectives. We conducted extensive experiments on a clinical dataset with 163 eyes with NPAs manually annotated by ophthalmologists, and NPA-Net achieved better segmentation performance compared to other existing methods with an area under the receiver operating characteristic curve (AUC) of 0.9752, accuracy of 0.9431, sensitivity of 0.8794, specificity of 0.9459, IOU of 0.3876 and Dice of 0.5686. This new automatic segmentation model is useful for identifying NPA in clinical practice, generating quantitative parameters that can be useful for further research as well as guiding DR detection, grading severity, treatment planning, and prognosis.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"37 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-12 DOI: 10.1186/s40537-024-00997-4
Leveraging large-scale genetic data to assess the causal impact of COVID-19 on multisystemic diseases
Xiangyang Zhang, Zhaohui Jiang, Jiayao Ma, Yaru Qi, Yin Li, Yan Zhang, Yihan Liu, Chaochao Wei, Yihong Chen, Ping Liu, Yinghui Peng, Jun Tan, Ying Han, Shan Zeng, Changjing Cai, Hong Shen
Background
The long-term impacts of COVID-19 on human health are a major concern, yet comprehensive evaluations of its effects on various health conditions are lacking.
Methods
This study aims to evaluate the role of various diseases in relation to COVID-19 by analyzing genetic data from a large-scale population of over 2,000,000 individuals. A bidirectional two-sample Mendelian randomization approach was used, with exposures including COVID-19 susceptibility, hospitalization, and severity, and outcomes encompassing 86 different diseases or traits. A reverse Mendelian randomization analysis was performed to assess the impact of these diseases on COVID-19.
Results
Our analysis identified causal relationships between COVID-19 susceptibility and several conditions, including breast cancer (OR = 1.0073, 95% CI = 1.0032–1.0114, p = 5 × 10⁻⁴), ER+ breast cancer (OR = 0.5252, 95% CI = 0.3589–0.7685, p = 9 × 10⁻⁴), and heart failure (OR = 1.0026, 95% CI = 1.001–1.0042, p = 0.002). COVID-19 hospitalization was causally linked to heart failure (OR = 1.0017, 95% CI = 1.0006–1.0028, p = 0.002) and Alzheimer’s disease (OR = 1.5092, 95% CI = 1.1942–1.9072, p = 0.0006). COVID-19 severity had causal effects on primary biliary cirrhosis (OR = 2.6333, 95% CI = 1.8274–3.7948, p = 2.059 × 10⁻⁷), celiac disease (OR = 0.0708, 95% CI = 0.0538–0.0932, p = 9.438 × 10⁻⁸⁰), and Alzheimer’s disease (OR = 1.5092, 95% CI = 1.1942–1.9072, p = 0.0006). Reverse MR analysis indicated that rheumatoid arthritis, diabetic nephropathy, multiple sclerosis, and total testosterone (female) influence COVID-19 outcomes. We assessed heterogeneity and horizontal pleiotropy to ensure result reliability and employed the Steiger directionality test to confirm the direction of causality.
Conclusions
This study provides a comprehensive analysis of the causal relationships between COVID-19 and diverse health conditions. Our findings highlight the long-term impacts of COVID-19 on human health, emphasizing the need for continuous monitoring and targeted interventions for affected individuals. Future research should explore these relationships to develop comprehensive healthcare strategies.
{"title":"Leveraging large-scale genetic data to assess the causal impact of COVID-19 on multisystemic diseases","authors":"Xiangyang Zhang, Zhaohui Jiang, Jiayao Ma, Yaru Qi, Yin Li, Yan Zhang, Yihan Liu, Chaochao Wei, Yihong Chen, Ping Liu, Yinghui Peng, Jun Tan, Ying Han, Shan Zeng, Changjing Cai, Hong Shen","doi":"10.1186/s40537-024-00997-4","DOIUrl":"https://doi.org/10.1186/s40537-024-00997-4","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Background</h3><p>The long-term impacts of COVID-19 on human health are a major concern, yet comprehensive evaluations of its effects on various health conditions are lacking.</p><h3 data-test=\"abstract-sub-heading\">Methods</h3><p>This study aims to evaluate the role of various diseases in relation to COVID-19 by analyzing genetic data from a large-scale population over 2,000,000 individuals. A bidirectional two-sample Mendelian randomization approach was used, with exposures including COVID-19 susceptibility, hospitalization, and severity, and outcomes encompassing 86 different diseases or traits. A reverse Mendelian randomization analysis was performed to assess the impact of these diseases on COVID-19.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>Our analysis identified causal relationships between COVID-19 susceptibility and several conditions, including breast cancer (OR = 1.0073, 95% CI = 1.0032–1.0114, <i>p</i> = 5 × 10 − 4), ER + breast cancer (OR = 0.5252, 95% CI = 0.3589–0.7685, <i>p</i> = 9 × 10 − 4), and heart failure (OR = 1.0026, 95% CI = 1.001–1.0042, <i>p</i> = 0.002). COVID-19 hospitalization was causally linked to heart failure (OR = 1.0017, 95% CI = 1.0006–1.0028, <i>p</i> = 0.002) and Alzheimer’s disease (OR = 1.5092, 95% CI = 1.1942–1.9072, <i>p</i> = 0.0006). COVID-19 severity had causal effects on primary biliary cirrhosis (OR = 2.6333, 95% CI = 1.8274–3.7948, <i>p</i> = 2.059 × 10 − 7), celiac disease (OR = 0.0708, 95% CI = 0.0538–0.0932, <i>p</i> = 9.438 × 10–80), and Alzheimer’s disease (OR = 1.5092, 95% CI = 1.1942–1.9072, <i>p</i> = 0.0006). Reverse MR analysis indicated that rheumatoid arthritis, diabetic nephropathy, multiple sclerosis, and total testosterone (female) influence COVID-19 outcomes. We assessed heterogeneity and horizontal pleiotropy to ensure result reliability and employed the Steiger directionality test to confirm the direction of causality.</p><h3 data-test=\"abstract-sub-heading\">Conclusions</h3><p>This study provides a comprehensive analysis of the causal relationships between COVID-19 and diverse health conditions. Our findings highlight the long-term impacts of COVID-19 on human health, emphasizing the need for continuous monitoring and targeted interventions for affected individuals. Future research should explore these relationships to develop comprehensive healthcare strategies.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"1 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-12 DOI: 10.1186/s40537-024-00988-5
Evolutionary computation-based self-supervised learning for image processing: a big data-driven approach to feature extraction and fusion for multispectral object detection
Xiaoyang Shen, Haibin Li, Achyut Shankar, Wattana Viriyasitavat, Vinay Chamola
Image object recognition and detection technologies are widely used in many scenarios. In recent years, big data has become increasingly abundant, and big data-driven artificial intelligence models have attracted growing attention. Evolutionary computation has also provided a powerful driving force for the optimization and improvement of deep learning models. In this paper, we propose an image object detection method based on self-supervised and data-driven learning. Unlike other methods, our approach stands out for its innovative use of multispectral data fusion and evolutionary computation for model optimization. Specifically, our method combines visible light images and infrared images to detect and identify image targets. First, we use a self-supervised learning method and the AutoEncoder model to perform high-dimensional feature extraction on the two types of images. Second, we fuse the features extracted from the visible light and infrared images to detect and identify objects. Third, we introduce a model parameter optimization method based on evolutionary learning algorithms to enhance model performance. Validation on public datasets shows that our method achieves comparable or superior performance to existing methods.
{"title":"Evolutionary computation-based self-supervised learning for image processing: a big data-driven approach to feature extraction and fusion for multispectral object detection","authors":"Xiaoyang Shen, Haibin Li, Achyut Shankar, Wattana Viriyasitavat, Vinay Chamola","doi":"10.1186/s40537-024-00988-5","DOIUrl":"https://doi.org/10.1186/s40537-024-00988-5","url":null,"abstract":"<p>The image object recognition and detection technology are widely used in many scenarios. In recent years, big data has become increasingly abundant, and big data-driven artificial intelligence models have attracted more and more attention. Evolutionary computation has also provided a powerful driving force for the optimization and improvement of deep learning models. In this paper, we propose an image object detection method based on self-supervised and data-driven learning. Differ from other methods, our approach stands out due to its innovative use of multispectral data fusion and evolutionary computation for model optimization. Specifically, our method uniquely combines visible light images and infrared images to detect and identify image targets. Firstly, we utilize a self-supervised learning method and the AutoEncoder model to perform high-dimensional feature extraction on the two types of images. Secondly, we fuse the extracted features from the visible light and infrared images to detect and identify objects. Thirdly, we introduce a model parameter optimization method using evolutionary learning algorithms to enhance model performance. Validation on public datasets shows that our method achieves comparable or superior performance to existing methods.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"6 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-12 DOI: 10.1186/s40537-024-00965-y
Asefeh Asemi, Adeleh Asemi, Andrea Ko
This article presents an investment recommender system based on an Adaptive Neuro-Fuzzy Inference System (ANFIS) and pre-trained weights from a Multimodal Neural Network (MNN). The model is designed to support customers' investment process and considers seven factors, implemented on a dataset of customers and potential investors. The system takes input from a web-based questionnaire that collects data on investors' preferences and investment goals. The data is then preprocessed and clustered using ETL tools, JMP, MATLAB, and Python. The ANFIS-based recommender system is designed with three inputs and one output and is trained using a hybrid approach over three epochs with 188 data pairs and 18 fuzzy rules. The system's performance is evaluated using metrics such as RMSE, accuracy, precision, recall, and F1-score. The system also incorporates expert feedback and opinions from investors to customize and improve investment recommendations. The article concludes that the proposed ANFIS-based investment recommender system is effective and accurate in generating investment recommendations that meet investors' preferences and goals.
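The abstract does not specify the membership functions or rule base; the sketch below shows the generic forward pass of a first-order Sugeno fuzzy system of the kind ANFIS tunes, with made-up membership parameters, two rules, and three inputs standing in for the investor factors:

```python
import numpy as np

def gauss(x, c, s):
    """Gaussian membership value of x for a fuzzy set with center c and width s."""
    return np.exp(-((x - c) ** 2) / (2 * s ** 2))

def sugeno_forward(x, mf_params, conseq):
    """First-order Sugeno forward pass: memberships -> firing strengths -> weighted consequents."""
    # Layers 1-2: firing strength of each rule = product of its inputs' membership values.
    firing = np.array([np.prod([gauss(xi, c, s) for xi, (c, s) in zip(x, rule)])
                       for rule in mf_params])
    # Layer 3: normalize firing strengths.
    w = firing / firing.sum()
    # Layers 4-5: each rule's linear consequent p.x + r, combined by the normalized weights.
    rule_outputs = np.array([np.dot(p, x) + r for p, r in conseq])
    return float(np.dot(w, rule_outputs))

# Two illustrative rules over three inputs (e.g. risk tolerance, horizon, income — placeholders).
mf_params = [[(0.2, 0.3), (0.3, 0.3), (0.4, 0.3)],   # rule 1: (center, width) per input
             [(0.8, 0.3), (0.7, 0.3), (0.6, 0.3)]]   # rule 2
conseq = [(np.array([0.5, 0.2, 0.1]), 0.1),          # rule 1: (p, r)
          (np.array([0.1, 0.6, 0.3]), 0.2)]          # rule 2

print(sugeno_forward(np.array([0.6, 0.5, 0.7]), mf_params, conseq))
```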