
Journal of Big Data: Latest Articles

High-performance computing in healthcare: an automatic literature analysis perspective
IF 8.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-02 | DOI: 10.1186/s40537-024-00929-2
Jieyi Li, Shuai Wang, Stevan Rudinac, Anwar Osseyran

The adoption of high-performance computing (HPC) in healthcare has gained significant attention in recent years, driving advancements in medical research and clinical practice. Exploring the literature on HPC implementation in healthcare is valuable for decision-makers, as it provides insights into potential areas for further investigation and investment. However, manually analyzing the vast number of scholarly articles is a challenging and time-consuming task. Fortunately, topic modeling techniques offer the capacity to process extensive volumes of scientific literature, identifying key trends within the field. This paper presents an automatic literature analysis framework based on a state-of-the-art vector-based topic modeling algorithm with multiple embedding techniques, unveiling the research trends surrounding HPC utilization in healthcare. The proposed pipeline consists of four phases: paper extraction, data preprocessing, topic modeling and outlier detection, and visualization. It enables the automatic extraction of meaningful topics, exploration of their interrelationships, and identification of emerging research directions in an intuitive manner. The findings highlight the transition of HPC adoption in healthcare from traditional numerical simulation and surgical visualization to emerging topics such as drug discovery, AI-driven medical image analysis, and genomic analysis, as well as correlations and interdisciplinary connections among application domains.
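The four-phase pipeline lends itself to a compact sketch. The toy below (standard-library Python only; the record fields and sample titles are invented for illustration) mirrors the extraction and preprocessing phases, with a simple term-frequency count standing in for the paper's vector-based topic modeling and outlier detection:

```python
from collections import Counter
import re

STOPWORDS = {"the", "of", "and", "in", "for", "a", "to", "with", "on"}

def extract_papers(raw_records):
    # Phase 1: paper extraction -- pull title + abstract text from records.
    return [r["title"] + " " + r["abstract"] for r in raw_records]

def preprocess(doc):
    # Phase 2: lowercase, tokenize, drop stopwords.
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS]

def top_terms(docs, k=3):
    # Phase 3 (stand-in): in place of embedding + clustering, surface the
    # k most frequent terms as crude "topics".
    counts = Counter()
    for d in docs:
        counts.update(preprocess(d))
    return [term for term, _ in counts.most_common(k)]

records = [
    {"title": "GPU acceleration of genomic analysis",
     "abstract": "HPC pipelines for genomic data"},
    {"title": "Deep learning for medical image analysis",
     "abstract": "HPC training of imaging models"},
]
print(top_terms(extract_papers(records)))
```

A real instance of the pipeline would replace `top_terms` with document embeddings plus a clustering-based topic model and an outlier detector, but the phase boundaries stay the same.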

Citations: 0
Computational 3D topographic microscopy from terabytes of data per sample
IF 8.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-05-02 | DOI: 10.1186/s40537-024-00901-0
Kevin C. Zhou, Mark Harfouche, Maxwell Zheng, Joakim Jönsson, Kyung Chul Lee, Kanghyun Kim, Ron Appel, Paul Reamey, Thomas Doman, Veton Saliu, Gregor Horstmeyer, Seung Ah Lee, Roarke Horstmeyer

We present a large-scale computational 3D topographic microscope that enables 6-gigapixel profilometric 3D imaging at micron-scale resolution across >110 cm² areas over multi-millimeter axial ranges. Our computational microscope, termed STARCAM (Scanning Topographic All-in-focus Reconstruction with a Computational Array Microscope), features a parallelized, 54-camera architecture with 3-axis translation to capture, for each sample of interest, a multi-dimensional, 2.1-terabyte (TB) dataset, consisting of a total of 224,640 9.4-megapixel images. We developed a self-supervised neural network-based algorithm for 3D reconstruction and stitching that jointly estimates an all-in-focus photometric composite and 3D height map across the entire field of view, using multi-view stereo information and image sharpness as a focal metric. The memory-efficient, compressed differentiable representation offered by the neural network effectively enables joint participation of the entire multi-TB dataset during the reconstruction process. Validation experiments on gauge blocks demonstrate a profilometric precision and accuracy of 10 µm or better. To demonstrate the broad utility of our new computational microscope, we applied STARCAM to a variety of decimeter-scale objects, with applications ranging from cultural heritage to industrial inspection.
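The "image sharpness as a focal metric" idea can be illustrated with a minimal depth-from-focus sketch over a toy stack of 2D intensity lists. This per-pixel argmax is only the classical baseline; STARCAM's actual reconstruction is a self-supervised neural optimization that also uses multi-view stereo cues:

```python
def sharpness(img, x, y):
    # Local focus measure: squared horizontal + vertical differences
    # (a crude stand-in for an image-sharpness focal metric).
    return ((img[y][x] - img[y][x - 1]) ** 2 +
            (img[y][x] - img[y - 1][x]) ** 2)

def all_in_focus(stack):
    # For each interior pixel, keep the value from the slice where local
    # sharpness is highest; the winning slice index doubles as a crude
    # per-pixel height estimate (depth from focus).
    h, w = len(stack[0]), len(stack[0][0])
    composite = [[0.0] * w for _ in range(h)]
    height = [[0] * w for _ in range(h)]
    for y in range(1, h):
        for x in range(1, w):
            best = max(range(len(stack)),
                       key=lambda z: sharpness(stack[z], x, y))
            composite[y][x] = stack[best][y][x]
            height[y][x] = best
    return composite, height

stack = [
    [[5, 5, 5], [5, 5, 5], [5, 5, 5]],   # defocused slice: no local contrast
    [[0, 9, 0], [9, 0, 9], [0, 9, 0]],   # in-focus slice: strong local contrast
]
composite, height = all_in_focus(stack)
print(height)
```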

Citations: 0
De-occlusion and recognition of frontal face images: a comparative study of multiple imputation methods
IF 8.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-04-29 | DOI: 10.1186/s40537-024-00925-6
Joseph Agyapong Mensah, Ezekiel N. N. Nortey, Eric Ocran, Samuel Iddi, Louis Asiedu

Increasingly, automatic face recognition algorithms have become necessary with the development and extensive use of face recognition technology, particularly in the era of machine learning and artificial intelligence. However, unconstrained environmental conditions degrade the quality of acquired face images and may deteriorate the performance of many classical face recognition algorithms. Against this backdrop, many researchers have given considerable attention to image restoration and enhancement mechanisms, but with minimal focus on occlusion-related and multiply-constrained problems. Although occlusion-robust face recognition modules based on sparse representation have been explored, they require a large number of features to achieve correct computations and to maximize robustness to occlusions. Such an approach may therefore become deficient in the presence of random occlusions of relatively moderate magnitude. This study assesses the robustness of a face recognition module using Discrete Wavelet Transformation for preprocessing, Principal Component Analysis and Singular Value Decomposition for feature extraction, and city block distance for classification (DWT-PCA/SVD-L1) against image degradations due to random occlusions of varying magnitudes (10% and 20%) in test images acquired with varying expressions. Numerical evaluation showed that using de-occluded faces for recognition significantly enhanced the performance of the module at each level of occlusion (10% and 20%). The algorithm attained its highest recognition rates, 85.94% at 10% occlusion and 78.65% at 20% occlusion, when the MICE de-occluded face images were used for recognition. With the exception of entropy, for which the MICE de-occluded face images attained the highest average value, MICE and RegEM produce images of similar quality as measured by absolute mean brightness error (AMBE) and peak signal-to-noise ratio (PSNR). The study therefore recommends MICE as a suitable imputation mechanism for de-occlusion of face images acquired under varying expressions.
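MICE (multivariate imputation by chained equations) in the study fills occluded pixel regions; the numeric toy below sketches only the chained-equations idea, on two columns with a hand-rolled least-squares step. The data and the two-column restriction are invented simplifications, not the study's implementation:

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = intercept + slope * x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def impute_column(target_obs, target, predictor):
    # Refit target ~ predictor on rows where target was observed,
    # then refresh only the imputed entries from the regression.
    obs = [i for i, v in enumerate(target_obs) if v is not None]
    icpt, slope = fit_line([predictor[i] for i in obs],
                           [target[i] for i in obs])
    return [v if target_obs[i] is not None else icpt + slope * predictor[i]
            for i, v in enumerate(target)]

def mice_two_columns(col_a, col_b, sweeps=10):
    # Chained-equations imputation (None marks a missing entry):
    # start from column means, then alternate regression sweeps.
    def mean_fill(col):
        known = [v for v in col if v is not None]
        m = sum(known) / len(known)
        return [v if v is not None else m for v in col]
    a, b = mean_fill(col_a), mean_fill(col_b)
    for _ in range(sweeps):
        a = impute_column(col_a, a, b)
        b = impute_column(col_b, b, a)
    return a, b

# Columns follow b = 2a; the sweeps pull the imputed entries toward that line.
a, b = mice_two_columns([1, 2, 3, 4, None], [2, 4, 6, None, 10])
print(round(a[4], 2), round(b[3], 2))
```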

Citations: 0
Profitability trend prediction in crypto financial markets using Fibonacci technical indicator and hybrid CNN model
IF 8.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-04-28 | DOI: 10.1186/s40537-024-00908-7
Bilal Hassan Ahmed Khattak, Imran Shafi, Chaudhary Hamza Rashid, Mejdl Safran, Sultan Alfarhood, Imran Ashraf

Cryptocurrency has become a popular trading asset due to its security, anonymity, and decentralization. However, predicting the direction of the financial market can be challenging, leading to difficult financial decisions and potential losses. The purpose of this study is to gain insight into the impact of the Fibonacci technical indicator (TI) and of multi-class classification based on trend direction and price strength (trend-strength) on the performance and profitability of artificial intelligence (AI) models, particularly a hybrid convolutional neural network (CNN) incorporating long short-term memory (LSTM), and to modify that model to reduce its complexity. The main contribution of this paper lies in its introduction of the Fibonacci TI, demonstrating its impact on financial prediction, and its incorporation of a multi-classification technique focused on trend strength, thereby enhancing the depth and accuracy of predictions. Lastly, profitability analysis sheds light on the tangible benefits of utilizing Fibonacci levels and multi-classification. The profitability analysis is based on a hybrid direction-and-strength investment strategy implemented as a six-stage predictive system: data collection, preprocessing, sampling, training and prediction, investment simulation, and evaluation. Empirical findings show that the Fibonacci TI improved the performance of the AI models in 44% of configurations and their profitability in 68% of configurations. Hybrid CNNs showed the largest performance improvements, particularly the C-LSTM model for trend (binary: 0.0023) and trend-strength (4-class: 0.0020; 6-class: 0.0099) prediction. Hybrid CNNs also showed improved profitability, particularly C-LSTM, and improved performance in the modified C-LSTM. Trend-strength prediction showed the largest improvements in long-strategy ROI (6.89%) and in average ROI for the long-short strategy. Regarding the choice between hybrid CNNs, the modified C-LSTM is a viable option for 4-class and 6-class trend-strength prediction due to its better performance and profitability.
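The abstract does not specify how the Fibonacci TI is encoded as a model feature; the retracement levels themselves are standard, and computing them from a swing low and high (the prices here are invented) is straightforward:

```python
def fibonacci_retracements(swing_low, swing_high):
    # Standard retracement ratios; levels are measured back down
    # from the swing high of an up-move.
    ratios = (0.236, 0.382, 0.5, 0.618, 0.786)
    span = swing_high - swing_low
    return {r: round(swing_high - r * span, 2) for r in ratios}

print(fibonacci_retracements(20000, 30000))
```

A feature pipeline could then encode, for each candle, the distance of the closing price to the nearest level, though the exact encoding used in the paper is not stated in the abstract.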

Citations: 0
Big data resolving using Apache Spark for load forecasting and demand response in smart grid: a case study of Low Carbon London Project
IF 8.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-04-28 | DOI: 10.1186/s40537-024-00909-6
Hussien Ali El-Sayed Ali, M. H. Alham, Doaa Khalil Ibrahim

Using recent information and communication technologies for monitoring and management initiates a revolution in the smart grid. These technologies generate massive data that can only be processed using big data tools. This paper emphasizes the role of big data in resolving load forecasting, renewable energy source integration, and demand response as significant aspects of smart grids. Smart-meter data from the Low Carbon London Project is investigated as a case study. Because of the immense stream of meter readings and the exogenous data added to load forecasting models, the problem is addressed in a big data context. Descriptive analytics are developed using Spark SQL to gain insights into household energy consumption. Spark MLlib is utilized for predictive analytics by building scalable machine learning models that accommodate the meter data streams. Multivariate polynomial regression and decision tree models are preferred here from a big data point of view and based on literature confirming that they are accurate and interpretable. The results confirmed the capability of descriptive analytics and data visualization to provide valuable insights, guide the feature selection process, and enhance the accuracy of load forecasting models. Accordingly, proper evaluation of demand response programs and integration of renewable energy resources is accomplished using the achieved load forecasting results.
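The descriptive analytics described here reduce to grouped aggregations over meter readings. A plain-Python analogue of one such query, average consumption by hour of day, is sketched below; the household IDs and readings are invented, and at Low Carbon London scale the same aggregation would run as a distributed Spark SQL `GROUP BY`:

```python
from collections import defaultdict

def mean_consumption_by_hour(readings):
    # Plain-Python analogue of a Spark SQL
    # "SELECT hour, AVG(kwh) ... GROUP BY hour" over smart-meter readings.
    totals, counts = defaultdict(float), defaultdict(int)
    for r in readings:
        totals[r["hour"]] += r["kwh"]
        counts[r["hour"]] += 1
    return {h: totals[h] / counts[h] for h in totals}

readings = [
    {"household": "H001", "hour": 18, "kwh": 0.8},  # illustrative records
    {"household": "H002", "hour": 18, "kwh": 1.2},
    {"household": "H001", "hour": 3, "kwh": 0.1},
]
print(mean_consumption_by_hour(readings))
```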

Citations: 0
Green and sustainable AI research: an integrated thematic and topic modeling analysis
IF 8.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-04-22 | DOI: 10.1186/s40537-024-00920-x
Raghu Raman, Debidutta Pattnaik, Hiran H. Lathabai, Chandan Kumar, Kannan Govindan, Prema Nedungadi

This investigation delves into Green AI and Sustainable AI literature through a dual-analytical approach, combining thematic analysis with BERTopic modeling to reveal both broad thematic clusters and nuanced emerging topics. It identifies three major thematic clusters: (1) Responsible AI for Sustainable Development, focusing on integrating sustainability and ethics within AI technologies; (2) Advancements in Green AI for Energy Optimization, centering on energy efficiency; and (3) Big Data-Driven Computational Advances, emphasizing AI’s influence on socio-economic and environmental aspects. Concurrently, BERTopic modeling uncovers five emerging topics: Ethical Eco-Intelligence, Sustainable Neural Computing, Ethical Healthcare Intelligence, AI Learning Quest, and Cognitive AI Innovation, indicating a trend toward embedding ethical and sustainability considerations into AI research. The study reveals novel intersections between Sustainable and Ethical AI and Green Computing, indicating significant research trends and identifying Ethical Healthcare Intelligence and AI Learning Quest as evolving areas within AI’s socio-economic and societal impacts. The study advocates for a unified approach to innovation in AI, promoting environmental sustainability and ethical integrity to foster responsible AI development. This aligns with the Sustainable Development Goals, emphasizing the need for ecological balance, societal welfare, and responsible innovation. This refined focus underscores the critical need for integrating ethical and environmental considerations into the AI development lifecycle, offering insights for future research directions and policy interventions.

Citations: 0
An improved deep hashing model for image retrieval with binary code similarities
IF 8.1 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-04-18 | DOI: 10.1186/s40537-024-00919-4
Huawen Liu, Zongda Wu, Minghao Yin, Donghua Yu, Xinzhong Zhu, Jungang Lou

The exponential growth of data raises an unprecedented challenge in data analysis: how to retrieve interesting information from such large-scale data. Hash learning is a promising solution to address this challenge, because it may bring many potential advantages, such as extremely high efficiency and low storage cost, after projecting high-dimensional data to compact binary codes. However, traditional hash learning algorithms often suffer from the problem of semantic inconsistency, where images with similar semantic features may have different binary codes. In this paper, we propose a novel end-to-end deep hashing method based on the similarities of binary codes, dubbed CSDH (Code Similarity-based Deep Hashing), for image retrieval. Specifically, it extracts deep features from images to capture semantic information using a pre-trained deep convolutional neural network. Additionally, a hidden and fully connected layer is attached at the end of the deep network to derive hash bits by virtue of an activation function. To preserve the semantic consistency of images, a loss function has been introduced. It takes the label similarities, as well as the Hamming embedding distances, into consideration. By doing so, CSDH can learn more compact and powerful hash codes, which not only can preserve semantic similarity but also have small Hamming distances between similar images. To verify the effectiveness of CSDH, we evaluate CSDH on two public benchmark image collections, i.e., CIFAR-10 and NUS-WIDE, with five classic shallow hashing models and six popular deep hashing ones. The experimental results show that CSDH can achieve competitive performance to the popular deep hashing algorithms.
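Once hash codes are learned, retrieval reduces to ranking by Hamming distance, which is what gives hashing its efficiency and low storage cost. The sketch below uses hypothetical 8-bit codes and image names; CSDH itself learns the codes with the CNN described above:

```python
def hamming(code_a, code_b):
    # Hamming distance between two equal-length binary hash codes,
    # represented as ints.
    return bin(code_a ^ code_b).count("1")

def retrieve(query_code, database, top_k=2):
    # Rank database images by Hamming distance to the query code --
    # the retrieval step any learned hash function feeds into.
    ranked = sorted(database.items(),
                    key=lambda kv: hamming(query_code, kv[1]))
    return [name for name, _ in ranked[:top_k]]

# Hypothetical codes: semantically similar images should get nearby codes.
db = {"cat_1": 0b10110010, "cat_2": 0b10110011, "truck_7": 0b01001100}
print(retrieve(0b10110110, db))
```

Because distances are popcounts over compact codes, the ranking scales to very large collections far more cheaply than comparing real-valued feature vectors.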

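The retrieval step the abstract describes (sign-activated projection of deep features to hash bits, then ranking by Hamming distance) can be sketched in a few lines of NumPy. This is a minimal illustration, not CSDH's trained network: the random projection matrix and the 512-dimensional features stand in for a learned fully connected layer and CNN features.

```python
import numpy as np

def binarize(features, projection):
    """Project real-valued deep features to compact binary codes.

    The sign of a linear projection is a common choice for deriving
    hash bits; `projection` stands in for a trained hash layer.
    """
    return (features @ projection > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

rng = np.random.default_rng(0)
proj = rng.standard_normal((512, 64))        # 512-d features -> 64-bit codes
db_feats = rng.standard_normal((1000, 512))  # stand-in for CNN features
db_codes = binarize(db_feats, proj)

query_code = binarize(db_feats[3:4], proj)   # query identical to item 3
order, dists = hamming_rank(query_code, db_codes)
```

Because item 3 shares the query's code exactly, it sits at Hamming distance 0 and ranks first; semantically similar items would cluster at small distances in the same way.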
Citations: 0
Revisiting the potential value of vital signs in the real-time prediction of mortality risk in intensive care unit patients
IF 8.1 CAS Tier 2 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2024-04-18 DOI: 10.1186/s40537-024-00896-8
Pan Pan, Yue Wang, Chang Liu, Yanhui Tu, Haibo Cheng, Qingyun Yang, Fei Xie, Yuan Li, Lixin Xie, Yuhong Liu
Background

Predicting patient mortality risk facilitates early intervention in intensive care unit (ICU) patients at greater risk of disease progression. This study applies machine learning methods to multidimensional clinical data to dynamically predict mortality risk in ICU patients.

Methods

A total of 33,798 patients in the MIMIC-III database were collected. An integrated model, NIMRF (Network Integrating Memory Module and Random Forest), based on multidimensional variables such as vital sign and laboratory variables, was developed to predict the risk of death for ICU patients in four non-overlapping time windows of 0–1 h, 1–3 h, 3–6 h, and 6–12 h. Mortality risk in the four non-overlapping time windows within 12 h was externally validated on data from 889 patients in the respiratory critical care unit of the Chinese PLA General Hospital and compared with LSTM, random forest, and time-dependent Cox regression (survival analysis) methods. We also interpret the developed model to obtain important factors for predicting mortality risk across time windows. The code can be found at https://github.com/wyuexiao/NIMRF.

Results

The NIMRF model developed in this study could predict the risk of death in four non-overlapping time windows (0–1 h, 1–3 h, 3–6 h, 6–12 h) after any time point in ICU patients. Internal data validation suggests that the model is more accurate than LSTM, random forest, and time-dependent Cox regression (area under the receiver operating characteristic curve, or AUC, 0–1 h: 0.8015 [95% CI 0.7725–0.8304] vs. 0.7144 [95% CI 0.6824–0.7464] vs. 0.7606 [95% CI 0.7300–0.7913] vs. 0.3867 [95% CI 0.3573–0.4161]; 1–3 h: 0.7100 [95% CI 0.6777–0.7423] vs. 0.6389 [95% CI 0.6055–0.6723] vs. 0.6992 [95% CI 0.6667–0.7318] vs. 0.3854 [95% CI 0.3559–0.4150]; 3–6 h: 0.6760 [95% CI 0.6425–0.7097] vs. 0.5964 [95% CI 0.5622–0.6306] vs. 0.6760 [95% CI 0.6427–0.7099] vs. 0.3967 [95% CI 0.3662–0.4271]; 6–12 h: 0.6380 [95% CI 0.6031–0.6729] vs. 0.6032 [95% CI 0.5705–0.6406] vs. 0.6055 [95% CI 0.5682–0.6383] vs. 0.4023 [95% CI 0.3709–0.4337]). External validation was performed on data from patients in the respiratory critical care unit of the Chinese PLA General Hospital. Compared with LSTM, random forest, and time-dependent Cox regression, the NIMRF model was still the best, with an AUC of 0.9366 [95% CI 0.9157–0.9575] for predicting death risk within 0–1 h; the corresponding AUCs of LSTM, random forest, and time-dependent Cox regression were 0.9263 [95% CI 0.9039–0.9486], 0.7437 [95% CI 0.7083–0.7791], and 0.2447 [95% CI 0.2202–0.2692], respectively. Interpretation of the model revealed that vital signs (systolic blood pressure, heart rate, diastolic blood pressure, respiratory rate, and body temperature) were highly correlated with death events.

Conclusion

Using the NIMRF model, multidimensional ICU variable data, especially vital sign data, can be integrated to accurately predict death events in ICU patients. These predictions can help clinicians choose more timely and precise treatments and interventions and, more importantly, reduce invasive procedures and lower medical costs.
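The model comparisons above all rest on AUC, which equals the probability that a randomly chosen positive case outranks a randomly chosen negative one. As a quick reference, here is a minimal NumPy implementation of that rank-based (Mann-Whitney U) estimator; it is a generic sketch, not the authors' evaluation code.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation:
    P(score of a random positive > score of a random negative),
    with ties counted as half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):          # average ranks over ties
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # one swapped pair -> 0.75
```

In the small example, one of the four positive-negative pairs is mis-ordered (0.35 < 0.4), so the AUC is 3/4.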
Citations: 0
Enhancing academic performance prediction with temporal graph networks for massive open online courses
IF 8.1 CAS Tier 2 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2024-04-13 DOI: 10.1186/s40537-024-00918-5
Qionghao Huang, Jili Chen

Educational big data significantly impacts education, and Massive Open Online Courses (MOOCs), a crucial learning approach, have evolved to be more intelligent with these technologies. Deep neural networks have significantly advanced a crucial task within MOOCs: predicting student academic performance. However, most deep learning-based methods ignore the temporal information and interaction behaviors of learning activities, which could effectively enhance a model's predictive accuracy. To address this, we formulate the learning processes of e-learning students as dynamic temporal graphs that encode the temporal information and interaction behaviors of their studying. We propose a novel academic performance prediction model (APP-TGN) based on temporal graph neural networks. Specifically, in APP-TGN, a dynamic graph is constructed from online learning activity logs. A temporal graph network with low-high filters learns potential academic performance variations encoded in the dynamic graphs. Furthermore, a global sampling module is developed to mitigate false correlations in deep learning-based models. Finally, multi-head attention is utilized for predicting academic outcomes. Extensive experiments are conducted on a well-known public dataset. The experimental results indicate that APP-TGN significantly surpasses existing methods and demonstrates excellent potential for automated feedback and personalized learning.

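The first step the abstract describes, constructing a dynamic graph from online learning activity logs, amounts to turning the raw logs into a time-ordered stream of student-resource interaction events, the standard input format for temporal graph networks. The sketch below illustrates that conversion; the log schema (`student_id`, `resource_id`, `time`, `action`) is an assumption for illustration, not the paper's dataset format.

```python
from collections import namedtuple

# One temporal edge: a student interacting with a course resource at a time.
Event = namedtuple("Event", ["student", "resource", "timestamp", "action"])

def build_event_stream(activity_logs):
    """Turn raw activity logs (dicts) into a chronologically sorted
    edge stream suitable for feeding a temporal graph model."""
    events = [
        Event(log["student_id"], log["resource_id"], log["time"], log["action"])
        for log in activity_logs
    ]
    return sorted(events, key=lambda e: e.timestamp)

logs = [
    {"student_id": "s1", "resource_id": "video_3", "time": 120, "action": "play"},
    {"student_id": "s2", "resource_id": "quiz_1", "time": 45, "action": "submit"},
    {"student_id": "s1", "resource_id": "quiz_1", "time": 300, "action": "submit"},
]
stream = build_event_stream(logs)
```

A temporal graph network then consumes this stream event by event, updating per-node memory as interactions arrive rather than operating on a single static adjacency matrix.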
Citations: 0
The differences in gastric cancer epidemiological data between SEER and GBD: a joinpoint and age-period-cohort analysis
IF 8.1 CAS Tier 2 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date : 2024-04-13 DOI: 10.1186/s40537-024-00907-8
Zenghong Wu, Kun Zhang, Weijun Wang, Mengke Fan, Rong Lin

Background

The worldwide burden of gastric cancer (GC) should be further clarified to help us understand the current situation of GC.

Methods

In the present study, we estimated disability-adjusted life-years (DALYs) and mortality rates attributable to several major GC risk factors, including smoking, dietary risk, and behavioral risk. In addition, we evaluated the incidence rate and trends of incidence-based mortality (IBM) due to GC in the United States (US) during 1992–2018.

Results

Globally, GC incident cases increased from 883,395 in 1990 to 1,269,805 in 2019, while GC-associated deaths increased from 788,316 in 1990 to 957,185 in 2019. In 2019, the age-standardized rate (ASR) of GC varied around the world, with Mongolia having the highest observed ASR (43.7 per 100,000), followed by Bolivia (34 per 100,000) and China (30.6 per 100,000). A negative association was found between the estimated annual percentage change (EAPC) and the ASR (age-standardized incidence rate (ASIR): r = − 0.28, p < 0.001; age-standardized death rate (ASDR): r = − 0.19, p = 0.005). There were 74,966 incident cases of GC and 69,374 GC-related deaths recorded between 1992 and 2018. The significant decrease in GC incidence, as well as the decreasing trend in the IBM of GC, was first detected in 1994. The GC IBM significantly increased at a rate of 35%/y from 1992 to 1994 (95% CI 21.2% to 50.4%/y), and then began to decrease at a rate of − 1.4%/y from 1994 to 2018 (95% CI − 1.6% to − 1.2%/y).

Conclusion

These findings mirror the global disease burden of GC and are important for development of targeted prevention strategies.

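EAPC figures like those reported in the Results are conventionally obtained by fitting a log-linear regression of the rate on calendar year, ln(rate) = a + b·year, and taking EAPC = 100 × (e^b − 1). A minimal sketch with synthetic data follows; the rates below are illustrative, not the study's.

```python
import numpy as np

def eapc(years, rates):
    """Estimated annual percentage change: OLS fit of ln(rate) on year,
    then EAPC = 100 * (exp(slope) - 1)."""
    years = np.asarray(years, dtype=float)
    log_rates = np.log(np.asarray(rates, dtype=float))
    slope, _intercept = np.polyfit(years, log_rates, 1)
    return 100.0 * (np.exp(slope) - 1.0)

# Synthetic ASR series declining by exactly 2% per year; the fit
# recovers EAPC = -2 because ln(rate) is exactly linear in year.
years = np.arange(2000, 2011)
rates = 30.0 * 0.98 ** (years - 2000)
change = eapc(years, rates)
```

A negative EAPC, as here, indicates a declining trend over the period; confidence intervals come from the standard error of the fitted slope.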
Citations: 0