BenchCouncil Transactions on Benchmarks, Standards and Evaluations: Latest Publications

Open Source Evaluatology: An evaluation framework and methodology for open source ecosystems based on evaluatology
Pub Date: 2024-12-01 DOI: 10.1016/j.tbench.2025.100190
Fanyu Han, Shengyu Zhao, Wei Wang, Aoying Zhou, Weining Qian, Xuan Zhou, Jiaheng Peng, Lan You, Yang Chen, Xiaoya Xia, Yenan Tang, Liyun Yang, Chunqi Tian
The open-source ecosystem, as an important component of the modern software industry, has increasingly attracted attention from both academia and industry regarding its evaluation. However, current open-source evaluation methods face several issues, such as inconsistent evaluation standards, lack of theoretical support in the evaluation process, and poor comparability of evaluation results. Guided by the foundational theories of evaluatology, this paper proposes a new interdisciplinary research field, Open Source Evaluatology, and constructs a theoretical evaluation framework and methodology for open-source ecosystems. The main contributions of this paper include: (1) Based on the five axioms of evaluation theory, a theoretical system for Open Source Evaluatology is developed, and the basic concepts, evaluation dimensions, and evaluation standards for the open-source ecosystem are proposed; (2) An evaluation conditions (EC) framework is designed, encompassing five levels: problem definition, task instances, algorithm mechanisms, implementation examples, and supporting systems. A combined evaluation model (EM) based on statistical metrics and network metrics is also introduced; (3) Experimental validation using the GitHub dataset shows that the proposed evaluation framework effectively assesses various features of open-source projects, developers, and communities, and has been verified in multiple practical application scenarios. The research demonstrates that Open Source Evaluatology provides standardized theoretical guidance and methodological support for open-source ecosystem evaluation, which can be widely applied in scenarios such as open-source project selection, developer evaluation, and community management, and plays a significant role in promoting the healthy and sustainable development of open-source ecosystems.
Citations: 0
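The combined evaluation model (EM) above pairs statistical metrics with network metrics. A minimal sketch of such a combination might blend normalized repository statistics with a centrality score from a contributor-overlap graph; the paper's actual metrics and weights are not reproduced here, so every name, field, and weight below is a hypothetical stand-in.

```python
# Hypothetical sketch of a combined evaluation model (EM) mixing
# statistical metrics with network metrics. Metric choices and the
# equal weighting are illustrative assumptions, not the paper's formula.
import networkx as nx

def combined_score(repo_stats, collab_graph, repo, w_stat=0.5, w_net=0.5):
    """Blend normalized activity statistics with network centrality."""
    stats = repo_stats[repo]
    # Statistical component: mean of normalized stars, commits, contributors.
    stat_part = (stats["stars_norm"] + stats["commits_norm"]
                 + stats["contributors_norm"]) / 3.0
    # Network component: PageRank of the repo in a graph whose edges
    # link repositories that share contributors.
    net_part = nx.pagerank(collab_graph).get(repo, 0.0)
    return w_stat * stat_part + w_net * net_part

# Toy usage: two repositories connected by a shared-contributor edge.
g = nx.Graph([("repo/a", "repo/b")])
stats = {
    "repo/a": {"stars_norm": 0.9, "commits_norm": 0.7, "contributors_norm": 0.8},
    "repo/b": {"stars_norm": 0.4, "commits_norm": 0.5, "contributors_norm": 0.3},
}
print(combined_score(stats, g, "repo/a"))
```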
COADBench: A benchmark for revealing the relationship between AI models and clinical outcomes
Pub Date: 2024-12-01 DOI: 10.1016/j.tbench.2025.100198
Jiyue Xie, Wenjing Liu, Li Ma, Caiqin Yao, Qi Liang, Suqin Tang, Yunyou Huang
Alzheimer’s disease (AD), due to its irreversible nature and the severe social burden it causes, has garnered significant attention from AI researchers. Numerous auxiliary diagnostic models have been developed with the aim of improving AD diagnostic services and thereby reducing the social burden. However, due to a lack of validation regarding the clinical value of these models, no AD diagnostic model has been widely accepted by clinicians or officially approved for use in enhancing AD diagnostic services. The clinical value of traditional medical devices is validated through rigorous randomized controlled trials to prove their impact on clinical outcomes. In contrast, current AD diagnostic models are only validated based on their accuracy, and the relationship between these models and patient outcomes remains unknown. This gap has hindered the acceptance and clinical use of AD diagnostic models by healthcare professionals. To address this issue, we introduce COADBench, a benchmark centered on clinical outcomes for evaluating the clinical value of AD diagnostic models. COADBench curates subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database who have at least two cognitive score records (the most commonly used clinical endpoint in AD clinical trials) from different follow-up visits. To the best of our knowledge, it is the first benchmark to link subjects’ cognitive scores with model performance, using patient cognitive scores as clinical outcomes after intervention to evaluate the models. Through benchmarking current mainstream AD diagnostic algorithms with COADBench, we find no significant correlation between subjects’ cognitive improvement and model performance, which indicates that the current performance evaluation criteria for mainstream AD diagnostic algorithms are not aligned with clinical value.
Citations: 0
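COADBench's core question, whether a model's per-subject performance tracks the subject's change in cognitive score between follow-up visits, can be pictured as a simple correlation test. The sketch below is an illustrative reading of that idea, not the benchmark's actual protocol; the field names, toy values, and the choice of Pearson's r are assumptions.

```python
# Hypothetical sketch: correlate per-subject model accuracy with the
# change in cognitive score between two follow-ups. Data layout and
# the use of Pearson's r are illustrative assumptions.
from scipy.stats import pearsonr

def outcome_correlation(subjects):
    """subjects: dicts holding a per-subject accuracy and two cognitive
    scores from different follow-up visits."""
    improvements = [s["score_followup"] - s["score_baseline"] for s in subjects]
    accuracies = [s["model_accuracy"] for s in subjects]
    return pearsonr(improvements, accuracies)

subjects = [
    {"score_baseline": 24, "score_followup": 22, "model_accuracy": 0.91},
    {"score_baseline": 27, "score_followup": 27, "model_accuracy": 0.88},
    {"score_baseline": 21, "score_followup": 18, "model_accuracy": 0.93},
    {"score_baseline": 29, "score_followup": 28, "model_accuracy": 0.85},
]
r, p = outcome_correlation(subjects)
# A non-significant r would mirror the paper's finding that accuracy
# alone does not track clinical outcomes.
print(f"r={r:.3f}, p={p:.3f}")
```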
Evaluating long-term usage patterns of open source datasets: A citation network approach
Pub Date: 2024-12-01 DOI: 10.1016/j.tbench.2025.100199
Jiaheng Peng, Fanyu Han, Wei Wang
The evaluation of datasets serves as a fundamental basis for tasks in evaluatology. Evaluating the usage patterns of datasets has a significant impact on the selection of appropriate datasets. Many renowned Open Source datasets are well-established and have not been updated for many years, yet they continue to be widely used by a large number of researchers. Because of this characteristic, conventional Open Source metrics (e.g., numbers of stars, issues, and activity), which are derived from log activity data in the datasets' GitHub repositories, are insufficient for evaluating long-term usage patterns.
Researchers often encounter significant challenges in selecting appropriate datasets due to the lack of insight into how these datasets are being utilized. To address this challenge, this paper proposes establishing a connection between Open Source datasets and the citation networks of their corresponding academic papers. By mining the citation network of the corresponding academic paper, we can obtain rich graph-structured information, such as citation times, authors, and more. Utilizing this information, we can evaluate the long-term usage patterns of the associated Open Source dataset.
Furthermore, this paper conducts extensive experiments based on five major dataset categories (Texts, Images, Videos, Audio, Medical) to demonstrate that the proposed method effectively evaluates the long-term usage patterns of Open Source datasets. Additionally, the insights gained from the experimental results can serve as a valuable reference for future researchers in selecting appropriate datasets for their work.
Citations: 0
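Concretely, once a dataset is tied to the paper that introduced it, long-term usage can be read off the paper's citation network, for instance as citations per year. The sketch below illustrates that idea under assumed data structures (paper IDs and publication years); it is not the authors' pipeline.

```python
# Hypothetical sketch of the citation-network idea: collect papers
# citing the dataset's introducing paper into a directed graph and
# summarize citations per year as a long-term usage curve.
from collections import Counter
import networkx as nx

def usage_curve(dataset_paper, citations):
    """citations: iterable of (citing_paper_id, year) pairs."""
    g = nx.DiGraph()
    for citing, year in citations:
        g.add_edge(citing, dataset_paper, year=year)
    per_year = Counter(year for _, year in citations)
    return g, dict(sorted(per_year.items()))

# Toy data: an old dataset paper still being cited years later.
citations = [("p1", 2015), ("p2", 2016), ("p3", 2016), ("p4", 2023)]
graph, curve = usage_curve("dataset-paper", citations)
print(curve)  # {2015: 1, 2016: 2, 2023: 1}: sustained use without dataset updates
```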
Patrick Star: A comprehensive benchmark for multi-modal image editing
Pub Date: 2024-12-01 DOI: 10.1016/j.tbench.2025.100201
Di Cheng, ZhengXin Yang, ChunJie Luo, Chen Zheng, YingJie Shi
Generative image editing enhances and automates traditional image design methods. However, there is a significant imbalance in existing research, where the development of sketch-guided and example-guided image editing has not been sufficiently explored compared to text-guided image editing, despite the former being equally important in real-world applications. The leading cause of this phenomenon is the severe lack of corresponding benchmark datasets. To address this issue, this paper proposes a comprehensive and unified benchmark dataset, Patrick Star, which consists of approximately 500 test images, to promote balanced development in this field across multi-task and multi-modal settings. First, theoretical analysis grounded in Evaluatology highlights the importance of establishing a balanced benchmark dataset to advance research in image editing. Building on this theoretical foundation, the dataset’s construction methodology is explained in detail, ensuring it addresses critical gaps in existing studies. Next, statistical analyses are conducted to verify the dataset’s usability and diversity. Finally, comparative experiments underscore the dataset’s potential as a comprehensive benchmark, demonstrating its capacity to support balanced development in image editing.
Citations: 0
Advanced Deep Learning Models for Improving Movie Rating Predictions: A Benchmarking Study
Pub Date: 2024-12-01 DOI: 10.1016/j.tbench.2025.100200
Manisha Valera, Dr. Rahul Mehta
Predicting movie ratings precisely has become a vital aspect of personalized recommendation systems, which requires robust and high-performing models. To evaluate their effectiveness at predicting movie ratings, this study conducts a comprehensive performance analysis of various deep learning architectures, including BiLSTM, CNN + LSTM, CNN + GRU, CNN + Attention, CNN, VAE, Simple RNN, GRU + Attention, Transformer Encoder, FNN, and ResNet. Each model’s performance is evaluated on a movie reviews dataset, enhanced with sentiment scores and user ratings, using a range of evaluation metrics: Mean Absolute Error (MAE), R² score, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Explained Variance. The results highlight distinct strengths and weaknesses among the models: the VAE model consistently delivers superior accuracy, whereas attention-based models show prominent improvements in interpretability and generalization. This analysis offers important insights into choosing models for movie recommendation systems and highlights the balance between prediction accuracy and computational efficiency. The findings serve as a benchmark for future developments in movie rating prediction, supporting researchers and practitioners in improving recommendation system performance.
Citations: 0
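For reference, the five evaluation metrics named in the abstract can all be computed from one pair of prediction and ground-truth vectors with scikit-learn; the toy rating values below are invented.

```python
# Computing MAE, MSE, RMSE, R2, and Explained Variance on the same
# predicted vs. true ratings. Values are toy data, not study results.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, explained_variance_score)

y_true = np.array([4.0, 3.5, 5.0, 2.0, 4.5])
y_pred = np.array([3.8, 3.6, 4.7, 2.4, 4.4])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE
r2 = r2_score(y_true, y_pred)
ev = explained_variance_score(y_true, y_pred)
print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f} EV={ev:.3f}")
```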
AI-powered Mathematical Sentiment Model and graph theory for social media trends
Pub Date: 2024-12-01 DOI: 10.1016/j.tbench.2025.100202
M. Venkatachalam, R. Vikrama Prasad
Significant issues have arisen as a result of the global spread of monkeypox, such as the extensive transmission of false information, public fear, and stigmatization on social media. Increased fear, prejudice, stigmatization of minority groups, and opposition to public health initiatives are frequently the results of these problems. Furthermore, health authorities are unable to provide correct information and prompt action due to a lack of efficient methods for analyzing the enormous amounts of unstructured social media data. This disparity weakens crisis management initiatives and increases public skepticism of health guidelines. To address these issues, this study examines attitudes around monkeypox on social media to pinpoint public worries, counter false information, and enhance communication tactics. The study intends to improve public comprehension, offer practical insights, and help health authorities manage the outbreak by fusing graph theory with AI-driven sentiment analysis. To facilitate semantic analysis of tweets through structured information extraction, graph theory is used to organize unstructured or semi-structured data by creating meaningful links between entities. Furthermore, opinions on monkeypox infection in social media are analyzed and user sentiments are detected using a reinforcement Markov decision process. According to experimental results, the suggested model's accuracy on the Monkeypox tweet dataset was 98%. These results help raise awareness of monkeypox among the general population and promote an educated and robust social response.
Citations: 0
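The graph-construction step, linking entities that co-occur in tweets so that semantic analysis can operate on structured relations rather than raw text, can be sketched as follows. The keyword-match entity extraction and the vocabulary are simplifying assumptions standing in for a real extraction step, and the reinforcement Markov decision process for sentiment is not reproduced.

```python
# Hypothetical sketch: build a co-occurrence graph of entities found
# in tweets. Edge weights count how often two entities appear in the
# same tweet; keyword matching stands in for real entity extraction.
from itertools import combinations
import networkx as nx

ENTITIES = {"monkeypox", "vaccine", "who", "outbreak"}  # assumed vocabulary

def build_entity_graph(tweets):
    g = nx.Graph()
    for tweet in tweets:
        found = {w for w in tweet.lower().split() if w in ENTITIES}
        for a, b in combinations(sorted(found), 2):
            weight = g.get_edge_data(a, b, {"weight": 0})["weight"]
            g.add_edge(a, b, weight=weight + 1)
    return g

tweets = [
    "WHO tracks the monkeypox outbreak",
    "monkeypox vaccine rollout begins",
    "vaccine questions dominate outbreak coverage",
]
g = build_entity_graph(tweets)
print(sorted(g.edges(data="weight")))
```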