BenchCouncil Transactions on Benchmarks, Standards and Evaluations: Latest Articles

Evaluation of mechanical properties of natural fiber based polymer composite

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

Pub Date: 2024-09-01; DOI: 10.1016/j.tbench.2024.100183

Tarikur Jaman Pramanik, Md. Rafiquzzaman, Anup Karmakar, Marzan Hasan Nayeem, S M Kalbin Salim Turjo, Md. Ragib Abid

Natural fiber based polymer composites are eco-friendly alternatives to synthetic materials, with greater mechanical properties, biodegradability, availability, ease of access, and affordability. Jute fiber is widely recognized as one of the most important and beneficial natural fibers due to its strength, durability, and biodegradability. In this study, a five-layer jute/epoxy composite was designed and fabricated using the hand lay-up technique. A composition of 52.5 % jute and 47.5 % epoxy resin and hardener was found to be optimal for improving tensile and flexural strength, reducing the cost of epoxy resin, and promoting eco-friendliness and sustainability. Tensile tests were performed on a universal testing machine, and flexural tests used three-point bending. Experimentally, the jute-reinforced epoxy composites achieved the required tensile strength (42.91 MPa) and bending strength (69.30 MPa). To validate the results and visualize specimen behavior, numerical analyses were performed in the ABAQUS simulation software, with ASTM D3039 and ASTM D7264 specifying the requirements for tensile and flexural behavior, respectively. The simulated tensile and flexural results were then compared with the experimental data for validation. Overall, careful composite design, fabrication, and optimization can improve mechanical properties, reduce composite weight, lower resin cost, and increase sustainability. The proposed design and composition can be used to make lightweight parts for applications such as car components, door handle sheets, bicycle seat backs, and luggage covers.
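The two headline strengths come from simple formulas: tensile strength is the peak load divided by the specimen cross-section, and for three-point bending the classical small-deflection beam formula gives the outer-fibre flexural stress as 3PL/(2bh²), with P the peak load, L the support span, b the width and h the thickness. The sketch below only illustrates that arithmetic; the function names, the specimen dimensions, the loads and the 32:1 span-to-thickness ratio are all assumptions, not the authors' data or test settings.

```python
def tensile_strength_mpa(max_load_n: float, width_mm: float, thickness_mm: float) -> float:
    """Peak tensile load divided by the specimen cross-section; N/mm^2 is numerically MPa."""
    return max_load_n / (width_mm * thickness_mm)


def flexural_strength_mpa(max_load_n: float, span_mm: float,
                          width_mm: float, thickness_mm: float) -> float:
    """Outer-fibre stress at peak load in three-point bending: 3PL / (2 b h^2)."""
    return 3.0 * max_load_n * span_mm / (2.0 * width_mm * thickness_mm ** 2)


# Hypothetical specimen dimensions and loads, for illustration only
# (not the values measured in the study).
tensile = tensile_strength_mpa(max_load_n=5000.0, width_mm=25.0, thickness_mm=4.0)
flexural = flexural_strength_mpa(max_load_n=50.0,
                                 span_mm=128.0,  # assumed 32:1 span-to-thickness ratio
                                 width_mm=13.0, thickness_mm=4.0)
print(f"tensile strength:  {tensile:.2f} MPa")
print(f"flexural strength: {flexural:.2f} MPa")
```

Because the flexural formula scales with 1/h², small thickness variations from hand lay-up can noticeably shift the computed flexural strength, which is one reason to measure each specimen's thickness individually.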

Volume 4, Issue 3, Article 100183

Citations: 0

Could bibliometrics reveal top science and technology achievements and researchers? The case for evaluatology-based science and technology evaluation

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

Pub Date: 2024-09-01; DOI: 10.1016/j.tbench.2024.100182

Guoxin Kang, Wanling Gao, Lei Wang, Chunjie Luo, Hainan Ye, Qian He, Shaopeng Dai, Jianfeng Zhan

Because bibliometrics relies on statistical analysis of bibliographic data, it faces inherent limitations in identifying the most significant science and technology achievements and researchers. To overcome this challenge, we present an evaluatology-based science and technology evaluation methodology. At the heart of this approach lies the concept of an extended evaluation condition (EC), which encompasses nine crucial components derived from a field. We define four relationships that capture the connections among achievements based on their mapped extended EC components, as well as their temporal and citation links. Within a relationship under an extended EC, evaluators can compare achievements effectively by carefully accounting for the influence of confounding variables. We establish a real-world evaluation system encompassing an entire collection of achievements, each mapped to several components of an extended EC. Within a specific field, such as chip technology or open source, we construct a perfect evaluation model that accurately traces the evolution and development of all achievements in terms of the four relationships, based on the real-world evaluation system. Building on the perfect evaluation model, we propose four-round rules that use the four relationships to eliminate non-significant achievements. This process yields a pragmatic evaluation model that captures the essential achievements, serving as a curated collection of the top N achievements within a specific field during a specific timeframe. We present a case study on the top 100 chip achievements to demonstrate the effectiveness of our science and technology evaluatology. The case study highlights its practical application and its efficacy in identifying significant achievements and researchers that cannot otherwise be identified through bibliometrics.

Volume 4, Issue 3, Article 100182

Citations: 0

Enhanced deep learning based decision support system for kidney tumour detection

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

Pub Date: 2024-06-01; DOI: 10.1016/j.tbench.2024.100174

Taha ETEM, Mustafa TEKE

This study presents a high-accuracy deep learning-based decision support system for kidney cancer detection. The research uses a relatively large dataset of 10,000 CT images, comprising scans of both healthy kidneys and kidneys with detected tumours. After data preprocessing and optimization, various deep learning models were evaluated, with DenseNet-201 emerging as the top performer at an accuracy of 99.75 %. The study compares multiple deep learning architectures, including AlexNet, EfficientNet, Darknet-53, Xception, and DenseNet-201, across different learning rates. Performance metrics such as accuracy, precision, sensitivity, F1-score, and specificity are derived from confusion matrices. The proposed system outperforms the other deep learning networks compared, demonstrating superior accuracy in kidney cancer detection. The improvement is attributed to effective data engineering and hyperparameter optimization of the deep learning networks. This research contributes to medical image analysis by providing a robust decision support tool for the early and rapid diagnosis of kidney cancer. The high accuracy and efficiency of the proposed system make it a promising aid for healthcare professionals in clinical settings.
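The metrics named above all follow from the four cells of a binary confusion matrix. A minimal sketch of those standard definitions, taking "positive" to mean "tumour present"; the function and class names are illustrative, and the counts in the usage line are hypothetical, not the study's results.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    sensitivity: float  # recall, true-positive rate
    specificity: float  # true-negative rate
    f1: float


def metrics_from_confusion(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Standard metrics from a 2x2 confusion matrix, with 'positive' meaning 'tumour'.

    A ratio whose denominator is zero is reported as 0.0.
    """
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + fp + tn + fn),
        precision=precision,
        sensitivity=sensitivity,
        specificity=ratio(tn, tn + fp),
        f1=ratio(2 * precision * sensitivity, precision + sensitivity),
    )


# Hypothetical counts for a 1,000-image test split (not the study's results).
m = metrics_from_confusion(tp=480, fp=6, tn=494, fn=20)
print(m)
```

Reporting sensitivity and specificity alongside accuracy matters clinically: a missed tumour (false negative) usually costs far more than a false alarm, and accuracy alone hides that asymmetry.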

Volume 4, Issue 2, Article 100174

Citations: 0

Analyzing the impact of opportunistic maintenance optimization on manufacturing industries in Bangladesh: An empirical study

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

Pub Date: 2024-06-01; DOI: 10.1016/j.tbench.2024.100172

Md. Ariful Alam, Md. Rafiquzzaman, Md. Hasan Ali, Gazi Faysal Jubayer

This study investigates how optimizing opportunistic maintenance (OM) can reduce maintenance costs in manufacturing industries, particularly in Bangladesh. To that end, OM strategies are proposed and optimized for multi-unit manufacturing systems, whereas most existing research addresses single- or two-unit systems. Each OM strategy in this research follows one of three policies: preventive replacement, preventive repair, or a two-level maintenance approach. The proposed two-level approach combines lower-level maintenance (preventive repair) with higher-level maintenance (preventive replacement). The strategies were evaluated with simulation optimization (SO) techniques implemented in Python. Historical data from two of Bangladesh's most promising and significant sectors, the footwear and railway industries, served as case studies. Compared with the corrective maintenance approach currently in use, the two-level approach is the most effective in both case studies, yielding cost savings of 16.9 % for the footwear industry and 22.4 % for the railway industry. This study shows that manufacturing industries can achieve significant cost savings by implementing the proposed OM strategies, a concept that has yet to be explored in developing countries such as Bangladesh. However, the study applied the proposed approaches only to the system's major components; greater benefits could be achieved if they were applied to all critical components of the system.
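The authors' multi-unit, two-level OM model is not reproduced here. As a much simpler illustration of the same simulation-optimization idea, the sketch below estimates the long-run cost per unit time of a single unit under corrective maintenance versus preventive replacement at a fixed age (a classic age-replacement policy), then grid-searches that age. The Weibull lifetimes, the cost figures and the names `cost_rate` and `params` are assumptions for illustration only.

```python
import random
from typing import Optional


def cost_rate(replace_age: Optional[float], c_preventive: float, c_failure: float,
              shape: float, scale: float, n_cycles: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of long-run maintenance cost per unit time for one unit.

    Uses the renewal-reward ratio E[cost per cycle] / E[cycle length].
    replace_age=None: corrective only (replace on failure).
    Otherwise: preventive replacement at `replace_age` unless the unit fails first.
    Lifetimes are drawn from a Weibull distribution (an illustrative assumption).
    """
    rng = random.Random(seed)  # same seed for every policy: common random numbers
    total_cost = 0.0
    total_time = 0.0
    for _ in range(n_cycles):
        life = rng.weibullvariate(scale, shape)  # alpha = scale, beta = shape
        if replace_age is None or life < replace_age:
            total_cost += c_failure     # unplanned failure -> corrective replacement
            total_time += life
        else:
            total_cost += c_preventive  # planned replacement before failure
            total_time += replace_age
    return total_cost / total_time


# Hypothetical parameters: wear-out failures (shape > 1) and a failure that
# costs ten times as much as a planned replacement.
params = dict(c_preventive=100.0, c_failure=1000.0, shape=3.0, scale=100.0)
corrective = cost_rate(None, **params)

# Crude simulation optimization: grid search over the preventive replacement age.
candidates = {age: cost_rate(age, **params) for age in range(10, 201, 10)}
best_age = min(candidates, key=candidates.get)
saving = 1.0 - candidates[best_age] / corrective
print(f"corrective: {corrective:.2f} per unit time")
print(f"preventive at age {best_age}: {candidates[best_age]:.2f} per unit time "
      f"({saving:.0%} saving)")
```

Preventive replacement only pays off when failures are wear-out driven (Weibull shape above 1) and a failure costs more than a planned replacement; with a shape of 1 (random failures), replacing early just wastes remaining life.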

Volume 4, Issue 2, Article 100172

Citations: 0

A short summary of evaluatology: The science and engineering of evaluation

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

Pub Date: 2024-06-01; DOI: 10.1016/j.tbench.2024.100175

Jianfeng Zhan

Evaluation is a crucial aspect of human existence and plays a vital role in every field. However, it is often approached in an empirical and ad hoc manner, without consensus on universal concepts, terminology, theories, and methodologies. This lack of agreement has significant consequences. This article aims to formally introduce the discipline of evaluatology, which encompasses the science and engineering of evaluation. The science of evaluation addresses the fundamental question: “Does any evaluation outcome possess a true value?” The engineering of evaluation tackles the challenge of minimizing costs while satisfying stakeholders' evaluation requirements. To address these challenges, we propose a universal framework for evaluation, encompassing concepts, terminology, theories, and methodologies that can be applied across many disciplines, if not all of them.

This is a short summary of Evaluatology (Zhan et al., 2024). The objective of this revised version is to alleviate the readers’ burden caused by the length of the original text. Compared to the original version (Zhan et al., 2024), this revised edition clarifies various concepts like evaluation systems and conditions and streamlines the concept system by eliminating the evaluation model concept. It rectifies errors, rephrases fundamental evaluation issues, and incorporates a case study on CPU evaluation (Wang et al., 2024). For a more comprehensive understanding, please refer to the original article (Zhan et al., 2024). If you wish to cite this work, kindly cite the original article.

Jianfeng Zhan, Lei Wang, Wanling Gao, Hongxiao Li, Chenxi Wang, Yunyou Huang, Yatao Li, Zhengxin Yang, Guoxin Kang, Chunjie Luo, Hainan Ye, Shaopeng Dai, Zhifei Zhang (2024). Evaluatology: The science and engineering of evaluation. BenchCouncil Transactions on Benchmarks, Standards and Evaluations, 4(1), 100162.

Volume 4, Issue 2, Article 100175

Citations: 0

BinCodex: A comprehensive and multi-level dataset for evaluating binary code similarity detection techniques

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

Pub Date: 2024-06-01; DOI: 10.1016/j.tbench.2024.100163

Peihua Zhang, Chenggang Wu, Zhe Wang

Binary code similarity detection (BCSD) quantitatively measures the differences between two given binaries and reports matches at a predefined granularity (e.g., functions). It is widely used in scenarios such as software vulnerability search, security patch analysis, malware detection, and code clone detection. With the help of deep learning, BCSD techniques have reported high accuracy in their evaluations. However, without a standard dataset, these high accuracies cannot be meaningfully compared and do not reveal the techniques' true capabilities. Moreover, because binary code is easily transformed, it is essential to understand the underlying transformations holistically, including default optimization options, non-default optimization options, and commonly used code obfuscations, and to assess their impact on the accuracy and adaptability of BCSD techniques. This paper presents our observations on the diversity of BCSD datasets and proposes a comprehensive dataset for evaluating BCSD techniques. We evaluate various BCSD works and present detailed results, applying different classifications to different types of BCSD tasks, including pure function pairing and vulnerable code detection. Our results show that most BCSD works cope well with default compiler options but perform unsatisfactorily under non-default compiler options and code obfuscation. Taking a layered perspective on the BCSD task, we identify opportunities for future optimization of the techniques we consider.
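Function-level matching is often framed as retrieval: a model embeds each function as a vector, candidates are ranked by cosine similarity to a query function, and the evaluation reports recall@k (the share of queries whose true match lands in the top k). The sketch below assumes that common setup; it is not necessarily the metric or pipeline used with BinCodex. The three-dimensional embeddings are hypothetical stand-ins for a learned model's output, with one obfuscated query deliberately drifting away from its match.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query: list[float], pool: dict[str, list[float]]) -> list[str]:
    """Names of pool functions, most similar to the query first."""
    return sorted(pool, key=lambda name: cosine_similarity(query, pool[name]), reverse=True)


def recall_at_k(queries: dict[str, list[float]], pool: dict[str, list[float]], k: int) -> float:
    """Fraction of queries whose true match (same function name) is ranked in the top k."""
    hits = sum(name in rank_candidates(vec, pool)[:k] for name, vec in queries.items())
    return hits / len(queries)


# Hypothetical embeddings: queries from an obfuscated build, pool from a default build.
queries = {
    "parse_header": [0.9, 0.1, 0.0],
    "checksum": [0.1, 0.8, 0.2],
    "copy_buf": [0.3, 0.3, 0.9],  # obfuscation has pushed this embedding away from its match
}
pool = {
    "parse_header": [0.85, 0.15, 0.05],
    "checksum": [0.2, 0.75, 0.3],
    "copy_buf": [0.7, 0.6, 0.1],
    "log_error": [0.1, 0.4, 0.9],
}
print("recall@1:", recall_at_k(queries, pool, 1))
print("recall@3:", recall_at_k(queries, pool, 3))
```

Here the obfuscated `copy_buf` ranks its true match only third, so recall@1 drops to 2/3 while recall@3 stays at 1.0. Reporting several k values is one way to separate "slightly confused" from "completely lost" under a given transformation.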

Volume 4, Issue 2, Article 100163

Citations: 0

TensorTable: Extending PyTorch for mixed relational and linear algebra pipelines

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

Pub Date: 2024-03-01; DOI: 10.1016/j.tbench.2024.100161

Xu Wen

Mixed relational algebra (RA) and linear algebra (LA) pipelines have become increasingly common in recent years. However, widely used contemporary frameworks struggle to support both RA and LA operators effectively and fail to deliver optimal end-to-end performance because of the cost of LA operators and data conversion. This underscores the need for a system that seamlessly integrates RA and LA while delivering robust end-to-end performance. This paper proposes TensorTable, a tensor system that extends PyTorch to support mixed RA and LA pipelines. TensorTable serves as the unified data representation, storing data in tensor format to prioritize the performance of LA operators and reduce data conversion costs. Relational tables from RA, as well as vectors, matrices, and tensors from LA, can be seamlessly converted into TensorTables. We also provide TensorTable-based implementations of RA operators and build a system that supports mixed LA and RA pipelines. Implemented on top of PyTorch, TensorTable achieves comparable performance for both RA and LA operators, particularly on small datasets. For mixed pipelines, TensorTable achieves a 1.15x to 5.63x speedup over the state-of-the-art frameworks AIDA and RMA.
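The core idea above is to store relational data as tensors, so that relational operators and linear-algebra operators share one representation and no conversion step is needed between them. A minimal sketch of that idea in plain PyTorch follows. It assumes a hypothetical layout in which a table is a dict of equal-length 1-D column tensors. The names `Table`, `select`, `pk_fk_join` and `to_matrix` are illustrative, not TensorTable's actual API. The join is also restricted to keys that are unique on the right-hand side.

```python
import torch

# Hypothetical layout: a table is a dict of equal-length 1-D tensors, one per column.
Table = dict[str, torch.Tensor]


def select(table: Table, mask: torch.Tensor) -> Table:
    """Relational selection: keep the rows where the boolean mask is True."""
    return {name: col[mask] for name, col in table.items()}


def pk_fk_join(left: Table, right: Table, key: str) -> Table:
    """Inner equi-join on `key`, assuming `key` is unique in a non-empty `right`."""
    order = torch.argsort(right[key])
    sorted_keys = right[key][order]
    # Binary-search each left key among the sorted right keys.
    pos = torch.searchsorted(sorted_keys, left[key]).clamp(max=sorted_keys.numel() - 1)
    matched = sorted_keys[pos] == left[key]
    right_rows = order[pos[matched]]
    out = {name: col[matched] for name, col in left.items()}
    out.update({name: col[right_rows] for name, col in right.items() if name != key})
    return out


def to_matrix(table: Table, columns: list[str]) -> torch.Tensor:
    """Hand selected columns to linear algebra as an (n_rows, n_cols) float32 matrix."""
    return torch.stack([table[c].to(torch.float32) for c in columns], dim=1)


orders = {"cust_id": torch.tensor([1, 2, 2, 3, 4]),
          "amount": torch.tensor([10.0, 25.0, 5.0, 40.0, 7.0])}
customers = {"cust_id": torch.tensor([3, 1, 2]), "age": torch.tensor([30, 45, 22])}

big = select(orders, orders["amount"] > 6.0)    # RA: selection
joined = pk_fk_join(big, customers, "cust_id")  # RA: join (customer 4 has no match)
scores = to_matrix(joined, ["amount", "age"]) @ torch.tensor([1.0, 0.5])  # LA: mat-vec
print(joined)
print(scores)
```

Because every step stays in tensor form, the join output feeds the matrix-vector product directly. This mirrors the abstract's point about avoiding data-conversion costs. A sort-plus-binary-search join is used here only because it is simple with standard PyTorch ops. The paper's own RA operator implementations may work differently.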

Vol. 4, No. 1, Article 100161

Citations: 0

Evaluatology: The science and engineering of evaluation

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

Pub Date : 2024-03-01 DOI: 10.1016/j.tbench.2024.100162

Jianfeng Zhan , Lei Wang , Wanling Gao , Hongxiao Li , Chenxi Wang , Yunyou Huang , Yatao Li , Zhengxin Yang , Guoxin Kang , Chunjie Luo , Hainan Ye , Shaopeng Dai , Zhifei Zhang

Evaluation is a crucial aspect of human existence and plays a vital role in every field. However, it is often approached in an empirical and ad hoc manner, without consensus on universal concepts, terminology, theories, and methodologies. This lack of agreement has significant consequences. This article formally introduces the discipline of evaluatology, which encompasses the science and engineering of evaluation. We propose a universal framework for evaluation, encompassing concepts, terminology, theories, and methodologies that can be applied across many, if not all, disciplines.

Our research reveals that the essence of evaluation lies in conducting experiments that intentionally apply a well-defined evaluation condition to the individuals or systems under scrutiny, which we refer to as the subjects. This process creates an evaluation system or model. By measuring and/or testing this evaluation system or model, we can infer the impact of different subjects. From this essence of evaluation, we derive five axioms that focus on key aspects of evaluation outcomes and serve as the foundational evaluation theory. These axioms are the bedrock on which we build universal evaluation theories and methodologies. When evaluating a single subject, it is crucial to create evaluation conditions with different levels of equivalency. By applying these conditions to diverse subjects, we can establish reference evaluation models. These models allow us to alter a single independent variable at a time while holding all other variables constant as controls. When evaluating complex scenarios, the key lies in establishing a series of evaluation models that maintain transitivity. Building on the science of evaluation, we propose a formal definition of a benchmark as a simplified and sampled evaluation condition that guarantees different levels of equivalency. This concept is the cornerstone of a universal, benchmark-based engineering approach to evaluation across disciplines, which we refer to as benchmarkology.
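The reference evaluation models described above vary one independent variable at a time while keeping the others fixed as controls. A minimal sketch of that one-factor-at-a-time design follows, assuming an evaluation condition can be written as a dictionary of named variables. The functions `one_at_a_time` and `differing_keys` and the variable names in the example are hypothetical illustrations, not taken from the paper.

```python
from typing import Any, Dict, Iterator, List, Tuple

Condition = Dict[str, Any]


def one_at_a_time(baseline: Condition,
                  alternatives: Dict[str, List[Any]]) -> Iterator[Tuple[str, Any, Condition]]:
    """Yield (variable, value, condition) triples in which exactly one
    variable differs from the baseline; all others stay at baseline values."""
    for name, values in alternatives.items():
        if name not in baseline:
            raise KeyError(f"unknown variable: {name}")
        for value in values:
            if value == baseline[name]:
                continue  # identical to the control condition; skip it
            condition = dict(baseline)  # copy so the baseline controls are untouched
            condition[name] = value
            yield name, value, condition


def differing_keys(a: Condition, b: Condition) -> List[str]:
    """Names of variables whose values differ between two conditions."""
    return [k for k in a if a[k] != b.get(k)]


if __name__ == "__main__":
    baseline = {"dataset": "small", "threads": 4, "cache": "on"}
    alternatives = {"threads": [4, 8, 16], "cache": ["off"]}
    for name, value, cond in one_at_a_time(baseline, alternatives):
        # Each generated condition differs from the baseline in exactly one variable.
        print(name, value, differing_keys(baseline, cond))
```

Applying every generated condition to the same set of subjects, and comparing each result against the baseline run, attributes any change in outcome to the single variable that was altered.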

Vol. 4, No. 1, Article 100162

Citations: 0

An approach to workload generation for modern data centers: A view from Alibaba trace

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

Pub Date : 2024-03-01 DOI: 10.1016/j.tbench.2024.100164

Yi Liang , Nianyi Ruan , Lan Yi , Xing Su

Modern data centers provide the foundational infrastructure of cloud computing. Workload generation, which involves simulating or constructing tasks and transactions to replicate the actual resource usage patterns of real-world systems or applications, plays an essential role in efficient resource management in these centers. Data center traces, rich in information about workload execution and resource utilization, are therefore ideal data for workload generation. Traditional traces provide detailed temporal resource usage data that enables fine-grained workload generation. However, modern data centers tend to trace statistical metrics instead, to reduce overhead. Accurately reconstructing temporal resource consumption without detailed, time-resolved trace information has therefore become a major challenge for trace-based workload generation. To address this challenge, we propose STWGEN, a novel method that leverages statistical trace data for workload generation. STWGEN is specifically designed to generate batch task workloads based on the Alibaba trace. It contains two key components: a suite of flexible workload building blocks based on C programs, and a heuristic strategy that assembles these building blocks for workload generation. Both components are designed to reproduce synthetic batch tasks that closely replicate the resource usage patterns observed in a representative data center. Experimental results demonstrate that STWGEN outperforms state-of-the-art workload generation methods, emulating workload-level and machine-level resource usage with much higher accuracy.
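Statistical traces report summary metrics per task (for example an average and a peak CPU utilization) rather than a time series. The sketch below shows one way building blocks could be assembled to match two such statistics. It is a hypothetical greedy heuristic written for illustration, not STWGEN's actual strategy: each `Block` stands for a program that holds a fixed CPU utilization for one time slot, and the names `Block`, `assemble`, `cpu_avg`, and `cpu_max` are assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Block:
    """Hypothetical building block: holds `cpu` percent utilization for one slot."""
    name: str
    cpu: float


def assemble(blocks: Sequence[Block], cpu_avg: float, cpu_max: float,
             slots: int) -> List[Block]:
    """Greedily pick one block per slot so the plan's peak is the highest
    block not exceeding cpu_max and its mean tracks cpu_avg."""
    if slots < 1:
        raise ValueError("need at least one slot")
    eligible = [b for b in blocks if b.cpu <= cpu_max]
    if not eligible:
        raise ValueError("no block fits under cpu_max")
    # Place the peak first so the observed maximum matches the trace statistic.
    plan = [max(eligible, key=lambda b: b.cpu)]
    total = plan[0].cpu
    for filled in range(1, slots):
        # Utilization each remaining slot would need to hit the target mean exactly.
        needed = (cpu_avg * slots - total) / (slots - filled)
        best = min(eligible, key=lambda b: abs(b.cpu - needed))
        plan.append(best)
        total += best.cpu
    return plan


if __name__ == "__main__":
    levels = [Block(f"spin{p}", p) for p in (0, 25, 50, 75, 100)]
    plan = assemble(levels, cpu_avg=40, cpu_max=100, slots=5)
    cpus = [b.cpu for b in plan]
    print(cpus, sum(cpus) / len(cpus), max(cpus))  # [100, 25, 25, 25, 25] 40.0 100
```

Recomputing the "needed" level after every slot lets later choices compensate for earlier rounding to the nearest available block, so the mean stays close to the target even when no block matches it exactly.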

Vol. 4, No. 1, Article 100164

Citations: 0

Benchmarking ChatGPT for Prototyping Theories: Experimental Studies Using the Technology Acceptance Model

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

Pub Date : 2024-02-01 DOI: 10.1016/j.tbench.2024.100153

Yanwu Yang, T. Goh, Xin Dai

Citations: 0