Big Data and Cognitive Computing: Latest Articles, Page 10

Massive Parallel Alignment of RNA-seq Reads in Serverless Computing

IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data and Cognitive Computing

Pub Date : 2023-05-15 DOI: 10.3390/bdcc7020098

Pietro Cinaglia, J. L. Vázquez-Poletti, M. Cannataro

In recent years, Cloud infrastructures have proven useful for data processing, offering computing capacity that is not constrained by the limitations of a local infrastructure. In this context, Serverless computing is the fastest-growing Cloud service model owing to its auto-scaling, reliability, and fault tolerance. We present a solution based on an in-house Serverless infrastructure that performs large-scale RNA-seq data analysis, focusing on the mapping of sequencing reads to a reference genome. The main contribution is bringing the computation of genomic data into Serverless computing, with a focus on RNA-seq read mapping, as this is the most time-consuming task in some pipelines. The proposed solution runs massively parallel instances to minimize running time. We evaluated its performance with two main tests, both based on mapping RNA-seq reads to the human reference genome GRCh38. Our experiments showed reductions in running time of 79.838%, 90.079%, and 96.382% compared with local environments with 16, 8, and 4 virtual cores, respectively. The limitations of Serverless computing were also investigated.
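
To put the reported percentages in perspective, the following minimal sketch converts a percentage reduction into an equivalent speedup factor. It assumes the reductions refer to running time, which the abstract implies but does not state outright; the function name `reduction_to_speedup` is illustrative and does not come from the paper.

```python
def reduction_to_speedup(reduction_percent: float) -> float:
    """Convert a percentage reduction in running time into a speedup factor.

    A reduction of r% leaves (100 - r)% of the original time, so the
    speedup is 100 / (100 - r).
    """
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    remaining_fraction = 1.0 - reduction_percent / 100.0
    return 1.0 / remaining_fraction


# Reductions quoted in the abstract, keyed by the local virtual-core count.
# Interpreting them as running-time reductions is an assumption.
reported = {16: 79.838, 8: 90.079, 4: 96.382}

for cores, reduction in reported.items():
    speedup = reduction_to_speedup(reduction)
    print(f"{cores} virtual cores: {reduction}% less time, about {speedup:.1f}x faster")
```

Under this reading, the gap between the serverless setup and the local environment widens as the local core count shrinks, which is the pattern one would expect when the baseline has less parallelism available.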

Citations: 1

SQL and NoSQL Database Software Architecture Performance Analysis and Assessments—A Systematic Literature Review

Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data and Cognitive Computing

Pub Date : 2023-05-12 DOI: 10.3390/bdcc7020097

Wisal Khan, Teerath Kumar, Cheng Zhang, Kislay Raj, Arunabha M. Roy, Bin Luo

A competent software architecture plays a crucial role in the difficult task of big data processing for SQL and NoSQL databases. SQL databases were created to organize data and allow for vertical expansion. NoSQL databases, on the other hand, support horizontal scalability and can efficiently process large amounts of unstructured data. Organizational needs determine which paradigm is appropriate, yet selecting the best option is not always easy. Differences in database design are what set SQL and NoSQL databases apart. Each NoSQL database type also consistently employs a mixed-model approach. Therefore, it is challenging for cloud users to transfer their data among different cloud service providers (CSPs). Several different paradigms are monitored by the various cloud platforms (IaaS, PaaS, SaaS, and DBaaS). The purpose of this systematic literature review (SLR) is to examine the articles that address cloud data portability and interoperability, as well as the software architectures of SQL and NoSQL databases. Numerous studies comparing the capabilities of SQL and NoSQL databases, particularly Oracle RDBMS and the NoSQL document database MongoDB, in terms of scale, performance, availability, consistency, and sharding, are presented as part of the state of the art. Research indicates that NoSQL databases, with their specifically tailored structures, may be the best option for big data analytics, while SQL databases are best suited for online transaction processing (OLTP).

Citations: 2

Design Proposal for a Virtual Shopping Assistant for People with Vision Problems Applying Artificial Intelligence Techniques

IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data and Cognitive Computing

Pub Date : 2023-05-12 DOI: 10.3390/bdcc7020096

W. Villegas-Ch., Rodrigo Amores-Falconi, Eduardo Coronel-Silva

Accessibility is an increasingly important topic for e-commerce, especially for individuals with vision problems. To improve their online experience, the design of a voice assistant is proposed that allows these individuals to browse and shop online more quickly and efficiently. This voice assistant forms an intelligent system that can understand and respond to users’ voice commands. The design considers the visual limitations of the users, such as difficulty reading information on the screen or identifying images. The voice assistant provides detailed product descriptions and ideas in a clear, easy-to-understand voice. It also offers additional features to improve the shopping experience. For example, the assistant can provide product recommendations based on the user’s previous purchases, as well as information about special promotions and discounts. The main goal of this design is to create an accessible and inclusive online shopping experience for the visually impaired. The voice assistant is based on a conversational user interface, allowing users to easily navigate an e-commerce website, search for products, and make purchases.

Citations: 3

Virtual Reality-Based Digital Twins: A Case Study on Pharmaceutical Cannabis

IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data and Cognitive Computing

Pub Date : 2023-05-10 DOI: 10.3390/bdcc7020095

Orestis Spyrou, W. Hurst, C. Verdouw

Digital Twins are digital equivalents of real-life objects. They allow producers to act immediately in case of (expected) deviations and to simulate the effects of interventions based on real-life data. When coupled, Digital Twin and eXtended Reality technologies (including Augmented Reality, Mixed Reality, and Virtual Reality technologies) are promising solutions to address the challenges of highly regulated crop production, namely the complexity of modern production environments for pharmaceutical cannabis, which are growing constantly as a result of legislative changes. Cannabis farms not only have to meet very high quality standards and regulatory requirements but also have to deal with high production and market uncertainties, including energy considerations. Thus, the main contributions of the research include an architecture design for eXtended-Reality-based Digital Twins for pharmaceutical cannabis production and a proof of concept, which was demonstrated at the Wageningen University Digital Twins conference. A convenience sampling method was used to recruit 30 participants who provided feedback on the application. The findings indicate that, although 70% of the participants were unfamiliar with the concept, 80% were positive regarding the innovation and creativity.

Citations: 1

Application of Artificial Intelligence for Fraudulent Banking Operations Recognition

Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data and Cognitive Computing

Pub Date : 2023-05-10 DOI: 10.3390/bdcc7020093

Bohdan Mytnyk, Oleksandr Tkachyk, Nataliya Shakhovska, Solomiia Fedushko, Yuriy Syerov

This study considers the task of applying artificial intelligence to recognize bank fraud. In recent years, during the COVID-19 pandemic, bank fraud has become even more common because of the massive transition of many operations to online platforms and the creation of many charitable funds that criminals can use to deceive users. The present work focuses on machine learning algorithms as a tool well suited for analyzing and recognizing online banking transactions. The study’s scientific novelty is the development of machine learning models for identifying fraudulent banking transactions and of techniques for preprocessing bank data for further comparison and selection of the best results. This paper also details various methods for improving detection accuracy, i.e., handling highly imbalanced datasets, feature transformation, and feature engineering. The proposed model, which is based on an artificial neural network, effectively improves the accuracy of fraudulent transaction detection. The results of the different algorithms are visualized; among the individual algorithms, logistic regression performs best, with an AUC of approximately 0.946, while stacked generalization achieves a higher AUC of 0.954. The recognition of banking fraud using artificial intelligence algorithms is a topical issue in our digital society.
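
The abstract reports AUC rather than plain accuracy, which is a common choice for highly imbalanced data. The sketch below is an illustration only, not the authors' code: it computes AUC from its rank interpretation (the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one). The labels and scores are made-up toy values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive (fraud, label 1) example is scored above
    a negative (legitimate, label 0) example; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy, hypothetical data: 8 legitimate transactions and 2 fraudulent ones.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.25, 0.60, 0.55, 0.90]

# A classifier that always predicts "legitimate" would be 80% accurate here
# while detecting no fraud at all; AUC is computed from the score ranking
# instead, so class imbalance does not inflate it.
always_legit_accuracy = labels.count(0) / len(labels)
print("accuracy of always predicting 'legitimate':", always_legit_accuracy)
print("AUC of the toy scores:", roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a sort-based computation would be preferable on real transaction volumes.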

Citations: 1

An Improved Pattern Sequence-Based Energy Load Forecast Algorithm Based on Self-Organizing Maps and Artificial Neural Networks

IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data and Cognitive Computing

Pub Date : 2023-05-10 DOI: 10.3390/bdcc7020092

D. Criado-Ramón, L. Ruiz, M. Pegalajar

Pattern sequence-based models are a type of forecasting algorithm that utilizes clustering and other techniques to produce easily interpretable predictions faster than traditional machine learning models. This research focuses on their application in energy demand forecasting and introduces two significant contributions to the field. Firstly, this study evaluates the use of pattern sequence-based models with large datasets. Unlike previous works that use only one dataset or multiple datasets with less than two years of data, this work evaluates the models in three different public datasets, each containing eleven years of data. Secondly, we propose a new pattern sequence-based algorithm that uses a genetic algorithm to optimize the number of clusters alongside all other hyperparameters of the forecasting method, instead of using the Cluster Validity Indices (CVIs) commonly used in previous proposals. The results indicate that neural networks provide more accurate results than any pattern sequence-based algorithm and that our proposed algorithm outperforms other pattern sequence-based algorithms, albeit with a longer training time.
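
The general idea behind pattern sequence-based forecasting can be sketched as follows. This is a simplified illustration of the basic scheme, not the algorithm proposed in the paper: it assumes each day has already been assigned a cluster label (the clustering step, and the genetic optimization of its hyperparameters, are omitted), and the function name `psf_forecast` is hypothetical.

```python
def psf_forecast(daily_profiles, labels, window):
    """Forecast the next day's load profile from repeated label patterns.

    daily_profiles: list of per-day load curves (lists of equal length).
    labels: cluster label of each day, in chronological order.
    window: number of most recent labels that form the search pattern.

    Returns the element-wise mean of the days that followed earlier
    occurrences of the pattern, or None if the pattern never occurred.
    """
    if len(daily_profiles) != len(labels):
        raise ValueError("one label is required per daily profile")
    pattern = labels[-window:]
    followers = []
    # Stop before the final window so that every match has a following day.
    for start in range(len(labels) - window):
        if labels[start:start + window] == pattern:
            followers.append(daily_profiles[start + window])
    if not followers:
        return None  # a caller could retry with a smaller window
    horizon = len(followers[0])
    return [sum(day[h] for day in followers) / len(followers)
            for h in range(horizon)]


# Toy data: two readings per day, eight days, three hypothetical clusters.
profiles = [[1.0, 2.0], [2.0, 3.0], [5.0, 6.0], [1.0, 2.0],
            [2.0, 3.0], [7.0, 8.0], [1.0, 2.0], [2.0, 3.0]]
labels = [0, 1, 2, 0, 1, 2, 0, 1]

# The last two labels are [0, 1]; that pattern was followed by days 2 and 5.
print(psf_forecast(profiles, labels, window=2))
```

Because the forecast is an average of identifiable past days, it is easy to trace why a given prediction was made, which is the interpretability advantage the abstract refers to.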

Citations: 1

Recognizing Similar Musical Instruments with YOLO Models

IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data and Cognitive Computing

Pub Date : 2023-05-10 DOI: 10.3390/bdcc7020094

Christine Dewi, Abbott Po Shun Chen, Henoch Juli Christanto

Researchers in the fields of machine learning and artificial intelligence have recently begun to focus their attention on object recognition. One of the biggest obstacles in image recognition through computer vision is the detection and identification of similar items. Identifying similar musical instruments can be approached as a classification problem, where the goal is to train a machine learning model to classify instruments based on their features and shape. Cellos, clarinets, erhus, guitars, saxophones, trumpets, French horns, harps, recorders, bassoons, and violins were all classified in this investigation. Many different musical instruments have a similar size, shape, and sound. Humans can easily identify items that are very similar to one another, but this is a challenging task for computers. For this study, we used YOLOv7 to identify pairs of musical instruments that are most similar to one another. Next, we compared the results from YOLOv7 with those from YOLOv5. The results of our tests allowed us to enhance the performance of detecting similar musical instruments. With an average accuracy of 86.7%, YOLOv7 outperformed previous approaches and other research results.

Citations: 2

Blockchain-Based Double-Layer Byzantine Fault Tolerance for Scalability Enhancement for Building Information Modeling Information Exchange

IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data and Cognitive Computing

Pub Date : 2023-05-09 DOI: 10.3390/bdcc7020090

Widya Nita Suliyanti, Riri Fitri Sari

Practical Byzantine Fault Tolerance (PBFT) is a consensus algorithm deployed in a consortium blockchain that connects a group of related participants. This type of blockchain suits the implementation of Building Information Modeling (BIM) information exchange with few participants. However, when many more participants are involved in the BIM information exchange, the PBFT algorithm, which inherently requires intensive communication among participating nodes, has limitations in terms of scalability and performance. Multi-layer BFT has been proposed on the hypothesis that it reduces communication complexity. However, having more layers introduces more latency. Therefore, in this paper, Double-Layer Byzantine Fault Tolerance (DLBFT) is proposed to improve the blockchain scalability and performance of BIM information exchange. This study presents a double-layer network structure in which each node on the first layer connects to, and forms a group with, several nodes on the second layer. This network runs the Byzantine Fault Tolerance algorithm to reach a consensus. Instead of having one node send messages to all the nodes in the peer-to-peer network, one node only sends messages to a limited number of nodes on Layer 1 and up to three nodes in each corresponding group on Layer 2 of a hierarchical network. The DLBFT algorithm has been shown to reduce the required number of messages exchanged among nodes by 84% and the time to reach a consensus by 70%, thus improving blockchain scalability. Further research is required if more than one party is involved in multi-BIM projects.
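
Why a hierarchy cuts message traffic can be seen with a back-of-the-envelope count. The sketch below is a deliberately simplified model and not the DLBFT protocol from the paper: it assumes one all-to-all broadcast round in which every node messages every other node in its scope, and it does not attempt to reproduce the 84% figure, which comes from the authors' own protocol and experiments. Both function names are hypothetical.

```python
def flat_messages(total_nodes: int) -> int:
    """Messages in one all-to-all round of a flat peer-to-peer network."""
    return total_nodes * (total_nodes - 1)


def two_layer_messages(layer1_nodes: int, group_size: int) -> int:
    """Messages in one round of a simplified two-layer network.

    Assumed model: Layer 1 nodes broadcast among themselves, and each
    Layer 1 node also broadcasts within its own group, which consists of
    itself plus `group_size` Layer 2 nodes.
    """
    top_layer = layer1_nodes * (layer1_nodes - 1)
    members_per_group = group_size + 1  # Layer 1 leader plus its Layer 2 nodes
    per_group = members_per_group * (members_per_group - 1)
    return top_layer + layer1_nodes * per_group


# Example: 4 Layer 1 nodes, each with 3 Layer 2 nodes, so 16 nodes in total.
layer1, group = 4, 3
total = layer1 * (group + 1)
flat = flat_messages(total)
layered = two_layer_messages(layer1, group)
print(f"flat: {flat} messages, two-layer: {layered} messages")
print(f"reduction under this toy model: {100 * (flat - layered) / flat:.0f}%")
```

The flat count grows quadratically with the total number of nodes, whereas the layered count grows quadratically only within each small scope, which is the intuition behind the scalability claim.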

Citations: 1

Predicting Cell Cleavage Timings from Time-Lapse Videos of Human Embryos

IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data and Cognitive Computing

Pub Date : 2023-05-09 DOI: 10.3390/bdcc7020091

Akriti Sharma, Ayaz Z. Ansari, R. Kakulavarapu, M. Stensen, M. Riegler, H. Hammer

Assisted reproductive technology is used to treat infertility, and its success relies on the quality and viability of the embryos chosen for uterine transfer. Currently, embryologists assess embryo development manually, including the time elapsed between cell cleavages. This paper introduces a machine learning methodology that automatically computes the start of each cell cleavage stage, in hours post insemination, from time-lapse videos. The methodology detects embryo cells in video frames and predicts the frame in which a cell cleavage stage begins. It then reads the hours post insemination from that frame using optical character recognition. Unlike traditional embryo cell detection techniques, the proposed approach needs no extra image processing steps such as locating embryos or removing extracellular material (fragmentation). The methodology accurately predicts cell cleavage stages up to five cells, and it was also able to detect the morphological structures of later stages, such as morula and blastocyst. Annotating the times of all cell cleavages in one time-lapse video takes about one minute.
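The last step of such a pipeline, turning per-frame results into cleavage timings, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a detector has already produced one cell count per frame and that OCR has already produced one hours-post-insemination value per frame. The function name `cleavage_onsets` and the `min_run` persistence rule (a stage must be seen in several consecutive frames before it counts) are hypothetical additions.

```python
from typing import Dict, Sequence


def cleavage_onsets(
    cell_counts: Sequence[int],
    hpi: Sequence[float],
    max_cells: int = 5,
    min_run: int = 2,
) -> Dict[int, float]:
    """Return {stage: hours post insemination} for stages 2..max_cells.

    The onset of a stage is the first frame of the first run of at least
    `min_run` consecutive frames whose predicted count equals that stage.
    Stages that never satisfy the rule are omitted from the result.
    """
    if len(cell_counts) != len(hpi):
        raise ValueError("cell_counts and hpi must have the same length")
    if min_run < 1:
        raise ValueError("min_run must be at least 1")

    onsets: Dict[int, float] = {}
    for stage in range(2, max_cells + 1):
        run = 0  # length of the current run of frames showing `stage` cells
        for i, count in enumerate(cell_counts):
            run = run + 1 if count == stage else 0
            if run == min_run:
                onsets[stage] = hpi[i - min_run + 1]
                break
    return onsets


if __name__ == "__main__":
    # Made-up example: one frame every 0.5 h, with a spurious "2" at index 2.
    counts = [1, 1, 2, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5]
    times = [20.0 + 0.5 * i for i in range(len(counts))]
    print(cleavage_onsets(counts, times))
```

Requiring a short run of consistent frames is one simple way to ignore single-frame misdetections; the paper may handle noisy predictions differently.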



Citations: 1

Finding the Time-Period-Based Most Frequent Path from Trajectory–Topology

IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data and Cognitive Computing

Pub Date : 2023-05-08 DOI: 10.3390/bdcc7020088

Jianing Ding, Xin Jin, Zhiheng Li

The Time-Period-Based Most Frequent Path (TPMFP) problem has been a hot topic in traffic studies for many years. It involves finding the most frequent path between two locations by observing the travelling behavior of drivers in a specific time period. Previous studies, however, oversimplify the road network and therefore ignore transfer costs at intersections. To address this problem more elegantly, we built an urban topology model consisting of Intersection Vertices and Connection Vertices. Specifically, we split the Intersection Vertices to eliminate the influence of transfer cost on finding the TPMFP, and we generate the Trajectory–Topology from GPS record data. We then leverage the Footmark Graph method to find the TPMFP. Finally, we conducted extensive experiments on a real-world dataset containing over eight million GPS records. Compared to the current state-of-the-art method, our approach finds a more reasonable MFP in approximately 10% of cases during off-peak hours and 40% of cases during peak hours.
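The general idea of a time-period-based frequent path can be illustrated with a short sketch. It is a deliberate simplification and not the Footmark Graph method or the vertex-splitting model of the paper: it keeps only trajectories that start inside the time window, counts how often each directed edge is used, and returns the path whose least-used edge is as frequent as possible (a widest-path search). The names `edge_frequencies` and `widest_path`, and the trajectory format `(timestamp, [vertex, ...])`, are assumptions made for this example.

```python
import heapq
from collections import Counter


def edge_frequencies(trajectories, t_start, t_end):
    """Count directed edge usage over trajectories with t_start <= timestamp < t_end."""
    freq = Counter()
    for timestamp, path in trajectories:
        if t_start <= timestamp < t_end:
            freq.update(zip(path, path[1:]))
    return freq


def widest_path(freq, source, target):
    """Return (path, bottleneck): the path maximising its minimum edge frequency.

    Vertices must be mutually comparable (e.g. all strings) so that heap
    entries with equal width can be ordered.
    """
    adjacency = {}
    for (u, v), f in freq.items():
        adjacency.setdefault(u, []).append((v, f))

    best = {source: float("inf")}  # best bottleneck found so far per vertex
    previous = {}
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    while heap:
        neg_width, u = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(u, 0):
            continue  # stale heap entry
        if u == target:
            break
        for v, f in adjacency.get(u, []):
            candidate = min(width, f)
            if candidate > best.get(v, 0):
                best[v] = candidate
                previous[v] = u
                heapq.heappush(heap, (-candidate, v))

    if target not in best:
        return None, 0
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


if __name__ == "__main__":
    # Made-up trajectories: (departure hour, visited vertices).
    trips = [
        (8, ["A", "B", "D"]), (8, ["A", "B", "D"]), (9, ["A", "B", "D"]),
        (9, ["A", "C", "D"]), (14, ["A", "C", "D"]), (15, ["A", "C", "D"]),
    ]
    print(widest_path(edge_frequencies(trips, 7, 10), "A", "D"))
    print(widest_path(edge_frequencies(trips, 13, 16), "A", "D"))
```

The same origin and destination can yield different paths in different windows, which is the point of conditioning on the time period. Modelling turn costs at intersections, as the paper does by splitting Intersection Vertices, would require a richer graph than this sketch uses.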


Citations: 0