
Big Data and Cognitive Computing: Latest Articles

Cumulative and Rolling Horizon Prediction of Overall Equipment Effectiveness (OEE) with Machine Learning
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-08-02 DOI: 10.3390/bdcc7030138
Péter Dobra, J. Jósvai
Nowadays, one of the important and indispensable conditions for the effectiveness and competitiveness of industrial companies is the high efficiency of manufacturing and assembly. Using a variety of methods and tools, these enterprises systematically monitor their efficiency metrics with Key Performance Indicators (KPIs). One of the most frequently used of these metrics is Overall Equipment Effectiveness (OEE), the product of availability, performance and quality. In addition to monitoring, it is also necessary to predict efficiency, which can be implemented with the support of machine learning techniques. This paper presents and compares several supervised machine learning techniques, among others polynomial regression, lasso regression, ridge regression and gradient boost regression. The aim of this article is to determine the best estimation method for a semiautomatic assembly line with large batch sizes. The case study, presented with a real industrial example, shows which of the cumulative or rolling horizon prediction methods is more accurate.
Citations: 0
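The OEE entry above defines OEE as the product of availability, performance and quality and contrasts cumulative with rolling horizon prediction. The sketch below illustrates both ideas. It assumes that "cumulative" means an expanding training window and "rolling horizon" means a fixed-width training window; the abstract does not spell this out, and the function names are hypothetical.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """OEE as the product of its three factors, each expressed in [0, 1]."""
    return availability * performance * quality


def cumulative_windows(n_obs: int, start: int):
    """Yield (train_indices, test_index) pairs with an expanding training set.

    Assumption: 'cumulative' prediction trains on every observation seen so far.
    """
    for t in range(start, n_obs):
        yield list(range(0, t)), t


def rolling_windows(n_obs: int, width: int):
    """Yield (train_indices, test_index) pairs with a fixed-width training set.

    Assumption: 'rolling horizon' prediction trains on the last `width` observations.
    """
    for t in range(width, n_obs):
        yield list(range(t - width, t)), t


if __name__ == "__main__":
    print(oee(0.90, 0.95, 0.99))
    print(list(cumulative_windows(5, 3)))
    print(list(rolling_windows(5, 3)))
```

Either generator can feed the same regressor, so the two schemes can be compared on identical test points.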
An Approach Based on Recurrent Neural Networks and Interactive Visualization to Improve Explainability in AI Systems
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-07-31 DOI: 10.3390/bdcc7030136
W. Villegas-Ch., J. Garcia-Ortiz, Ángel Jaramillo-Alcázar
This paper investigated the importance of explainability in artificial intelligence models and its application in the context of prediction in Formula 1. A step-by-step analysis was carried out, including collecting and preparing data from previous races, training an AI model to make predictions, and applying explainability techniques to that model. Two approaches were used: the attention technique, which allowed visualizing the most relevant parts of the input data using heat maps, and the permutation importance technique, which evaluated the relative importance of features. The results revealed that feature length and qualifying performance are crucial variables for position predictions in Formula 1. These findings highlight the relevance of explainability in AI models, not only in Formula 1 but also in other fields and sectors, by ensuring fairness, transparency, and accountability in AI-based decision making. The results highlight the importance of considering explainability in AI models and provide a practical methodology for its implementation in Formula 1 and other domains.
Citations: 0
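The explainability entry above mentions permutation importance. As a rough illustration of the general technique (not the authors' implementation), the sketch below shuffles one feature column at a time and records how much the mean squared error grows. The `predict` callable, the toy data layout and the choice of MSE as the score are all assumptions made for this example.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    predict: callable mapping one feature row (a list) to a prediction.
    X: list of feature rows (lists of equal length); y: list of targets.
    """
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(mse(permuted) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy model that only uses feature 0; feature 1 is ignored.
    X = [[i, (i * 7) % 5] for i in range(20)]
    y = [2 * row[0] for row in X]
    print(permutation_importance(lambda r: 2 * r[0], X, y))
```

A feature the model ignores gets an importance of zero, while shuffling a feature the model depends on increases the error.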
EnviroStream: A Stream Reasoning Benchmark for Environmental and Climate Monitoring
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-07-31 DOI: 10.3390/bdcc7030135
Elena Mastria, Francesco Pacenza, J. Zangari, Francesco Calimeri, S. Perri, G. Terracina
Stream Reasoning (SR) focuses on developing advanced approaches for applying inference to dynamic data streams; it has become increasingly relevant in various application scenarios such as IoT, Smart Cities, Emergency Management, and Healthcare, despite being a relatively new field of research. The current lack of standardized formalisms and benchmarks has been hindering the comparison between different SR approaches. We proposed a new benchmark, called EnviroStream, for evaluating SR systems on weather and environmental data. The benchmark includes queries and datasets of different sizes. We adopted I-DLV-sr, a recently released SR system based on Answer Set Programming, as a baseline for query modelling and experimentation. We also showcased continuous online reasoning via a web application.
Citations: 0
Predicting the Price of Bitcoin Using Sentiment-Enriched Time Series Forecasting
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-07-31 DOI: 10.3390/bdcc7030137
Markus Frohmann, Manuel Karner, Said Khudoyan, Robert Wagner, M. Schedl
Recently, various methods to predict the future price of financial assets have emerged. One promising approach is to combine the historic price with sentiment scores derived via sentiment analysis techniques. In this article, we focus on predicting the future price of Bitcoin, which is currently the most popular cryptocurrency. More precisely, we propose a hybrid approach, combining time series forecasting and sentiment prediction from microblogs, to predict the intraday price of Bitcoin. Moreover, in addition to standard sentiment analysis methods, we are the first to employ a fine-tuned BERT model for this task. We also introduce a novel weighting scheme in which the weight of the sentiment of each tweet depends on the number of its creator’s followers. For evaluation, we consider periods with strongly varying ranges of Bitcoin prices. This enables us to assess the models w.r.t. robustness and generalization to varied market conditions. Our experiments demonstrate that BERT-based sentiment analysis and the proposed weighting scheme improve upon previous methods. Specifically, our hybrid models that use linear regression as the underlying forecasting algorithm perform best in terms of the mean absolute error (MAE of 2.67) and root mean squared error (RMSE of 3.28). However, more complicated models, particularly long short-term memory networks and temporal convolutional networks, tend to have generalization and overfitting issues, resulting in considerably higher MAE and RMSE scores.
Citations: 2
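The Bitcoin entry above describes weighting each tweet's sentiment by its author's follower count and reports MAE and RMSE. The sketch below shows one possible reading. The abstract only says the weight depends on the number of followers, so the linear, follower-proportional weighting here is an assumption, as are the function names; MAE and RMSE follow their standard definitions.

```python
import math


def weighted_sentiment(scores, followers):
    """Follower-weighted mean sentiment for one time bucket.

    Assumption: weight is proportional to follower count. Falls back to the
    plain mean when all follower counts are zero.
    """
    total = sum(followers)
    if total == 0:
        return sum(scores) / len(scores)
    return sum(s * f for s, f in zip(scores, followers)) / total


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    print(weighted_sentiment([1.0, -1.0], [3, 1]))
    print(mae([1, 2, 3], [2, 2, 5]), rmse([0, 0], [3, 4]))
```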
Driving Excellence in Official Statistics: Unleashing the Potential of Comprehensive Digital Data Governance
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-07-29 DOI: 10.3390/bdcc7030134
Hossein Hassani, S. MacFeely
With the ubiquitous use of digital technologies and the consequent data deluge, official statistics faces new challenges and opportunities. In this context, strengthening official statistics through effective data governance will be crucial to ensure reliability, quality, and access to data. This paper presents a comprehensive framework for digital data governance for official statistics, addressing key components, such as data collection and management, processing and analysis, data sharing and dissemination, as well as privacy and ethical considerations. The framework integrates principles of data governance into digital statistical processes, enabling statistical organizations to navigate the complexities of the digital environment. Drawing on case studies and best practices, the paper highlights successful implementations of digital data governance in official statistics. The paper concludes by discussing future trends and directions, including emerging technologies and opportunities for advancing digital data governance.
Citations: 0
Evaluation Method of Electric Vehicle Charging Station Operation Based on Contrastive Learning
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-07-24 DOI: 10.3390/bdcc7030133
Ze-Yang Tang, Qi-Biao Hu, Yibo Cui, Lei Hu, Yi-Wen Li, Yu-Jie Li
This paper aims to address the issue of evaluating the operation of electric vehicle charging stations (EVCSs). Previous studies have commonly employed the method of constructing comprehensive evaluation systems, which greatly relies on manual experience for index selection and weight allocation. To overcome this limitation, this paper proposes an evaluation method based on natural language models for assessing the operation of charging stations. By utilizing the proposed SimCSEBERT model, this study analyzes the operational data, user charging data, and basic information of charging stations to predict the operational status and identify influential factors. Additionally, this study compared the evaluation accuracy and impact factor analysis accuracy of the baseline and the proposed model. The experimental results demonstrate that our model achieves a higher evaluation accuracy (operation evaluation accuracy = 0.9464; impact factor analysis accuracy = 0.9492) and effectively assesses the operation of EVCSs. Compared with traditional evaluation methods, this approach exhibits improved universality and a higher level of intelligence. It provides insights into the operation of EVCSs and user demands, allowing for the resolution of supply–demand contradictions that are caused by power supply constraints and the uneven distribution of charging demands. Furthermore, it offers guidance for more efficient and targeted strategies for the operation of charging stations.
Citations: 0
The Development of a Kazakh Speech Recognition Model Using a Convolutional Neural Network with Fixed Character Level Filters
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-07-20 DOI: 10.3390/bdcc7030132
N. Kadyrbek, M. Mansurova, A. Shomanov, G. Makharova
This study is devoted to the transcription of human speech in the Kazakh language in dynamically changing conditions. It discusses key aspects related to the phonetic structure of the Kazakh language, technical considerations in collecting the transcribed audio corpus, and the use of deep neural networks for speech modeling. A high-quality decoded audio corpus was collected, containing 554 h of data, giving an idea of the frequencies of letters and syllables, as well as demographic parameters such as the gender, age, and region of residence of native speakers. The corpus contains a universal vocabulary and serves as a valuable resource for the development of modules related to speech. Machine learning experiments were conducted using the DeepSpeech2 model, which includes a sequence-to-sequence architecture with an encoder, decoder, and attention mechanism. To increase the reliability of the model, filters initialized with symbol-level embeddings were introduced to reduce the dependence on accurate positioning on object maps. The training process included simultaneous preparation of convolutional filters for spectrograms and symbolic objects. The proposed approach, using a combination of supervised and unsupervised learning methods, resulted in a 66.7% reduction in the weight of the model while maintaining relative accuracy. The evaluation on the test sample showed a 7.6% lower character error rate (CER) compared to existing models, demonstrating state-of-the-art performance. The proposed architecture enables deployment on platforms with limited resources. Overall, this study presents a high-quality audio corpus, an improved speech recognition model, and promising results applicable to speech-related applications and languages beyond Kazakh.
Citations: 0
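The Kazakh speech recognition entry above reports results as character error rate (CER). For readers unfamiliar with the metric, here is a minimal sketch of the usual definition: the Levenshtein edit distance between reference and hypothesis transcripts, divided by the reference length. The abstract does not say how the authors computed CER, so this is the conventional formula rather than their evaluation code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) over characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("abcd", "abxd"))
```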
A Real-Time Vehicle Speed Prediction Method Based on a Lightweight Informer Driven by Big Temporal Data
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-07-15 DOI: 10.3390/bdcc7030131
Xinyu Tian, Qinghe Zheng, Zhiguo Yu, Mingqiang Yang, Yao Ding, Abdussalam Elhanashi, S. Saponara, K. Kpalma
At present, the design of modern vehicles requires improving driving performance while meeting emission standards, leading to increasingly complex power systems. In autonomous driving systems, accurate, real-time vehicle speed prediction is one of the key factors in achieving automated driving. Accurate prediction and optimal control based on future vehicle speeds are key strategies for dealing with ever-changing and complex actual driving environments. However, predicting driver behavior is uncertain and may be influenced by the surrounding driving environment, such as weather and road conditions. To overcome these limitations, we propose a real-time vehicle speed prediction method based on a lightweight deep learning model driven by big temporal data. Firstly, the temporal data collected by automotive sensors are decomposed into a feature matrix through empirical mode decomposition (EMD). Then, an informer model based on the attention mechanism is designed to extract key information for learning and prediction. During the iterative training process of the informer, redundant parameters are removed through importance measurement criteria to achieve real-time inference. Finally, experimental results demonstrate that the proposed method achieves superior speed prediction performance through comparing it with state-of-the-art statistical modelling methods and deep learning models. Tests on edge computing devices also confirmed that the designed model can meet the requirements of actual tasks.
Citations: 0
A Guide to Data Collection for Computation and Monitoring of Node Energy Consumption
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-07-11 DOI: 10.3390/bdcc7030130
A. del Río, Giuseppe Conti, S. Castaño-Solis, Javier Serrano, David Jiménez, J. Fraile-Ardanuy
The digital transition behind the new industrial revolution is largely driven by the application of intelligence and data. This growth increases energy consumption, much of it associated with computing in data centers. It clashes with the growing need to save energy and improve energy efficiency, and calls for a more optimized use of resources. Deploying new services in edge and cloud computing, virtualization, and software-defined networks requires a better understanding of consumption patterns, with the aim of more efficient and sustainable models and a smaller carbon footprint. These patterns can be exploited by machine learning, deep learning, and reinforcement learning techniques to optimize energy consumption, which can ideally improve the energy efficiency of the data centers and large computing servers that provide these services. To apply these techniques, it is essential to investigate data collection processes that create initial information points. Datasets are also needed to analyze how to diagnose systems and identify new ways of optimization. This work describes a data collection methodology for creating datasets of consumption data from a real-world work environment dedicated to data centers, server farms, or similar architectures. Specifically, it covers the entire process of energy stimuli generation, data extraction, and data preprocessing. The method is offered to the scientific community for evaluation and reproduction through an online repository created for this work, which hosts all the code for download.
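As a rough illustration of the preprocessing step mentioned in the abstract, the sketch below turns timestamped power readings into an energy figure with the trapezoidal rule. It is only a minimal sketch under stated assumptions: the function names `energy_joules` and `joules_to_wh`, the sample format (seconds and watts), and the choice of trapezoidal integration are all hypothetical and are not taken from the paper or its repository.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule.

    Hypothetical helper: the paper's own preprocessing may differ.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    total = 0.0
    for i in range(1, len(power_w)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def joules_to_wh(joules: float) -> float:
    """Convert joules to watt-hours (1 Wh = 3600 J)."""
    return joules / 3600.0
```

For example, a node drawing a constant 100 W for two seconds yields `energy_joules([0, 1, 2], [100, 100, 100])`, which is 200 J.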
Citations: 0
An End-to-End Online Traffic-Risk Incident Prediction in First-Person Dash Camera Videos
IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-07-06 DOI: 10.3390/bdcc7030129
Hilmil Pradana
Predicting traffic-risk incidents in first-person video helps to ensure that a safe reaction can occur before the incident happens, across a wide range of driving scenarios and conditions. One challenge in building advanced driver assistance systems is to create an early warning system that lets the driver react safely and accurately while coping with the diversity of traffic-risk predictions in real-world applications. In this paper, we aim to bridge the gap by investigating two key research questions: the driver’s current driving status as observed through online videos, and the types of other moving objects that lead to dangerous situations. To address these problems, we propose an end-to-end two-stage architecture: in the first stage, unsupervised learning is applied to collect all suspicious events in actual driving; in the second stage, supervised learning is used to classify each suspicious event from the first stage into a common event type. To enrich the classification types, metadata from the first-stage results is passed to the second stage to mitigate data limitations when training the classification model. In the online setting, our method runs at 9.60 fps on average with a standard deviation of 1.44 fps. Our quantitative evaluation shows that it reaches average F1-scores of 81.87% and 73.43% on labeled data from the CST-S3D and real driving datasets, respectively. Furthermore, the proposed method has the potential to help distribution companies evaluate the driving performance of their drivers by automatically monitoring near-miss events and analyzing driving patterns for training programs that reduce future accidents.
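The abstract reports an average F1-score without stating how the average is taken. The sketch below shows one common reading, a macro average of per-class F1 over event types. It is an assumption for illustration only: the function names `f1_per_class` and `macro_f1` and the example labels are hypothetical and do not come from the paper.

```python
from typing import Dict, Hashable, Sequence


def f1_per_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Per-class F1 = 2*TP / (2*TP + FP + FN) for every label seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have equal length")
    scores: Dict[Hashable, float] = {}
    for label in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores (macro average, an assumption here)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

A macro average treats rare event types the same as frequent ones. A support-weighted or micro average would be equally plausible readings and can give different numbers on imbalanced data.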
Citations: 0