2022 7th International Conference on Data Science and Machine Learning Applications (CDMA): Latest Papers

Evaluation of Machine Learning to Early Detection of Highly Cited Papers

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00006

G. M. Binmakhashen, Hamdi A. Al-Jamimi

As one of the fastest-growing topics, machine learning has many applications spanning different domains, including image and signal recognition, text mining, information retrieval, and robotics. It enables information extraction and analysis for better insights and decision-based systems. The Web of Science (WoS) citation database is a leading source of citation data for high-quality published research. WoS has its own metrics for labeling published articles as Highly Cited Papers (HCP). Machine learning (ML) can help researchers identify the key characteristics of HCP. Moreover, it can allow research evaluation units to forecast significant scientific articles. In other words, it may allow researchers and/or research evaluators to detect potential scientific breakthroughs and stay current. In this study, more than 26 thousand records of published articles indexed by WoS were analyzed. All the records are drawn from the Technology research area as defined by WoS. Four ML algorithms are evaluated to verify the influence of common HCP factors on raising citations and interest in scientific articles. The ensemble algorithms show promising results in identifying HCP articles using only four factors.
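A classifier for this task needs HCP labels first. Clarivate's Essential Science Indicators commonly describe Highly Cited Papers as roughly the top 1% by citations for their field and publication year. The minimal sketch below assumes exactly that rule and a hypothetical record layout with `field`, `year` and `citations` keys. It is not the authors' code. The abstract does not name its four predictive factors, so the sketch does not model them.

```python
import math
from collections import defaultdict


def label_highly_cited(records, top_fraction=0.01):
    """Mark a record True if its citation count ranks in the top
    `top_fraction` of its (field, publication year) group.

    Assumed labeling rule for illustration; ties are broken arbitrarily.
    """
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[(rec["field"], rec["year"])].append(idx)

    labels = [False] * len(records)
    for indices in groups.values():
        # Size of the "highly cited" slice, with at least one paper per group.
        k = max(1, math.ceil(len(indices) * top_fraction))
        ranked = sorted(indices, key=lambda i: records[i]["citations"], reverse=True)
        for i in ranked[:k]:
            labels[i] = True
    return labels


# Hypothetical toy data: 200 papers from one field and year.
papers = [{"field": "Technology", "year": 2019, "citations": c} for c in range(200)]
print(sum(label_highly_cited(papers)))  # the 2 most-cited of 200 papers are labeled HCP
```

Grouping by field and year keeps older or high-citation fields from dominating the positive class. That imbalance matters when training the ensemble models the abstract mentions.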

Citations: 2

Machine Learning Algorithms for Detection of Noisy/Artifact-Corrupted Epochs of Visual Oddball Paradigm ERP Data

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00033

Rafia Akhter, F. Beyette

Electroencephalography (EEG) is a non-invasive monitoring method that tracks and records the neural activity of the brain. EEG captured time-locked to an external stimulus is known as an Event-Related Potential (ERP), and it can help elucidate how the brain responds to the stimulus. In general, EEG is an uneven mixture of neural and non-neural sources of activity. These non-neural (non-EEG) signals produce artifacts in the EEG that can decrease the signal-to-noise ratio (SNR) in experiments and may lead to erroneous conclusions about the effects of experimental manipulation. Thus, it is very important to remove artifacts from the recorded EEG prior to analysis. The most common artifacts affecting ERPs are eye blinks, eye movements, and body movements. Such artifact-corrupted data can be removed by visual inspection or by automated signal processing methods. While these methods are suitable for post-processing ERP data that has already been collected, they are not well suited for real-time processing of continuous ERP data. This project seeks to address the challenges of real-time artifact identification by introducing a machine learning model that can screen ERP data and detect and reject artifact-corrupted epochs prior to signal analysis. In addition to enabling real-time pre-processing of streaming ERP data, the DBSCAN-based machine learning methods explored here can provide up to 90% accuracy in identifying artifact-contaminated ERP epochs. As a result, the findings of this study will help improve the signal quality of ERP trials. They will also enable ERP to be used as a biomarker in real-world applications where streaming EEG data collection and analysis are required.
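The abstract does not say which features were clustered or how DBSCAN output maps to "artifact" labels. The sketch below makes two assumptions. First, each epoch is summarized by two hypothetical features: peak-to-peak amplitude and standard deviation. Second, epochs that DBSCAN leaves as noise are treated as artifact-corrupted. The `eps` and `min_samples` values would need tuning on real recordings. This is a from-scratch DBSCAN in pure Python, not the authors' implementation.

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point: a cluster id >= 0,
    or -1 for noise (points not density-reachable from any core point)."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        # The neighborhood includes the point itself, as in most implementations.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def epoch_features(epoch):
    """Hypothetical per-epoch features: peak-to-peak amplitude and std. dev."""
    mean = sum(epoch) / len(epoch)
    std = math.sqrt(sum((x - mean) ** 2 for x in epoch) / len(epoch))
    return (max(epoch) - min(epoch), std)


def flag_artifact_epochs(epochs, eps, min_samples):
    """True for epochs that DBSCAN leaves as noise (assumed to be artifacts)."""
    labels = dbscan([epoch_features(e) for e in epochs], eps, min_samples)
    return [label == -1 for label in labels]
```

Clean epochs form one dense cluster in feature space. A blink adds a large deflection that pushes its epoch far from that cluster, so it ends up labeled noise. scikit-learn's `sklearn.cluster.DBSCAN` uses the same convention of labeling noise points `-1`.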

Citations: 2

Denoising Electromagnatic Surveys Using LSTMs

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00018

Asma Z. Yamani, Klemens Katterbauer, A. Alshehri, A. Marsala, Rabah A. Al-Zaidy

Resistivity readings obtained from electromagnetic crosswell surveys provide insight for predicting reservoir water saturation. Although high resistivity values should map to low water saturation and vice versa, in many cases the readings are not consistent with this relationship. This is due to factors that add noise to the resistivity readings, such as the borehole effect and the salinity of the injected water. Here, we attempt to correct the resistivity readings so that they correlate negatively with water saturation, enhancing the accuracy and interpretability of water saturation prediction models. We use resistivity readings from locations farther from sources of noise to correct the inconsistencies in the resistivity readings with a Long Short-Term Memory (LSTM) neural network approach. Our results demonstrate that addressing noisy inconsistencies in the data increases the R² of the water saturation model from 0.62 to 0.70. Moreover, using a model interpretation method, namely SHAP TreeExplainer, we show that the resistivity-based features in the water saturation prediction model have higher importance values relative to the porosity features than they had before the enhancement.
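Two quantities in this abstract can be stated precisely:

- the expected negative (Pearson) correlation between resistivity and water saturation, which noisy readings violate;
- the R² score used to report the improvement from 0.62 to 0.70.

The pure-Python sketch below only defines these two measures. It does not reproduce the LSTM correction, and any numbers passed to it here are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot
```

A correction step like the one described should push `pearson(resistivity, saturation)` toward -1. R² equals 1 for perfect predictions and 0 for a model that always predicts the mean saturation.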

Citations: 0

A comparative analysis of Graph Neural Networks and commonly used machine learning algorithms on fake news detection

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00021

Fahim Mahmud, Mahi Md. Sadek Rayhan, Mahdi Hasan Shuvo, Islam Sadia, Md. Kishor Morol

Fake news on social media is increasingly regarded as one of the most concerning issues. Low cost, easy access via social platforms, and a plethora of low-budget online news sources are among the factors that contribute to the spread of false news. Most existing fake news detection algorithms focus solely on news content. However, the prior posts and social activities of engaged users provide a wealth of information about their views on the news and can significantly improve fake news identification. Graph Neural Networks (GNNs) are a form of deep learning that performs prediction on graph-structured data. Social media platforms naturally follow a graph structure. GNNs are a special type of neural network designed for graphs, which makes edge-, node-, and graph-level prediction much easier to perform. Therefore, in this paper, we present a comparative analysis of some commonly used machine learning algorithms and GNNs for detecting the spread of false news on social media platforms. In this study, we use the UPFD dataset and implement several existing machine learning algorithms on the text data only. In addition, we build GNN models with different layer types that fuse graph-structured news propagation data with the text data, which serves as the node features. In our study, GNNs provide the best results for identifying false news.
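The sketch below makes the graph-plus-text fusion concrete. It has two parts:

1. One mean-aggregation message-passing layer, in the general spirit of GCN/GraphSAGE-style layers. It is not a specific layer the authors used.
2. Mean pooling, which turns a news propagation graph with text-derived node features into a single vector.

A downstream classifier could then label that vector real or fake. The feature values and weights are hypothetical. A real model would learn `weight` by backpropagation, typically with a graph learning library rather than pure Python.

```python
def mean_aggregate_layer(features, edges, weight):
    """One message-passing layer: each node averages its own and its
    neighbors' feature vectors, then applies a linear map and ReLU.

    features: list of node feature vectors (e.g. text embeddings)
    edges:    undirected (u, v) pairs of the propagation graph
    weight:   out_dim x in_dim matrix given as a list of rows
    """
    neighbors = [[i] for i in range(len(features))]  # self-loops
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    out = []
    for i, nbrs in enumerate(neighbors):
        dim = len(features[i])
        agg = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        out.append([max(0.0, sum(w * a for w, a in zip(row, agg))) for row in weight])
    return out


def mean_pool(features):
    """Graph-level readout: average all node vectors into one graph vector."""
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]


# Toy propagation tree: node 0 is the news item, nodes 1 and 2 are users who shared it.
node_features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
edges = [(0, 1), (0, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]
graph_vector = mean_pool(mean_aggregate_layer(node_features, edges, identity))
```

Stacking several such layers lets information from users further down the propagation tree reach the news node. A text-only baseline cannot exploit this signal.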

Citations: 5

Legal Judgment Prediction for Canadian Appeal Cases

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00032

Intisar Almuslim, D. Inkpen

Law is one of the knowledge domains most reliant on textual material. Nowadays, however, the vast volume of case law published every day makes it very difficult and time-consuming for legal professionals to read, understand, and analyze all the available documents. In this age of legal big data, and with the increased availability of legal text online, many researchers have focused on developing intelligent legal systems and applications. These intelligent systems can provide valuable services and solve many problems in the legal domain. In recent years, researchers have focused on predicting judicial case outcomes by applying Natural Language Processing (NLP) and Machine Learning (ML) methods to case documents. Legal Judgment Prediction (LJP) is the task of automatically predicting the outcome of a court case given only the text of the case. To the best of our knowledge, as of 2021, no prior research with this aim had been conducted in English for appeal courts in Canada. Our proposed methodology applies NLP to legal judgments by predicting case outcomes using only the case descriptions written by the court. Appeal court decisions are often binary (accept or reject), so the task is defined as a binary classification problem between 'Allow' and 'Dismiss'. This is also the general approach in the literature. We employ various classification methods, including classical classifiers and Deep Learning (DL) models, and compare their performance. Our best results are obtained with DL models, with accuracy reaching 93.46% and F1-scores reaching 0.92, on par with the best results in the literature. Through this study, we hope to establish a basis for future research on the legal system of Canada and offer a baseline for future work.
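Because the task is framed as Allow-versus-Dismiss classification, the reported accuracy and F1 can be computed as shown below. The abstract does not say whether its F1 is per-class, macro-averaged, or weighted. This pure-Python sketch therefore reports per-class and macro-averaged F1. It is an evaluation helper only, not the authors' classifiers.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(y_true, y_pred, labels=("Allow", "Dismiss")):
    """Accuracy plus per-class and macro-averaged F1 for Allow/Dismiss outcomes."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {lab: f1_for_class(y_true, y_pred, lab) for lab in labels}
    macro_f1 = sum(per_class.values()) / len(labels)
    return accuracy, per_class, macro_f1
```

Reporting both numbers matters when one outcome dominates the data. A model can reach high accuracy by favoring the majority class while its F1 on the minority class stays low.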

Citations: 2

An Investigation of Forecasting Tadawul All Share Index (TASI) Using Machine Learning

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00009

G. M. Binmakhashen, A. Bakather, A. Bin-Salem

Stock markets are among the most complex and dynamic environments. Predicting stock prices may require combining several sources of market information. Another possibility is to monitor and predict the stock index prices of a target market. In this study, we investigated several machine learning algorithms for predicting the Saudi stock price index using Bloomberg's most-used indicators. The collected data covers 26 years of Tadawul All Share Index (TASI) prices, and the algorithms were evaluated on forecasting midterm TASI index prices. Two Recurrent Neural Network (RNN) architectures, one deeper and one shallower, were created, trained, and tested, and their performance in forecasting TASI index prices is compared. Several traditional machine learning methods, such as linear regression, decision trees, and random forests, are also studied for index price prediction. The experiments suggest that, with 26 years of TASI index transactions, simple machine learning (ML) models generally produce better midterm index price forecasts than more complex ML models.
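As an example of the simple-model end of that comparison, the sketch below regresses each index value on the previous one (a lag-1 autoregressive baseline) and rolls the fit forward. Using only the lagged index is an assumption: the study used Bloomberg indicators that the abstract does not list. Its midterm horizon and sampling frequency are likewise unspecified.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def lag1_forecast(series, steps):
    """Fit index_t = a + b * index_{t-1} on history, then roll forward `steps`."""
    a, b = fit_simple_ols(series[:-1], series[1:])
    history = list(series)
    for _ in range(steps):
        # Each forecast feeds the next one (recursive multi-step forecasting).
        history.append(a + b * history[-1])
    return history[len(series):]
```

Recursive forecasting compounds errors over long horizons. That is one reason baselines like this are a useful yardstick for deeper RNNs on midterm forecasts.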

Citations: 0

Efficient Face-Swap-Verification Using PRNU

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00012

Ali Hassani, H. Malik

Facial recognition is becoming the go-to method of identifying users in convenience applications. Great advances have been made in achieving low false acceptance and false rejection rates on authentic images, but these systems can be vulnerable to face-swap attacks. This research addresses face-swap attacks via camera forensics. Whenever an image is modified, its noise profile (in this case, Photo Response Non-Uniformity, PRNU) is necessarily affected. Hence, a framework is proposed that enrolls the facial recognition camera's “noiseprint” and assesses the authenticity of future images based on their deviation from the expected value. Down-sampling compression is used to improve run time, and images are further segmented into sub-zones to retain local sensitivity. Framework performance is evaluated by recording identical facial images with multiple cameras of the same make. Next, a subset is modified via hand-crafted and AI-tool face swaps. When full-image analysis is used at full scale, 100% of images are correctly identified as authentic or tampered. Efficiency is then optimized by dividing the image into sub-zones and applying compression. Applying quarter-scale down-sampling with 16 sub-zones reduces run time to 4.6 ms on CPU, a 99.1% reduction, while retaining 93.5% verification accuracy. These results are compared against three existing state-of-the-art algorithms, which show over-fitting when compressed. This demonstrates that compressed PRNU can be used to efficiently verify facial images, including against AI facial manipulation tools.
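The enroll-then-verify idea can be sketched as follows. This is a simplified, hypothetical illustration, not the authors' framework. Each step below is an assumption:

- the noise residual uses a 3x3 mean filter, standing in for the denoising filters usually used in PRNU work;
- average pooling stands in for the down-sampling compression;
- an image passes only if every sub-zone's normalized cross-correlation with the noiseprint exceeds a threshold;
- all parameter values are illustrative.

Images are plain 2-D lists of grayscale values.

```python
import math


def downsample(img, factor):
    """Average-pool non-overlapping factor x factor blocks (cheap compression)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(w)] for r in range(h)]


def noise_residual(img):
    """Residual = image minus its 3x3 local mean (stand-in for a real denoiser)."""
    h, w = len(img), len(img[0])
    res = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            res[r][c] = img[r][c] - sum(window) / len(window)
    return res


def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2-D arrays."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def enroll(images, factor):
    """Camera noiseprint: average residual of known-authentic images."""
    residuals = [noise_residual(downsample(im, factor)) for im in images]
    h, w = len(residuals[0]), len(residuals[0][0])
    return [[sum(res[r][c] for res in residuals) / len(residuals) for c in range(w)]
            for r in range(h)]


def verify(img, fingerprint, factor, grid, threshold):
    """Split into grid x grid sub-zones; authentic only if every zone's
    residual correlates with the enrolled noiseprint at or above `threshold`."""
    res = noise_residual(downsample(img, factor))
    zh, zw = len(res) // grid, len(res[0]) // grid
    scores = []
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * zh, (i + 1) * zh)
            cols = slice(j * zw, (j + 1) * zw)
            scores.append(ncc([row[cols] for row in res[rows]],
                              [row[cols] for row in fingerprint[rows]]))
    return min(scores) >= threshold, scores
```

Scoring sub-zones separately keeps a locally pasted face from being averaged away by the untouched parts of the frame. That is the local-sensitivity point the abstract makes.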

Citations: 1

A Deep Learning Framework for Temperature Forecasting

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00016

Patil Malini, B. Qureshi

Among the many effects of global warming, rising global temperatures have produced summer heatwaves that trigger heatstroke and have caused the untimely deaths of thousands of people. Heatwaves are meteorological events marked by prolonged periods of excessive heat. Models such as the Auto-Regressive Integrated Moving Average (ARIMA), ensemble learning, and Long Short-Term Memory (LSTM) networks have recently been used to forecast weather conditions, but optimizing their hyperparameters for accurate temperature forecasting is challenging. This paper presents a Cauchy Particle Swarm Optimization (CPSO) technique for finding the hyperparameters of an LSTM. The technique minimizes the validation mean square error rate (MSER) to improve accuracy. We test it on 30 years of temperature data for the city of Riyadh. In our experimental evaluation, the proposed CPSO-LSTM outperforms a plain LSTM and a grid-search-tuned LSTM by 50% and 55%, respectively.
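The temperature-forecasting abstract describes a CPSO search for LSTM hyperparameters. That search can be sketched as standard particle swarm optimization plus a Cauchy-distributed mutation of the global best. The heavy tails of the Cauchy distribution occasionally produce long jumps out of local minima.

This is one common way to add Cauchy perturbations to PSO. Because the abstract does not specify the variant, it is an assumption. The inertia and acceleration constants are conventional defaults, and the cheap objective used below stands in for the LSTM validation MSE that the paper minimizes.

```python
import math
import random


def cauchy(scale, rng):
    """Draw from a Cauchy(0, scale) distribution via the inverse CDF."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))


def cauchy_pso(objective, bounds, n_particles=20, iters=100,
               w=0.7, c1=1.5, c2=1.5, mutation_scale=0.1, seed=0):
    """Minimise `objective` over a box given as [(lo, hi), ...]; returns (best_x, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            # Standard velocity update: inertia + pull toward personal and global bests.
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
            pos[i] = clip([x + v for x, v in zip(pos[i], vel[i])])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        # Cauchy mutation of the global best, kept only if it improves the objective.
        trial = clip([x + cauchy(mutation_scale * (hi - lo), rng)
                      for x, (lo, hi) in zip(gbest, bounds)])
        trial_val = objective(trial)
        if trial_val < gbest_val:
            gbest, gbest_val = trial, trial_val
    return gbest, gbest_val
```

In practice `objective` would train an LSTM with the candidate hyperparameters and return its validation error. Those hyperparameters might be, hypothetically, a hidden-unit count (rounded to an integer) and a log learning rate. A quadratic keeps the example instant: `cauchy_pso(lambda x: (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2, [(-5.0, 5.0), (-5.0, 5.0)])` should return a position close to `[3.0, -1.0]`.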

Citations: 1

On the Capabilities of Quantum Machine Learning

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00035

Sarah Alghamdi, Sultan Almuhammadi

Machine learning techniques give impressive results in many areas. However, because the physical limits of integrated circuits restrict the growth of their computational power, and because quantum computing is advancing rapidly, many research studies on quantum machine learning (QML) have appeared recently. QML uses quantum algorithms as part of a machine learning implementation. Quantum algorithms exploit quantum mechanics and can potentially outperform classical algorithms on a given problem. In this paper, three widely used machine learning algorithms are discussed and their quantum versions are presented: the quantum neural network, the quantum autoencoder, and the quantum kernel method. We also discuss the potential capabilities of these QML algorithms and review recent work that employs them. Finally, a quantum neural network prototype is implemented in Qiskit as a proof of concept and tested on a real quantum computer. Empirical results show that quantum neural networks can be trained efficiently.
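The abstract reports training a quantum neural network in Qiskit on real hardware. The sketch below is a hardware-free illustration of how such a model is trained. It is not the authors' prototype and does not use Qiskit.

It simulates a one-qubit variational circuit as a 2-element real state vector. The input x is encoded with an RY(x) rotation and followed by a trainable RY(θ), so the model output is ⟨Z⟩ = cos(x + θ). Gradients come from the parameter-shift rule, ∂⟨Z⟩/∂θ = [⟨Z⟩(θ + π/2) − ⟨Z⟩(θ − π/2)] / 2. This rule is exact for rotation gates and needs only circuit evaluations, which is why it suits real devices.

```python
import math


def ry(phi):
    """2x2 real matrix of the single-qubit rotation RY(phi)."""
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    return [[c, -s], [s, c]]


def apply(gate, state):
    """Multiply a 2x2 gate into a 2-element state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def expect_z(x, theta):
    """<Z> after encoding x with RY(x) and applying the trainable RY(theta) to |0>."""
    state = apply(ry(theta), apply(ry(x), [1.0, 0.0]))
    return state[0] ** 2 - state[1] ** 2


def parameter_shift_grad(x, theta):
    """d<Z>/dtheta from two shifted circuit evaluations (exact for RY gates)."""
    return (expect_z(x, theta + math.pi / 2) - expect_z(x, theta - math.pi / 2)) / 2


def train(xs, ys, theta=0.0, lr=0.5, steps=100):
    """Gradient descent on the mean squared error between <Z> and the targets."""
    for _ in range(steps):
        grad = sum(2 * (expect_z(x, theta) - y) * parameter_shift_grad(x, theta)
                   for x, y in zip(xs, ys)) / len(xs)
        theta -= lr * grad
    return theta
```

For example, with `xs = [0.0, 0.5, 1.0, 1.5, 2.0]` and targets `ys = [math.cos(x + 0.8) for x in xs]`, `train(xs, ys)` recovers θ ≈ 0.8. On hardware, each `expect_z` call would instead be an estimate averaged over many measurement shots.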

Citations: 0

A Robot-based Arabic Sign Language Translating System

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00030

Dina A. Alabbad, Nouha O. Alsaleh, Naimah A. Alaqeel, Yara A. Alshehri, Nashwa A. Alzahrani, Maha K. Alhobaishi

An evaluation of the services provided to deaf people in the Eastern Province of Saudi Arabia confirmed a strong need to support the deaf community. This paper proposes using the Pepper robot to recognize and translate Arabic Sign Language (ArSL). The robot recognizes the static hand gestures for ArSL letters in each keyframe extracted from the input video and translates them into written text, and vice versa. The project aims to provide two-way translation of Arabic Sign Language that bridges the communication gap between deaf and non-deaf people in Saudi Arabia. The proposed methods are computer vision, using the Pepper robot's camera and sensors; natural language processing, to convert natural speech into sign language; and deep learning, to build a convolutional neural network model that classifies sign language gestures and converts them into their corresponding written and spoken forms. Two datasets were used: the first is a collection of hand gestures for training the model, and the second consists of 39 animated signs covering all the Arabic letters and special letters.
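The abstract says static letter gestures are recognized in keyframes extracted from the input video, but it does not say how the keyframes are chosen. One simple assumption is frame differencing. A held static sign shows up as a run of consecutive frames that barely change, and the middle frame of each sufficiently long run is handed to the classifier. The threshold, the minimum run length, and the flat grayscale frame format are hypothetical choices, and the CNN classifier itself is not shown.

```python
def mean_abs_diff(a, b):
    """Average absolute pixel difference between two equal-length flat frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def extract_keyframes(frames, still_threshold=2.0, min_run=3):
    """Return indices of one representative frame per 'still' segment.

    frames: list of equal-length flat grayscale pixel lists.
    A frame is 'still' when it differs from its predecessor by less than
    still_threshold on average; a run of at least min_run still frames is
    taken to be a held static sign, and its middle frame is the keyframe.
    """
    keyframes, run = [], []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[i - 1]) < still_threshold:
            run.append(i)
        else:
            # Motion detected: close the current still run if it was long enough.
            if len(run) >= min_run:
                keyframes.append(run[len(run) // 2])
            run = []
    if len(run) >= min_run:
        keyframes.append(run[len(run) // 2])
    return keyframes
```

For instance, take five identical frames of one sign, then five of another, then a two-frame tail. This yields keyframes at indices 3 and 8, because the tail is too short to count as a held sign.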

Citations: 3