Acquiring continuous spatial data, e.g., spatial ground motion, is essential for assessing damaged areas and appropriately assigning rescue and medical teams. Spatial interpolation methods, e.g., inverse distance weighting and Kriging, have therefore been developed to estimate the values at unobserved points linearly from neighboring observed values. Meanwhile, realistic spatially continuous environmental data for various scenarios can be generated by 3-D finite difference methods using a high-resolution structure model. Such simulations make it possible to collect supervised data even for unobserved points. This paper therefore proposes a framework for supervised spatial interpolation and applies advanced deep inpainting methods, in which spatially distributed observed points are treated as masked images and non-linearly expanded through convolutional encoder–decoder networks. However, the translation invariance of convolution can hinder locally fine-grained interpolation, because the relation between the target and the surrounding observation points varies among regions owing to their topography and subsurface structure. To overcome this issue, this paper introduces position-dependent partial convolution, in which kernel weights are adjusted according to their position on the image based on a trainable position-feature map. Experimental results on toy and ground-motion data show the effectiveness of the proposed method, called the Position-dependent Deep Inpainting Method.
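As a point of reference for the linear baselines named in the abstract above, the following is a minimal sketch of inverse distance weighting. It is not the paper's proposed method; the function name, the `power` parameter, and the station readings are illustrative assumptions.

```python
import math


def idw_interpolate(observed, target, power=2.0):
    """Estimate the value at `target` from (x, y, value) observations.

    The estimate is a weighted average with weights 1 / distance**power,
    so it is linear in the observed values.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in observed:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            return value  # target coincides with an observation point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical readings at three stations: (x, y, value).
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
estimate = idw_interpolate(stations, (1.0, 1.0))
```

Because the target here is equidistant from all three stations, the estimate reduces to their plain mean. The weights depend only on distance, not on where the target lies, which is the kind of position independence the abstract argues against.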
Position-dependent partial convolutions for supervised spatial interpolation. Hirotaka Hachiya, Kotaro Nagayoshi, Asako Iwaki, Takahiro Maeda, Naonori Ueda, Hiroyuki Fujiwara. Machine Learning with Applications, Volume 14, Article 100514. Pub Date: 2023-11-22. DOI: 10.1016/j.mlwa.2023.100514. PDF: https://www.sciencedirect.com/science/article/pii/S2666827023000671/pdfft?md5=5d684f97a44cd5785cd259835ac21e2a&pid=1-s2.0-S2666827023000671-main.pdf
During an Ecological Momentary Assessment (EMA) study, repeated digital questionnaires make it possible to collect multivariate time-series (MTS) data for every participant. Although such data are commonly analyzed per participant, the richness of these datasets raises the question of whether meaningful groups of individuals can be uncovered to better understand the underlying processes at both the individual and the group level. Such groups could be obtained by clustering. This paper therefore examines the performance of various clustering approaches for grouping individuals based on the similarity of their raw time-series patterns. Clustering is an unsupervised task in which the true underlying groups are usually unavailable, which makes the result difficult to evaluate. In the current paper, simulated irregular time-series data resembling EMA are therefore used to validate the performance of several methods under different clustering-related choices, such as the distance metric. Data are generated with varying numbers of clusters, individuals, time points, and variables, and with varying proportions of noisy variables, while the time series follow well-shaped patterns typically observed in emotional behavior. After clustering was applied to all simulated datasets, performance was first assessed by comparing the true and predicted labels, and the impact of the dataset parameters was also examined. Because ground-truth labels are not always available, or do not even exist, in real-world scenarios, clustering evaluation through distance-based and distance-free measures was further investigated. Overall, all clustering methods (e.g., k-means, hierarchical clustering, fuzzy k-medoids) proved reliable in different configurations, revealing the true number of clusters. Moreover, kernel-based methods appeared more efficient when highly noisy variables are involved, which makes them more promising for real-world data. In a second part, two specific simulated scenarios (datasets) are illustrated in more detail, showing all analysis steps taken before drawing a conclusion about the optimal number of clusters.
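Comparing true and predicted labels, as described above, needs a score that ignores how clusters happen to be numbered. The abstract does not name the score used; the adjusted Rand index below is one common choice and is shown only as an illustrative sketch with hypothetical labels.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, predicted_labels):
    """Agreement between two partitions, corrected for chance (1.0 = identical)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    if n < 2:
        raise ValueError("need at least two items to compare partitions")

    # Count pairs of items that share a cluster in both, or in each, partition.
    pair_counts = Counter(zip(true_labels, predicted_labels))
    same_in_both = sum(comb(c, 2) for c in pair_counts.values())
    same_in_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_in_pred = sum(comb(c, 2) for c in Counter(predicted_labels).values())

    expected = same_in_true * same_in_pred / comb(n, 2)
    maximum = (same_in_true + same_in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (same_in_both - expected) / (maximum - expected)


# Hypothetical labels for six simulated individuals.
true_groups = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]   # same grouping, different cluster names
imperfect = [0, 0, 1, 1, 1, 1]    # one individual assigned to the wrong group

perfect_score = adjusted_rand_index(true_groups, relabelled)
partial_score = adjusted_rand_index(true_groups, imperfect)
```

The relabelled partition scores as a perfect match, while the imperfect one scores strictly between 0 and 1.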
Evaluating multivariate time-series clustering using simulated ecological momentary assessment data. Mandani Ntekouli, Gerasimos Spanakis, Lourens Waldorp, Anne Roefs. Machine Learning with Applications, Volume 14, Article 100512. Pub Date: 2023-11-20. DOI: 10.1016/j.mlwa.2023.100512. PDF: https://www.sciencedirect.com/science/article/pii/S2666827023000658/pdfft?md5=1ec5ee06e2dff3ae3806641723ab9f42&pid=1-s2.0-S2666827023000658-main.pdf
Pub Date: 2023-11-19 DOI: 10.1016/j.mlwa.2023.100511
Rweyemamu Ignatius Barongo, Jimmy Tibangayuka Mbelwa
The accurate classification of banks’ Liquidity Risk (LR) for regulatory supervision is hindered by limitations in existing measures, such as Minimum Liquid Assets (MLA), Net Stable Funding Ratio (NSFR), and Liquidity Coverage Ratio (LCR). This study addressed two limitations: data integrity vulnerabilities and the narrow composition of LR factors, which excludes practical LR determinants such as credit portfolio quality, market conditions, and asset and funding strategies. The theoretical gaps addressed included the eight new LR factors introduced in this study, the benchmarking of study results against existing measures to interpret the study’s contributions, and the selection of suitable prediction methods for non-linear, imbalanced, scaling, and near-real-time data. We used data on 38 Tanzanian banks (2010-2021) from the Bank of Tanzania (BOT). Extensive factor experimentation using Random Forest (RF) and Multi-Layer Perceptron (MLP) models identified ten features as inputs for Machine Learning (ML) analysis, with the LR rating as output. A hybrid RF-MLP model with a 199-tree RF and a 10-512-250-120-80-60-6 MLP was developed. It increased LR sensitivity, reduced the limitations of the individual RF and MLP models through generalisation, and demonstrated statistical and practical performance. It minimised classification errors, with Type I error, Type II error, and Negative Likelihood of 0.8%, 9.1%, and 1%, respectively; Discriminant Power of 2.61; and 90% to 96% for Accuracy, Balanced Accuracy, Precision, Recall, F1 Score, G-mean, Cohen’s Kappa, Youden Index, and Area Under the Curve. Past LR scenarios confirmed that RF-MLP improved on MLA. The unavailability of LCR and NSFR data hindered a comprehensive evaluation. This study extended LR factors and proposed a model to complement LR classification.
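Several of the measures listed above follow directly from confusion-matrix counts. The sketch below assumes a binary (one-vs-rest) view of a single LR class, reads "Negative Likelihood" as the negative likelihood ratio, and uses the textbook definition of discriminant power; these readings, the function name, and the counts are assumptions, not details taken from the study.

```python
import math


def binary_risk_metrics(tp, fp, tn, fn):
    """Derive error rates and summary scores from binary confusion counts.

    Discriminant power is undefined when sensitivity or specificity is 0 or 1.
    """
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return {
        "type_i_error": fp / (fp + tn),   # false-positive rate
        "type_ii_error": fn / (fn + tp),  # false-negative rate
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
        "youden_index": sensitivity + specificity - 1,
        "negative_likelihood_ratio": (1 - sensitivity) / specificity,
        "discriminant_power": (math.sqrt(3) / math.pi) * (
            math.log(sensitivity / (1 - sensitivity))
            + math.log(specificity / (1 - specificity))
        ),
    }


# Hypothetical counts, not taken from the study.
metrics = binary_risk_metrics(tp=90, fp=5, tn=95, fn=10)
```

For a multi-class rating such as the six-output model described above, the same function could be applied once per class and the results averaged; whether the study aggregated its metrics this way is not stated in the abstract.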
Using machine learning for detecting liquidity risk in banks. Rweyemamu Ignatius Barongo, Jimmy Tibangayuka Mbelwa. Machine Learning with Applications, Volume 15, Article 100511. Pub Date: 2023-11-19. DOI: 10.1016/j.mlwa.2023.100511. PDF: https://www.sciencedirect.com/science/article/pii/S2666827023000646/pdfft?md5=1a2b1e48bca56948123e7558d5a1060e&pid=1-s2.0-S2666827023000646-main.pdf
The adoption of Advanced Driver Assistance Systems (ADAS) has expanded dramatically in recent years, with the goal of improving road safety and driving comfort. Driver monitoring is important to ADAS because it identifies abnormalities such as sleepiness, distraction, and impairment to ensure safe vehicle operation. Traditional methods of detecting driver anomalies rely on intrusive physiological measures, whereas ADAS with built-in cameras offer a non-intrusive and cost-effective option. This study investigates the application of ensemble learning to driver anomaly detection in automobiles equipped with ADAS and in-vehicle cameras. Deep learning models such as ResNet50, DenseNet201, and Inception V3 were deployed as learner models to classify driving behavior. The raw data used in this study were videos obtained from the National Tsinghua Driver Drowsiness Detection (NTHUDD) dataset. Of the two ensemble models used, the eXtreme Gradient Boost (XGBoost) classifier pooled predictions from the learner models. It attained an average accuracy and precision of 99% on the validation dataset. Classes such as laugh_talk and yawning were correctly distinguished from one another. The ensemble technique capitalized on the strengths of the various models while mitigating their weaknesses, resulting in robust and trustworthy predictions. The findings highlight the potential of ensemble modeling to enhance driver anomaly detection systems, providing valuable insights for improving road safety. By continually monitoring driver behavior and detecting abnormalities, ADAS can provide timely warnings and interventions to prevent accidents and save human lives.
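The pooling step described above can be pictured as follows: each learner emits class probabilities for a frame, and these are concatenated into one feature row for a meta-classifier. The sketch shows that concatenation and, as a plainly simpler stand-in for the XGBoost meta-classifier, a soft vote that averages the probabilities. The class list and probability values are hypothetical.

```python
CLASS_NAMES = ["normal", "yawning", "laugh_talk"]  # hypothetical label set


def stack_probabilities(per_model_probs):
    """Concatenate each learner's class probabilities into one meta-feature row."""
    return [p for probs in per_model_probs for p in probs]


def soft_vote(per_model_probs, class_names):
    """Average class probabilities across learners and return the top class."""
    n_models = len(per_model_probs)
    averaged = [sum(column) / n_models for column in zip(*per_model_probs)]
    return class_names[averaged.index(max(averaged))], averaged


# Hypothetical outputs of three learners for a single video frame.
frame_probs = [
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
]
meta_features = stack_probabilities(frame_probs)   # input row for a meta-classifier
label, averaged = soft_vote(frame_probs, CLASS_NAMES)
```

A trained meta-classifier can weight the learners unequally per class, which a plain average cannot; the soft vote is included only to make the pooling idea concrete.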
Improving road safety with ensemble learning: Detecting driver anomalies using vehicle inbuilt cameras. Tumlumbe Juliana Chengula, Judith Mwakalonge, Gurcan Comert, Saidi Siuhi. Machine Learning with Applications, Volume 14, Article 100510. Pub Date: 2023-11-17. DOI: 10.1016/j.mlwa.2023.100510. PDF: https://www.sciencedirect.com/science/article/pii/S2666827023000634/pdfft?md5=121ac73f5fe59607420bc305729c0111&pid=1-s2.0-S2666827023000634-main.pdf
Pub Date: 2023-11-07 DOI: 10.1016/j.mlwa.2023.100509
Altan Cakir, Mert Gurkan
This work addresses an alternative approach to query expansion (QE) that uses a generative adversarial network (GAN) to enhance the effectiveness of information search in e-commerce. We propose a modified QE conditional GAN (mQE-CGAN) framework, which resolves keywords by expanding the query with a synthetically generated query that supplies semantic information from the text input. We train a sequence-to-sequence transformer model as the generator to produce keywords and use a recurrent neural network model as the discriminator to classify the adversarial output of the generator. With the modified CGAN framework, various forms of semantic insights gathered from the query-document corpus are introduced into the generation process. We leverage these insights as conditions for the generator model and discuss their effectiveness for the query expansion task. Our experiments demonstrate that the use of condition structures within the mQE-CGAN framework can increase the semantic similarity between generated sequences and reference documents by up to nearly 10% compared with baseline models.
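To make the evaluation idea above concrete, the sketch below scores how close a query is to a reference document before and after expansion. It uses cosine similarity over word counts as a simple stand-in; the abstract does not state which similarity measure or text representation the authors used, and the example strings are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[token] * counts_b[token] for token in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


# Hypothetical e-commerce query, expanded query, and reference document.
reference = "lightweight red running shoes for men"
original_query = "red running shoes"
expanded_query = "red running shoes lightweight men"

original_score = cosine_similarity(original_query, reference)
expanded_score = cosine_similarity(expanded_query, reference)
```

Word counts capture only exact term overlap; an embedding-based representation would also credit synonyms, which matters for generated keywords.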
Modified query expansion through generative adversarial networks for information extraction in e-commerce. Altan Cakir, Mert Gurkan. Machine Learning with Applications, Volume 14, Article 100509. Pub Date: 2023-11-07. DOI: 10.1016/j.mlwa.2023.100509. PDF: https://www.sciencedirect.com/science/article/pii/S2666827023000622/pdfft?md5=aae23ad5c735e599f23039060a8ca4d4&pid=1-s2.0-S2666827023000622-main.pdf
Financial sentiment analysis plays a crucial role in decoding market trends and guiding strategic trading decisions. Although advanced deep learning techniques and language models have already been deployed to refine sentiment analysis in finance, this study breaks new ground by investigating the potential of large language models, particularly ChatGPT 3.5, in financial sentiment analysis, with a strong emphasis on the foreign exchange market (forex). Employing a zero-shot prompting approach, we examine multiple ChatGPT prompts on a meticulously curated dataset of forex-related news headlines, measuring performance using metrics such as precision, recall, F1-score, and Mean Absolute Error (MAE) of the sentiment class. We also probe the correlation between predicted sentiment and market returns as an additional evaluation approach. Compared with FinBERT, a well-established sentiment analysis model for financial texts, ChatGPT exhibited approximately 35% better performance in sentiment classification and a 36% higher correlation with market returns. By underlining the significance of prompt engineering, particularly in zero-shot contexts, this study spotlights ChatGPT’s potential to substantially boost sentiment analysis in financial applications. By sharing the dataset used, we intend to stimulate further research and advancements in the field of financial services.
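The two evaluation ideas above, MAE of the sentiment class and correlation with market returns, can be sketched as follows. The mapping of classes to -1, 0, and 1, the Pearson form of the correlation, and all example values are assumptions made for illustration; none are taken from the study.

```python
import math

SENTIMENT_SCORE = {"negative": -1, "neutral": 0, "positive": 1}  # assumed mapping


def sentiment_mae(true_labels, predicted_labels):
    """Mean absolute difference between true and predicted sentiment scores."""
    errors = [
        abs(SENTIMENT_SCORE[t] - SENTIMENT_SCORE[p])
        for t, p in zip(true_labels, predicted_labels)
    ]
    return sum(errors) / len(errors)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical headline labels and same-day returns (in percent).
true_labels = ["positive", "neutral", "negative", "positive"]
predicted = ["positive", "negative", "negative", "neutral"]
returns = [0.4, -0.1, -0.6, 0.2]

mae = sentiment_mae(true_labels, predicted)
correlation = pearson_correlation([SENTIMENT_SCORE[p] for p in predicted], returns)
```

Unlike plain accuracy, MAE on an ordered scale penalizes a positive headline predicted as negative twice as heavily as one predicted as neutral.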
Transforming sentiment analysis in the financial domain with ChatGPT. Georgios Fatouros, John Soldatos, Kalliopi Kouroumali, Georgios Makridis, Dimosthenis Kyriazis. Machine Learning with Applications, Volume 14, Article 100508. Pub Date: 2023-11-04. DOI: 10.1016/j.mlwa.2023.100508. PDF: https://www.sciencedirect.com/science/article/pii/S2666827023000610/pdfft?md5=b56ed0c4ed95fd46eff9618288753304&pid=1-s2.0-S2666827023000610-main.pdf
Pub Date: 2023-11-04 DOI: 10.1016/j.mlwa.2023.100507
Gurinder Kaur, Fei Liu, Yi-Ping Phoebe Chen
Knowledge graphs are becoming the new state of the art for recommender systems. This paper uses knowledge graphs to alleviate the problem of data sparsity. Various methods have recently been deployed to solve this problem; they largely attempt to study user-item representations and then recommend items to users based on these representations. Although these methods are effective, they lack explainability for recommendations and do not mine side information. In this paper, we propose the use of knowledge graphs, which include additional information about users and items, in addition to a user/item interaction matrix. The vital element of our model is neighbourhood aggregation for collaborative filtering. Every user and item is associated with an ID embedding, which is propagated on the interaction graph for users, items, and their attributes. We obtain the final embeddings by combining the embeddings learned at various hidden layers with a biased sum. Our model is easier to train and achieves better performance than graph neural network-based collaborative filtering (GCF) and other state-of-the-art recommender methods. We provide evidence for our argument by analytically comparing the knowledge graph convolution network (KGCN) with GCF and eight other state-of-the-art methods, using similar experimental settings and the same datasets.
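The propagation and layer-combination steps described above can be sketched on a toy graph. The sketch assumes mean aggregation over neighbours and reads the "biased sum" as a weighted sum over layers; both are assumptions, as are the node names, embeddings, and weights. It is not the authors' model.

```python
def propagate(embeddings, neighbours):
    """One propagation step: each node takes the mean of its neighbours' vectors."""
    updated = {}
    for node, vector in embeddings.items():
        linked = neighbours.get(node, [])
        if not linked:
            updated[node] = list(vector)  # isolated nodes keep their embedding
            continue
        updated[node] = [
            sum(embeddings[n][d] for n in linked) / len(linked)
            for d in range(len(vector))
        ]
    return updated


def combine_layers(embeddings, neighbours, layer_weights):
    """Weighted sum of the layer-0..K embeddings for every node."""
    final = {node: [0.0] * len(vec) for node, vec in embeddings.items()}
    current = embeddings
    for layer, weight in enumerate(layer_weights):
        if layer > 0:
            current = propagate(current, neighbours)
        for node, vec in current.items():
            final[node] = [f + weight * v for f, v in zip(final[node], vec)]
    return final


# Toy graph: user u1 interacted with item i1, which has attribute a1.
initial = {"u1": [1.0, 0.0], "i1": [0.0, 1.0], "a1": [1.0, 1.0]}
graph = {"u1": ["i1"], "i1": ["u1", "a1"], "a1": ["i1"]}

final = combine_layers(initial, graph, layer_weights=[0.5, 0.5])
# Preference score as the dot product of the final user and item embeddings.
score = sum(u * i for u, i in zip(final["u1"], final["i1"]))
```

After one propagation step the item embedding already mixes in its attribute node, which is how side information from the knowledge graph reaches the user-item score.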
A deep learning knowledge graph neural network for recommender systems. Gurinder Kaur, Fei Liu, Yi-Ping Phoebe Chen. Machine Learning with Applications, Volume 14, Article 100507. Pub Date: 2023-11-04. DOI: 10.1016/j.mlwa.2023.100507. PDF: https://www.sciencedirect.com/science/article/pii/S2666827023000609/pdfft?md5=a8708a4e43a99a7c87b3f5bcb9e4d108&pid=1-s2.0-S2666827023000609-main.pdf
Pub Date: 2023-10-29 DOI: 10.1016/j.mlwa.2023.100506
Mingyu Zong, Bhaskar Krishnamachari
Researchers have been interested in developing AI tools to help students learn various mathematical subjects. One challenging set of tasks for school students is learning to solve math word problems. We explore how recent advances in natural language processing, specifically the rise of powerful transformer-based models, can be applied to help math learners with such problems. Concretely, we evaluate the use of GPT-3, GPT-3.5, and GPT-4, all transformer models with billions of parameters recently released by OpenAI, for three related challenges pertaining to math word problems that correspond to systems of two linear equations. The three challenges are classifying word problems, extracting equations from word problems, and generating word problems. For the first challenge, we define a set of problem classes and find that GPT models generally classify word problems with an overall accuracy of around 70%. There is one class that all models struggle with, namely the “item and property” class, which significantly lowers the overall accuracy. For the second challenge, our findings align with researchers’ expectations: newer models are better at extracting equations from word problems. The highest accuracy we obtain from fine-tuning GPT-3 with 1000 examples (78%) is surpassed by GPT-4 given only 20 examples (79%). For the third challenge, we again find that GPT-4 outperforms the other two models. It is able to generate problems with accuracy ranging from 76.7% to 100%, depending on the problem type.
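Once a model has extracted a system of two linear equations from a word problem, the extraction can be checked by solving the system exactly. The sketch below uses Cramer's rule with exact fractions; the word problem, the coefficient format, and the function name are hypothetical and not part of the study's pipeline.

```python
from fractions import Fraction


def solve_two_linear_equations(eq1, eq2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule.

    Each equation is given as (a, b, c). Returns (x, y) as exact fractions,
    or None when the system has no unique solution.
    """
    a1, b1, c1 = (Fraction(v) for v in eq1)
    a2, b2, c2 = (Fraction(v) for v in eq2)
    determinant = a1 * b2 - a2 * b1
    if determinant == 0:
        return None  # parallel or identical lines
    x = (c1 * b2 - c2 * b1) / determinant
    y = (a1 * c2 - a2 * c1) / determinant
    return x, y


# Hypothetical word problem: "Two numbers add up to 10 and differ by 2."
# Extracted system: x + y = 10 and x - y = 2.
solution = solve_two_linear_equations((1, 1, 10), (1, -1, 2))
no_unique_solution = solve_two_linear_equations((1, 1, 10), (2, 2, 20))
```

Exact fractions avoid rounding noise, so a solution can be compared directly with a reference answer when scoring extraction accuracy.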
Solving math word problems concerning systems of equations with GPT models. Mingyu Zong, Bhaskar Krishnamachari. Machine Learning with Applications, Volume 14, Article 100506. Pub Date: 2023-10-29. DOI: 10.1016/j.mlwa.2023.100506. PDF: https://www.sciencedirect.com/science/article/pii/S2666827023000592/pdfft?md5=3e9a69094ef1d1b06354c3533f164953&pid=1-s2.0-S2666827023000592-main.pdf
Pub Date : 2023-10-24 DOI: 10.1016/j.mlwa.2023.100504
Marie Alaghband, Hamid Reza Maghroor, Ivan Garibay
Individuals with hearing impairment encounter difficulties of various types and degrees, highlighting the need for more research to provide effective support. One significant difficulty is communicating and interacting with others. Because these individuals use sign language as their primary mode of communication, there is a notable gap among hearing people in comprehending and interpreting it. To bridge this gap, the field of sign language research has grown significantly. In this study, we emphasize the importance of sign language recognition and translation and provide a comprehensive review of relevant research in this field. Our examination covers multiple perspectives, including sign language recognition, translation, and the availability of datasets. By exploring these aspects, we aim to contribute to the advancement of sign language literature and its practical applications.
A survey on sign language literature. Marie Alaghband, Hamid Reza Maghroor, Ivan Garibay. Machine learning with applications, volume 14, Article 100504. Pub Date : 2023-10-24 DOI: 10.1016/j.mlwa.2023.100504 (https://doi.org/10.1016/j.mlwa.2023.100504). PDF: https://www.sciencedirect.com/science/article/pii/S2666827023000579/pdfft?md5=9805cc1cb0d13025ad07897d4b4d9ca5&pid=1-s2.0-S2666827023000579-main.pdf
Pub Date : 2023-10-17 DOI: 10.1016/j.mlwa.2023.100505
Taehyun Kim, Byeongmin Ha, Soonho Hwangbo
In comparison to other countries, South Korea has a high reliance on energy, with the majority of its electricity generated by a government-run company to ensure a stable and affordable supply. Rather than the "pay-as-bid" pricing approach, South Korea uses the system marginal price (SMP), also known as "pay-as-clear." Accurate SMP forecasting is crucial for steady economic growth because manufacturing is South Korea's flagship industry and accounts for 50% of the nation's power consumption. In this study, a combination of machine learning-based batch learning and online learning techniques was employed to forecast the SMP in South Korea, using a dataset covering five energy sectors, two financial sectors, and one transportation sector. The F-value analysis revealed that the coal sector, one of the energy sectors, had the most significant influence on the SMP, with the highest score of 2,328. Three machine learning models, namely support vector regression, a simple deep neural network, and a deep neural network, were compared for batch learning to determine the best-trained model. Based on the evaluation metrics, the simple deep neural network outperformed the other models in terms of accuracy. Furthermore, two methods, weight modification and updating of the time interval between inputs and output, were employed for online learning based on the trained batch model. Once the model updates were implemented, performance was assessed continuously using the coefficient of determination, root mean square error, mean absolute error, and mean absolute percentage error. The average values of these metrics were 0.924, 7.991, 5.035, and 0.052, respectively. This study is expected to directly assist decision-makers in the industrial sector in formulating energy plans.
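For readers unfamiliar with the four evaluation metrics named in this abstract, the sketch below shows how they are commonly computed. It is an illustration under stated assumptions, not the authors' code: the function name `regression_metrics` and the sample price values are hypothetical, and the mean absolute percentage error is expressed as a fraction rather than a percentage, which is an assumption suggested by the reported value of 0.052.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE, and MAPE (as a fraction) for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,                     # coefficient of determination
        "rmse": math.sqrt(ss_res / n),                   # root mean square error
        "mae": sum(abs(e) for e in errors) / n,          # mean absolute error
        # MAPE is undefined when any true value is zero.
        "mape": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Hypothetical price values, for illustration only (not data from the study).
actual = [100.0, 120.0, 80.0, 100.0]
forecast = [110.0, 114.0, 84.0, 100.0]
metrics = regression_metrics(actual, forecast)
print(metrics)
```

In an online-learning setting such as the one described, a function like this would typically be re-evaluated on the most recent window of observations after each model update, and the results averaged over time.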
Online machine learning approach for system marginal price forecasting using multiple economic indicators: A novel model for real-time decision making. Taehyun Kim, Byeongmin Ha, Soonho Hwangbo. Machine learning with applications, volume 14, Article 100505. Pub Date : 2023-10-17 DOI: 10.1016/j.mlwa.2023.100505 (https://doi.org/10.1016/j.mlwa.2023.100505). PDF: https://www.sciencedirect.com/science/article/pii/S2666827023000580/pdfft?md5=b92fc968d19bdf25faf4c6f48fc9fff3&pid=1-s2.0-S2666827023000580-main.pdf