Pub Date : 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068452
David Herrera-Sánchez, E. Mezura-Montes, H. Acosta-Mesa
Feature construction and feature selection are essential pre-processing techniques in data mining, especially for high-dimensional data. The principal goals of such techniques are to increase accuracy in classification tasks and reduce runtime in the learning process. Genetic programming is used to construct a new high-level feature space. Additionally, the feature selection process implicit in this task is exploited, so that a set of features with relevant information is obtained. This paper presents an approach to reducing the features of high-dimensional data through genetic programming. Moreover, the search space is reduced by eliminating, over the generations of the search process, features that do not carry considerable information. Although the approach is simple, competitive results are achieved. In the implementation, a wrapper approach is used, so that the classifier guides the search process.
Title : Feature Construction, Feature Reduction and Search Space Reduction Using Genetic Programming
Published in : 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
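The search-space reduction described in this abstract can be pictured with a small sketch. It is not the authors' implementation: the nested-tuple tree encoding, the names `features_used` and `reduce_terminals`, and the rule "keep only the features that appear in the best individuals" are all assumptions made for illustration.

```python
def features_used(tree):
    """Collect the feature names (leaf strings) of a nested-tuple expression tree.

    Assumed encoding: a leaf is a feature name such as "x0"; an inner node
    is a tuple (operator_name, child, child, ...).
    """
    if isinstance(tree, str):
        return {tree}
    _op, *children = tree
    used = set()
    for child in children:
        used |= features_used(child)
    return used


def reduce_terminals(terminals, elite):
    """Keep only the original features that occur in at least one elite tree.

    Assumption: a feature absent from every elite individual is treated as
    carrying little information and is removed from the terminal set.
    """
    if not elite:
        return list(terminals)  # nothing to learn from, keep the search space
    used = set().union(*(features_used(tree) for tree in elite))
    return [f for f in terminals if f in used]


# Hypothetical terminal set and elite individuals of one generation.
terminals = ["x0", "x1", "x2", "x3"]
elite = [("add", "x0", ("mul", "x2", "x0")), ("sub", "x2", "x0")]
reduced = reduce_terminals(terminals, elite)
print(reduced)
```

In a wrapper setting, the elite trees would presumably be those whose constructed features give the best classifier accuracy; how often and how aggressively to prune is a design choice this sketch leaves open.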
Pub Date : 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068460
Apoorva Gupta, Smriti Arora, Niyati Baliyan
Online news articles, blogs, and websites are a rich source of diverse text-based data. However, the data contained in these sources cannot be manually extracted, recorded, and listed because of its colossal size. Accurately mapping news items to their corresponding categories is challenging. Several methods have been proposed over time for news classification when training documents for each predefined class are readily available; however, such methods were tested only on small datasets. The aim of this research is to propose a method that can be used when hundreds of thousands of instances are present. This study performs news classification using two multiclass strategies, OneVsRest and OneVsOne, over Linear Support Vector Classification in order to assess the performance of multiclass news categorization. The proposed methodology, “Keyword Based Classification Technique (KBCT)”, was implemented in Python and deployed using Google Colaboratory. Results are reported for four distinct news classes over a multivariate dataset of 422419 instances from the uci-news-aggregator dataset. The OneVsRestClassifier's accuracy was 95.76%, which is 0.09 percentage points higher than the OneVsOneClassifier's accuracy of 95.67%. The proposed prototype was compared with some related studies and algorithms, and the outcomes produced by the OneVsRest model were the best in terms of accuracy.
Title : Online News Extraction and Multiclass Classification Using Linear Support Vector Machines
Published in : 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
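To make the two multiclass strategies concrete, the sketch below counts how many binary classifiers each one trains for the four news classes and recomputes the reported accuracy gap. One-vs-rest fits one binary classifier per class and one-vs-one fits one per pair of classes. The helper name `n_binary_classifiers` is made up for this illustration and does not come from the paper.

```python
def n_binary_classifiers(n_classes: int) -> dict:
    """Number of binary sub-problems trained by each multiclass strategy."""
    return {
        "one_vs_rest": n_classes,                        # one classifier per class
        "one_vs_one": n_classes * (n_classes - 1) // 2,  # one per unordered class pair
    }


counts = n_binary_classifiers(4)       # four news classes, as in the abstract
gap = round(95.76 - 95.67, 2)          # accuracy gap in percentage points
print(counts, gap)
```

With only four classes the two strategies differ little in training cost (4 versus 6 binary problems), so the comparison comes down mostly to accuracy.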
Pub Date : 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068487
Ndolane Diouf, Massa Ndong, Dialo Diop, K. Talla, Mamadou Sarr, A. Beye
Prior knowledge of wireless channel quality with high accuracy is essential to enable anticipated networking tasks. Traditional channel quality prediction methods rely on past channel information to predict future quality. In this paper, we investigate the channel quality prediction problem over different wireless channels. We propose an efficient prediction scheme based on deep learning to predict channel quality. For the deep learning task, we use deep neural networks (DNN) and long short-term memory (LSTM) networks. We compare their performance on a dataset collected from a commercial 4G mobile radio network of Orange Senegal. The evaluation on the benchmark dataset demonstrates the validity of the proposed deep learning approach, reaching a root mean square error (RMSE) of 0.27 for the LSTM model and 0.28 for the DNN model. The RMSE of each model used in this study was compared, on the same dataset, with that of other models; the DNN and LSTM models give lower RMSEs than the models of our previous work. The proposed prediction method can be applied to 5G small cell networks.
Title : Channel Quality Prediction in 5G LTE Small Cell Mobile Network Using Deep Learning
Published in : 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
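The models are compared by root mean square error, so a minimal sketch of that metric may help. The sample values below are hypothetical and are not taken from the Orange Senegal dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical channel-quality readings and model predictions.
observed = [10.0, 12.0, 9.0, 11.0]
predicted = [10.0, 12.0, 9.0, 13.0]
error = rmse(observed, predicted)   # errors are 0, 0, 0, 2
print(error)
```

Because the errors are squared before averaging, a single large miss weighs more than several small ones, which is worth keeping in mind when reading values such as 0.27 versus 0.28.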
Pub Date : 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068446
K. Palaniappan, Ushasukhanya S, T. N. Malleswari, Prabha Selvaraj, Vijay Kumar Burugari
A disentangled representation is one in which each variable in the latent space is sensitive to a single generative factor and relatively dormant to the others. Disentanglement results in an incisive latent representation of the image, which can be used for downstream tasks such as reinforcement learning and supervised learning. The discrete generative factors in image datasets are hard to capture in a latent space, and efficient interpolation requires smooth and continuous latent spaces; this is addressed by disentangling the important factors of the input image in the latent space. After training, the model should be able to generate different versions of the input image by varying features/attributes. A technique, Hybrid Optimized GAN using Dormant Variants (HOGDV), is proposed which can be deployed in multiple places if the number is made variable and which works on a wide variety of data distributions.
Title : Learning Disentangled Representations Using Dormant Variations
Published in : 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
Pub Date : 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068490
Mélanie Jouaiti, K. Dautenhahn
Dysfluency classification for stuttered speech has been tackled from different perspectives over the years, with research increasingly focused on deep learning. Here, we use a specific biological model of sound texture perception to extract a subband representation of speech and statistical features. A statistical analysis was also performed to identify relevant features. Afterwards, dysfluency classification was performed using a Random Forest classifier for multi-label classification on the FluencyBank dataset and a Support Vector Machine on the UCLASS dataset. This method performs as well as or better than current state-of-the-art deep learning algorithms, suggesting that approaching speech classification problems from a more biological point of view is a promising direction.
Title : Dysfluency Classification in Speech Using a Biological Sound Perception Model
Published in : 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
Pub Date : 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068437
Geetika Tiwari, Ruchi Jain
Cloud computing has been promoted as one of the most effective methods of hosting and delivering services via the internet. However, security remains a serious concern for cloud computing. Many security solutions have been developed to safeguard communication in such environments, the majority of which are based on attack signatures. These systems are often ineffective in detecting all forms of threats. To address this gap, machine learning approaches are being explored. In this research, we present a novel firewall mechanism, called machine learning system, for safe cloud computing environments. The proposed method identifies and classifies incoming traffic packets using a novel combination methodology named most frequent decision, in which the nodes' previous decisions are coupled with the machine learning algorithm's current decision to estimate the final attack category classification. This method improves learning performance as well as system correctness. UNSW-NB-15, a publicly accessible dataset, is used to derive our findings. Our results demonstrate that it improves anomaly detection to 97.68 percent.
Title : A Novel Framework for Secure Cloud Computing Based IDS Using Machine Learning Techniques
Published in : 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
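One way to read the "most frequent decision" idea is as a majority vote over a node's earlier decisions plus the classifier's current one. The sketch below follows that reading only. The function name, the tie-breaking rule (prefer the current decision), and the example labels are assumptions, not details confirmed by the abstract.

```python
from collections import Counter


def most_frequent_decision(previous, current):
    """Combine earlier decisions with the current one by majority vote.

    previous: list of attack-category labels decided earlier for this node.
    current:  label predicted by the machine learning model right now.
    Assumed tie-break: if the current label is among the most frequent,
    it wins; otherwise the first-seen most frequent label wins.
    """
    votes = Counter(previous)
    votes[current] += 1
    top = max(votes.values())
    winners = [label for label, count in votes.items() if count == top]
    return current if current in winners else winners[0]


# Hypothetical labels for illustration.
print(most_frequent_decision(["dos", "dos"], "normal"))
print(most_frequent_decision(["dos", "dos", "normal"], "normal"))
```

Under this reading, a single outlying prediction is overruled by the node's history, which may be how the combination improves correctness; the cost is a slower reaction when the traffic category really does change.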
Pub Date : 2022-10-27 DOI: 10.1109/ISCMI56532.2022.10068489
Nikolay Manchev, Michael W. Spratling
Initialising the synaptic weights of artificial neural networks (ANNs) with orthogonal matrices is known to alleviate vanishing and exploding gradient problems. A major objection against such initialisation schemes is that they are deemed biologically implausible, as they mandate factorisation techniques that are difficult to attribute to a neurobiological process. This paper presents two initialisation schemes that allow a network to naturally evolve its weights to form orthogonal matrices, provides a theoretical analysis showing that pre-training orthogonalisation always converges, and empirically confirms that the proposed schemes outperform randomly initialised recurrent and feedforward networks.
Title : On the Biological Plausibility of Orthogonal Initialisation for Solving Gradient Instability in Deep Neural Networks
Published in : 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
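Orthogonal weights help with gradient instability because an orthogonal matrix preserves the Euclidean norm of any vector it multiplies. The sketch below illustrates just that property with a 2x2 rotation matrix and, for contrast, a contracting matrix. It does not reproduce either of the paper's initialisation schemes, and the matrices, the depth of 50, and the helper names are chosen only for illustration.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in x))


theta = 0.7
Q = [[math.cos(theta), -math.sin(theta)],   # orthogonal: a plane rotation
     [math.sin(theta),  math.cos(theta)]]
S = [[0.5, 0.0],                            # not orthogonal: halves every vector
     [0.0, 0.5]]

h_q = h_s = [3.0, 4.0]                      # starting signal with norm 5
for _ in range(50):                         # 50 linear "layers" or time steps
    h_q = matvec(Q, h_q)
    h_s = matvec(S, h_s)

print(norm(h_q), norm(h_s))
```

After 50 applications the rotated signal still has norm close to 5, while the contracted one has all but vanished; a matrix that stretches instead of shrinking would explode in the same way.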
As technology grows and evolves rapidly, it is increasingly clear that mobile devices are used for sensitive matters more commonly than ever before. Continuous user authentication is needed because single-factor or multi-factor authentication may validate a user only initially, which does not help if an impostor can bypass this initial validation. The field of touch dynamics emerges as a clear way to non-intrusively collect data about a user and their behaviors in order to make critical security-related decisions in real time. In this paper we present a novel dataset that tracks 25 users playing two mobile games - Snake.io and Minecraft - each for 10 minutes, along with their relevant gesture data. On this data, we ran machine learning binary classifiers - namely Random Forest and K-Nearest Neighbor - to determine whether a sample of a particular user's actions was genuine. Our strongest model returned an average accuracy of roughly 93% for both games, showing that touch dynamics can differentiate users effectively and is a feasible consideration for authentication schemes. Our dataset is available at https://github.com/zderidder/MC-Snake-Results
Title : Continuous User Authentication Using Machine Learning and Multi-finger Mobile Touch Dynamics with a Novel Dataset
Zachary Deridder, Nyle Siddiqui, Thomas Reither, Rushit Dave, Brendan Pelto, Naeem Seliya, Mounika Vanamala
Pub Date : 2022-07-27 DOI: 10.1109/ISCMI56532.2022.10068450
Published in : 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
Pub Date : 2022-07-27 DOI: 10.1109/ISCMI56532.2022.10068449
Jacob Mallet, Rushit Dave, Naeem Seliya, Mounika Vanamala
In recent years, social media has grown to become a major source of information for many online users. This has given rise to the spread of misinformation through deepfakes. Deepfakes are videos or images that replace one person's face with another, computer-generated face, often that of a more recognizable person in society. With recent advances in technology, a person with little technological experience can generate these videos. This enables them to mimic a powerful figure in society, such as a president or celebrity, creating the potential danger of spreading misinformation and other nefarious uses of deepfakes. To combat this online threat, researchers have developed models that are designed to detect deepfakes. This study looks at various deepfake detection models that use deep learning algorithms to combat this looming threat. This survey focuses on providing a comprehensive overview of the current state of deepfake detection models and the unique approaches many researchers take to solving this problem. The benefits, limitations, and suggestions for future work are discussed throughout this paper.
Title : Using Deep Learning to Detecting Deepfakes
Published in : 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
Pub Date : 2022-06-21 DOI: 10.1109/ISCMI56532.2022.10068451
Nimesh Bhana, Terence L van Zyl
Language Models such as BERT (Bidirectional Encoder Representations from Transformers) have grown in popularity due to their ability to be pre-trained and perform robustly on a wide range of Natural Language Processing tasks. Often seen as an evolution over traditional word embedding techniques, they can produce semantic representations of text, useful for tasks such as semantic similarity. However, state-of-the-art models often have high computational requirements and lack global context or domain knowledge which is required for complete language understanding. To address these limitations, we investigate the benefits of knowledge incorporation into the fine-tuning stages of BERT. An existing K-BERT model, which enriches sentences with triplets from a Knowledge Graph, is adapted for the English language and extended to inject contextually relevant information into sentences. As a side-effect, changes made to K-BERT for accommodating the English language also extend to other word-based languages. Experiments conducted indicate that injected knowledge introduces noise. We see statistically significant improvements for knowledge-driven tasks when this noise is minimised. We show evidence that, given the appropriate task, modest injection with relevant, high-quality knowledge is most performant.
Title : Knowledge Graph Fusion for Language Model Fine-Tuning
Published in : 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
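The idea of enriching a sentence with Knowledge Graph triplets can be sketched in a few lines. This is a deliberately simplified illustration, not the K-BERT model or the authors' extension: it only splices relation and object tokens after a matching entity and leaves out everything the model does to control how injected tokens interact with the rest of the sentence. The toy graph, the function name `inject_knowledge`, and the `max_triples` cap are assumptions.

```python
def inject_knowledge(tokens, kg, max_triples=1):
    """Insert (relation, object) pairs after each token found in the graph.

    tokens:      list of word tokens of the input sentence.
    kg:          dict mapping an entity to a list of (relation, object) pairs.
    max_triples: cap on injected triplets per entity; a small cap keeps the
                 injection modest and limits the noise added to the sentence.
    """
    enriched = []
    for token in tokens:
        enriched.append(token)
        for relation, obj in kg.get(token, [])[:max_triples]:
            enriched.extend([relation, obj])
    return enriched


# Hypothetical knowledge graph and sentence.
kg = {"Paris": [("capital_of", "France"), ("located_in", "Europe")]}
tokens = ["Paris", "is", "lovely"]
enriched = inject_knowledge(tokens, kg)
print(enriched)
```

Raising `max_triples` adds more facts per entity but also more tokens unrelated to the sentence, which is one plausible way to picture the noise the experiments report.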