Pub Date: 2022-11-26 | DOI: 10.1109/ISCMI56532.2022.10068466
Gazmira Brahushi, Uzair Ahmad
In this paper, we evaluate our hybrid two-way recommendation system against expert-ranked resumes and job descriptions. The aim is to compare the lists produced by the recommendation system with human-ranked lists of candidates and job descriptions. First, we set up four matching scenarios (resume to resumes, job to jobs, resume to jobs, and job to resumes) and prepared human rankings based on content similarity over a total of 400 documents. On this annotated corpus, we tested our system by computing cosine-similarity-based rankings for each scenario using Global Vectors for Word Representation (GloVe) and Term Frequency-Inverse Document Frequency (TF-IDF) representations. Finally, we compared human-ranked and system-ranked lists using the Rank Biased Overlap (RBO) similarity score. For both methods, GloVe and TF-IDF, the median RBO between human-ranked and system-ranked lists is greater than 0.5, reflecting substantial agreement between the human-ranked and program-generated lists. TF-IDF achieves the highest median score, by a slight margin over GloVe, except in the resume-to-resume scenario, where the gap between the two methods is considerable.
{"title":"Empirical Evaluation of Word Representation Methods in the Context of Candidate-Job Recommender Systems","authors":"Gazmira Brahushi, Uzair Ahmad","doi":"10.1109/ISCMI56532.2022.10068466","DOIUrl":"https://doi.org/10.1109/ISCMI56532.2022.10068466","url":null,"abstract":"In this paper, we have evaluated our hybrid two-way recommendation system with expert-ranked resumes and job descriptions. The aim of the paper is to compare the lists produced by the recommendation system with human-ranked lists for candidate and job descriptions. Firstly, we set up four scenarios such as the matching of resume to resumes, job to jobs, resume to jobs, and job to resumes, and prepared a human ranking based on the content similarity on a total of 400 documents. Based on this annotated corpus we tested our system to calculate the cosine-similarity-based ranking for each scenario using the Global Vectors for Word Embeddings and Term Frequency-Inverse Document Frequency representations. Finally, we compared the similarities of human ranked lists and system-ranked lists by using the Rank Biased Overlap (RBO) similarity score. In both methods, GloVe and TF-IDF, the median RBO between human-ranked lists and system ranked are greater than 0.5. The highest median score is achieved on TF-IDF with a slight difference compared to GloVe apart from the ranking of resume-to-resume scenario where the variation between the two methods is considerable. This is due to the similarity between human-ranked lists and program-generated lists.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"93 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131434223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-26 | DOI: 10.1109/ISCMI56532.2022.10068488
Méziane Aïder, M. Hifi, Khadidja Latram
In this paper, we study the max-min knapsack problem with multiple scenarios and design a cooperative population-based method for solving it approximately. An instance consists of a knapsack of fixed capacity, a set of items (with weights and profits), and a set of scenarios over the items. The goal is to select a subset of items whose total weight fits within the knapsack capacity and whose total profit in the worst scenario, taken over all scenarios, is maximized. The designed method is based on the grey wolf optimizer, augmented with a series of local searches to improve its performance. It starts from a reference set of wolf positions generated by a randomized greedy procedure. To enhance the behavior of the standard version, a series of exploration strategies is employed. Next, to avoid premature convergence, a drop-and-rebuild strategy is added, hoping to exploit new, unexplored subspaces. Finally, the behavior of the method is analyzed computationally on benchmark instances from the literature, and its results are compared to the best results available in the literature. Encouraging results have been obtained.
{"title":"A Cooperative Population-Based Method for Solving the Max-Min Knapsack Problem with Multi-scenarios","authors":"Méziane Aïder, M. Hifi, Khadidja Latram","doi":"10.1109/ISCMI56532.2022.10068488","DOIUrl":"https://doi.org/10.1109/ISCMI56532.2022.10068488","url":null,"abstract":"In this paper, we study the max-min knapsack problem with multi-scenarios, where a cooperative population based method is designed for approximately solving it. Its instance is represented by a knapsack of fixed capacity, a set of items (with weights and profits) and possible scenarios related to overall items. Its goal is to select a subset of items whose total weight fills the knapsack, and whose total profit is maximized in the worst scenario according the whole scenarios. The designed method is based upon the grey wolf optimizer, where a series of local searches are employed for highlighting the performance of the method. It starts with a reference set of positions related to the wolves, which is provided with a random greedy procedure. In order to enhance the behavior of the standard version, a series of exploring strategies is employed. Next, in order to avoid premature convergence, a drop and rebuild strategy is added hopping to exploit new unexplored subspaces. Finally, the behavior of the method is computationally analyzed on benchmark instances of the literature, where its provided results are compared to the best results available in the literature. Encouraging results have been obtained.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"50 s1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132389860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-26 | DOI: 10.1109/ISCMI56532.2022.10068439
V. Sangeetha, R. Krishankumar, K. S. Ravichandran, A. Gandomi
Increasing carbon emissions, and thus the carbon footprint, are among the main causes of the imbalance in environmental sustainability, to which transportation is a primary contributor. Transportation is a core function of logistics distribution and the supply chain. In this paper, a hybrid gain-ant colony optimization and fruit fly optimization algorithm for the green vehicle routing problem is proposed to efficiently plan shortest paths with reduced total fuel consumption. The proposed algorithm was simulated on the Erdoğan and Miller-Hooks dataset and compared with best-known solutions and existing methods.
{"title":"A Hybrid Gain-Ant Colony Algorithm for Green Vehicle Routing Problem","authors":"V. Sangeetha, R. Krishankumar, K. S. Ravichandran, A. Gandomi","doi":"10.1109/ISCMI56532.2022.10068439","DOIUrl":"https://doi.org/10.1109/ISCMI56532.2022.10068439","url":null,"abstract":"Increasing carbon emissions, and thus footprint, is one of the main reasons for the imbalance in environmental sustainability, which is primarily contributed to transportation. Transportation is a core functionality of logistics distribution and supply chain. In this paper, a hybrid gain-ant colony optimization and fruit fly optimization algorithm for green vehicle routing problem is proposed to plan shortest paths with reduced total fuel consumption efficiently. The proposed algorithm was simulated using the Erdogan and Miller Hooks dataset and compared with best-known solutions and existing methods.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"338 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114235528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-26 | DOI: 10.1109/ISCMI56532.2022.10068455
Bhagawat Adhikari, R. Ranabhat, Mohammad Mizanur Rahman, R. Kashef
Segregation of recyclable waste items is one of the crucial aspects of smart cities and their industrial applications. CNN-based machine learning models are widely used to predict and classify image datasets. Traditional deep learning models are fast to train on image datasets, but their classification accuracy is often too low. Densely connected CNN architectures are widely used to improve accuracy in waste image classification. Despite the remarkable accuracy of such densely connected models, they often suffer from high computational complexity during the training phase. To reduce this computational complexity, DenseNet121 was developed; its dense block architecture shortens training time. RecycleNet is a modification of DenseNet121 in which the skip connections in the dense block architecture are changed to reduce computational complexity. In this paper, we propose a model called Enhanced RecycleNet, in which the skip connections in the dense block architecture are reduced to one-third of those in the DenseNet121 model. This architecture improved the model's performance by 46.3% and decreased the number of trainable parameters from 7 million to about 2.4 million.
{"title":"Enhanced RecycleNet for Efficient Waste Classification","authors":"Bhagawat Adhikari, R. Ranabhat, Mohammad Mizanur Rahman, R. Kashef","doi":"10.1109/ISCMI56532.2022.10068455","DOIUrl":"https://doi.org/10.1109/ISCMI56532.2022.10068455","url":null,"abstract":"Segregation of recyclable waste items is one of the crucial aspects of smart cities and their industrial applications. CNN-based machine learning models are widely used to predict and classify image datasets. Traditional deep learning models are fast in training the image dataset, but the classification accuracy is usually too low. Different densely connected CNN architectures are widely used to improve the accuracy in the image waste classification. Despite the remarkable accuracy in such densely connected models, these models often suffer from high computational complexity during the training phase. To overcome this computational complexity, DenseNet121 has been developed, which reduces the training time due to its unique dense block architecture. RecycleNet is a modification of DenseNet121 where the skip connections in the dense block architecture are changed to reduce the computational complexity. In this paper, we propose a unique model called Enhanced RecycleNet, where the skip connections between the dense block architecture are reduced to one-third than in the DenseNet121 model. This unique architecture has improved the model's performance by 46.3% and decreased the trainable parameters from 7 million to about 2.4 million.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127106225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-26 | DOI: 10.1109/ISCMI56532.2022.10068476
Jasper Kyle Catapang
Stochastic gradient descent (SGD) is a widely used optimization algorithm for training machine learning models. However, due to its slow convergence and high variance, SGD can be difficult to use in practice. In this paper, the author proposes using the 4th-order Runge-Kutta-Nyström (RKN) method to approximate the gradient function in SGD, replacing the Newton boosting and SGD found in XGBoost and multilayer perceptrons (MLPs), respectively. The new variants are called ASTRA-Boost and ASTRA perceptron, where ASTRA stands for “Accuracy-Speed Trade-off Reduction via Approximation”. Specifically, the ASTRA models, through the 4th-order Runge-Kutta-Nyström method, converge faster than an MLP with SGD and produce lower-variance outputs, all without compromising model accuracy or overall performance.
{"title":"Optimizing Speed and Accuracy Trade-off in Machine Learning Models via Stochastic Gradient Descent Approximation","authors":"Jasper Kyle Catapang","doi":"10.1109/ISCMI56532.2022.10068476","DOIUrl":"https://doi.org/10.1109/ISCMI56532.2022.10068476","url":null,"abstract":"Stochastic gradient descent (SGD) is a widely used optimization algorithm for training machine learning models. However, due to its slow convergence and high variance, SGD can be difficult to use in practice. In this paper, the author proposes the use of the 4th order Runge-Kutta-Nyström (RKN) method to approximate the gradient function in SGD and replace the Newton boosting and SGD found in XGBoost and multilayer perceptrons (MLPs), respectively. The new variants are called ASTRA-Boost and ASTRA perceptron, where ASTRA stands for “Accuracy-Speed Trade-off Reduction via Approximation”. Specifically, the ASTRA models, through the 4th order Runge-Kutta-Nyström, converge faster than MLP with SGD and they also produce lower variance outputs, all without compromising model accuracy and overall performance.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114688930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-26 | DOI: 10.1109/ISCMI56532.2022.10068470
Yangjun Chen, Bobin Chen
In the package design problem, we are given a set of queries (referred to as a query log), each a bit string indicating the favourite activities or items of a customer, and are required to design a package of activities (or items) that satisfies as many customers as possible. It is a typical data mining problem. In this paper, we address this issue and propose an efficient algorithm for solving the problem based on a new tree search strategy, the so-called priority-first search, in which the tree search is controlled by a priority queue instead of a stack or a queue. Extensive experiments have been conducted, showing that our method for this problem is promising.
{"title":"Priority-First Search and Mining Popular Packages","authors":"Yangjun Chen, Bobin Chen","doi":"10.1109/ISCMI56532.2022.10068470","DOIUrl":"https://doi.org/10.1109/ISCMI56532.2022.10068470","url":null,"abstract":"By the package design problem we are given a set of queries (referred to as a query log) with each being a bit string indicating the favourite activities or items of customers and required to design a package of activities (or items) to satisfy as many customers as possible. It is a typical problem of data mining. In this paper, we address this issue and propose an efficient algorithm for solving the problem based on a new tree search strategy, the so-called priority-first search, by which the tree search is controlled by using a priority queue, instead of a stack or a queue data structure. Extensive experiments have been conducted, which show that our method for this problem is promising.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"131 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130243764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-26 | DOI: 10.1109/ISCMI56532.2022.10068472
Anjli Varghese, M. Jawahar, A. Prince, A. Gandomi
This paper describes the relevance of texture analysis for leather images. The aim is to improve prediction accuracy by quantifying the morphological and statistical behavior of leather images. Hence, the present work proposes combining the multi-resolution discrete wavelet transform (DWT) and local binary pattern (LBP) texture operators. The hybrid texture features (DWT + LBP) offer better species-specific feature discrimination. The work adopts a multi-layer perceptron (MLP) model to evaluate the discriminatory power of the texture features. The proposed method extracts, analyzes, and learns species-specific texture features from novel digital microscopic leather image data. The experimental results show a significant improvement in species prediction, with 99.58% accuracy. Texture analysis therefore improves the ability to interpret leather images by species, and is thus a key step toward identifying permissible leather species and preventing the trade of non-permissible leather and its products.
{"title":"Texture Analysis on Digital Microscopic Leather Images For Species Identification","authors":"Anjli Varghese, M. Jawahar, A. Prince, A. Gandomi","doi":"10.1109/ISCMI56532.2022.10068472","DOIUrl":"https://doi.org/10.1109/ISCMI56532.2022.10068472","url":null,"abstract":"This paper describes the relevance of texture analysis on leather images. The aim is to improve the prediction accuracy by quantifying the morphological and statistical behavior of the leather images. Hence, the present work proposed to combine the multi-resolution discrete wavelet transform (DWT) and local binary pattern (LBP) texture operators. The hybrid texture features (DWT + LBP) offer better species-specific feature discrimination. This work adopts a multi-layer perceptron (MLP) model to evaluate the discriminatory behavior of the texture features. The proposed work extract, analyze and learn the species' distinct texture features of the novel digital microscopic leather image data. The experimental results noted a significant improvement in species prediction with 99.58% accuracy. Therefore, texture analysis elevates the ability to interpret the leather images per species. It is thus a necessary key to learn the permissible leather species' behavior so as to prevent the trade of non-permissible leather and its products.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129396632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-26 | DOI: 10.1109/ISCMI56532.2022.10068474
Lobna H. Kamal, Gerard McKee, N. A. Othman
This paper proposes an enhanced negation handling technique for sentiment analysis of Twitter data using the Naïve Bayes algorithm and Part-of-Speech (POS) tagging. Negation handling detects negated content in text and can thus improve sentiment prediction. The proposed technique focuses on the detection of direct negation words such as “not” and “no”, and of implicitly negated content such as “could have been” and “should have been”. The paper compares the proposed negation handling technique with an existing one. The Sentiment140 dataset is used in the experiments. On a dataset of 1,000,000 tweets, Naïve Bayes with the proposed negation handling technique gave an accuracy of 77.57%, while Naïve Bayes with the existing negation handling achieved 76.93% and standard Naïve Bayes achieved 76.12%. Of these 1,000,000 tweets, 197,381 contained one or more negations. On these negated tweets alone, the proposed technique again improved over the existing technique and standard Naïve Bayes, with accuracies of 76.51%, 75.98%, and 75.09%, respectively. The improvements and shortcomings of the proposed technique are discussed.
{"title":"Naïve Bayes with Negation Handling for Sentiment Analysis of Twitter Data","authors":"Lobna H. Kamal, Gerard McKee, N. A. Othman","doi":"10.1109/ISCMI56532.2022.10068474","DOIUrl":"https://doi.org/10.1109/ISCMI56532.2022.10068474","url":null,"abstract":"This paper proposes an enhanced negation handling technique for sentiment analysis of Twitter data using the Naïve Bayes algorithm and Part-of-Speech (POS) tagging. Negation handling detects negated content in text and can thus improve sentiment prediction. The proposed technique focuses on the detection of direct negation words such as “not” and “no”, and implicitly negated content such as “could have been” and “should have been”. The paper compares the proposed negation handling technique with an existing negation handling technique. The Sentiment140 dataset is used in the experiments. Naïve Bayes with the proposed negation handling technique gave an accuracy of 77.57% while the accuracy of the Naïve Bayes with the existing negation handling was 76.93% and the accuracy of the standard Naïve Bayes was 76.12 % for a dataset of 1,000,000 tweets. Of these 1,000,000 tweets 197,381 contained one or more negations. Taking these negated tweets alone, the proposed technique showed an improvement over the existing technique and standard Naïve Bayes with accuracies respectively of 76.51%, 75.98%, and 75.09%. The improvements and shortcomings of the proposed technique are discussed.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134026302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-26 | DOI: 10.1109/ISCMI56532.2022.10068457
Oliver Strauß, Damian Kutzias, H. Kett
With the advent of data ecosystems, finding information in distributed and federated catalogs and marketplaces becomes more and more important. One of the problems in data search, and in search in general, is the mismatch between the terminology of users and that of the searched items, be they dataset metadata or web pages. This paper proposes an agent-based approach to document expansion (ADE). The idea is to represent documents with agents that exploit local information collected from user searches and relevant signals to improve the representation of the document in a search index and, subsequently, the search performance of the system. The agents collect terms from relevant queries, perform topic modeling on these terms, and publish different variants expanded with the topic terms to the search index. We find that the approach achieves a good improvement in search performance and is a valuable tool because it places no burden on the information retrieval pipeline and is complementary to other document expansion and information retrieval approaches.
{"title":"Agent-Based Document Expansion for Information Retrieval Based on Topic Modeling of Local Information","authors":"Oliver Strauß, Damian Kutzias, H. Kett","doi":"10.1109/ISCMI56532.2022.10068457","DOIUrl":"https://doi.org/10.1109/ISCMI56532.2022.10068457","url":null,"abstract":"With the advent of data ecosystems finding information in distributed and federated catalogs and marketplaces becomes more and more important. One of the problems in data search and search in general is the mismatch between the terminology of users and of the searched items, be it dataset metadata or web pages. The paper proposes an agent-based approach to document expansion (ADE). The idea is to represent documents with agents that exploit local information collected from user searches and relevant signals to improve the representation of the document in a search index and subsequently to improve the search performance of the system. The agents collect terms from relevant queries and perform topic modeling on these terms and publish different variants expanded with the topic terms to the search index. We find that the approach achieves good improvement in search performance and is a valuable tool because is places no burden on the information retrieval pipeline and is complementary to other document expansion and information retrieval approaches.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"177 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132931354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-26 | DOI: 10.1109/ISCMI56532.2022.10068473
Mostafa Bakhshi, S. L. Mirtaheri, S. Greco
Cardiovascular disease is the leading cause of death in the world. Nowadays, a tremendous amount of data is collected on heart disease. Investigating this data and obtaining insight through data mining can improve detection and prevention rates, especially in the early stages. So far, much research has been performed on data mining models for diagnosis. In this paper, we present a model for the diagnosis of heart disease that uses a feature-based approach as a preprocessing step. The proposed solution includes four main steps: preprocessing the data, selecting effective features, clustering using the K-Means algorithm, and applying a hybrid model of a decision tree and a neural network to determine the disease. For selecting effective features, we use three methods: the Pearson correlation coefficient, information gain, and component analysis. The evaluation results confirm that the proposed hybrid model outperforms existing methods, reaching an accuracy of 0.97.
{"title":"Heart Disease Prediction Using Hybrid Machine Learning Model Based on Decision Tree and Neural Network","authors":"Mostafa Bakhshi, S. L. Mirtaheri, S. Greco","doi":"10.1109/ISCMI56532.2022.10068473","DOIUrl":"https://doi.org/10.1109/ISCMI56532.2022.10068473","url":null,"abstract":"Cardiovascular disease is the leading cause of death in the world. Nowadays, tremendous amount of data is collected on heart disease. Investigating the data and obtaining insight using data mining can improve the detection and prevention rate, especially in early stages. So far, many researches are performed on data mining models for diagnoses. In this paper, we intend to present a model for the diagnosis of heart disease using a feature-based approach as a preprocessing step. The proposed solution include four main steps as preprocessing the data, selecting effective features, clustering by using the K-Means algorithm and proposing a hybrid model of decision tree and neural network to determine the disease. In selecting the effective features, we use three methods as Pearson correlation coefficient, information gain, and component analysis. The evaluation results confirm that the proposed hybrid model outperforms the existing methods by 0.97 accuracy.","PeriodicalId":340397,"journal":{"name":"2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133262801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}