
Latest publications from the 2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)

Empirical Evaluation of Word Representation Methods in the Context of Candidate-Job Recommender Systems
Pub Date: 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068466
Gazmira Brahushi, Uzair Ahmad
In this paper, we have evaluated our hybrid two-way recommendation system using expert-ranked resumes and job descriptions. The aim of the paper is to compare the lists produced by the recommendation system with human-ranked lists for candidates and job descriptions. First, we set up four matching scenarios (resume to resumes, job to jobs, resume to jobs, and job to resumes) and prepared human rankings based on content similarity over a total of 400 documents. Based on this annotated corpus, we tested our system by computing a cosine-similarity-based ranking for each scenario using Global Vectors for Word Embeddings (GloVe) and Term Frequency-Inverse Document Frequency (TF-IDF) representations. Finally, we compared human-ranked and system-ranked lists using the Rank-Biased Overlap (RBO) similarity score. For both methods, GloVe and TF-IDF, the median RBO between human-ranked and system-ranked lists is greater than 0.5. TF-IDF achieves the highest median score, with only a slight difference from GloVe, except in the resume-to-resume scenario, where the variation between the two methods is considerable. These scores reflect the overall similarity between human-ranked and program-generated lists.
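As a concrete illustration of the evaluation pipeline described above, the sketch below ranks resumes against a job description with TF-IDF and cosine similarity, then scores the agreement with a human ranking via a truncated RBO. The documents and the expert ordering are invented for illustration; the paper's corpus and parameters are not public.

```python
# Hedged sketch: TF-IDF ranking compared to a hypothetical human ranking via RBO.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rbo(list1, list2, p=0.9):
    """Truncated RBO: (1-p) * sum_d p^(d-1) * |overlap at depth d| / d."""
    k = min(len(list1), len(list2))
    score, seen1, seen2 = 0.0, set(), set()
    for d in range(1, k + 1):
        seen1.add(list1[d - 1]); seen2.add(list2[d - 1])
        score += p ** (d - 1) * len(seen1 & seen2) / d
    return (1 - p) * score

job = "python developer with machine learning experience"
resumes = ["machine learning engineer, python",
           "java backend developer",
           "data scientist, python and ML"]

vec = TfidfVectorizer()
m = vec.fit_transform([job] + resumes)
sims = cosine_similarity(m[0], m[1:]).ravel()
system_rank = sorted(range(len(resumes)), key=lambda i: -sims[i])
human_rank = [2, 0, 1]  # hypothetical expert ordering, for illustration only
print(rbo(system_rank, human_rank))
```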
Citations: 0
A Cooperative Population-Based Method for Solving the Max-Min Knapsack Problem with Multi-scenarios
Pub Date: 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068488
Méziane Aïder, M. Hifi, Khadidja Latram
In this paper, we study the max-min knapsack problem with multiple scenarios, for which a cooperative population-based method is designed to solve it approximately. An instance is represented by a knapsack of fixed capacity, a set of items (with weights and profits), and a set of possible scenarios over the items. The goal is to select a subset of items whose total weight does not exceed the knapsack's capacity and whose total profit in the worst scenario, taken over all scenarios, is maximized. The designed method is based on the grey wolf optimizer, where a series of local searches is employed to strengthen its performance. It starts with a reference set of wolf positions generated by a random greedy procedure. To enhance the behavior of the standard version, a series of exploration strategies is employed. Next, to avoid premature convergence, a drop-and-rebuild strategy is added, hoping to exploit new unexplored subspaces. Finally, the behavior of the method is computationally analyzed on benchmark instances from the literature, and its results are compared to the best results available in the literature. Encouraging results have been obtained.
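The objective this method optimizes can be stated compactly: a candidate subset is scored by its worst-case total profit across all scenarios, subject to the capacity constraint. A minimal sketch, with illustrative data rather than the paper's benchmark instances:

```python
# Minimal sketch of the max-min knapsack objective: a solution is scored by
# its worst-case profit over scenarios. Data below is illustrative only.
def worst_case_profit(selected, weights, profits_per_scenario, capacity):
    if sum(weights[i] for i in selected) > capacity:
        return float("-inf")  # infeasible: capacity exceeded
    return min(sum(p[i] for i in selected) for p in profits_per_scenario)

weights = [3, 4, 2, 5]
profits = [[10, 4, 6, 8],   # scenario 1
           [7, 9, 3, 6]]    # scenario 2
print(worst_case_profit({0, 2}, weights, profits, capacity=7))  # min(16, 10) = 10
```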
Citations: 0
A Hybrid Gain-Ant Colony Algorithm for Green Vehicle Routing Problem
Pub Date: 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068439
V. Sangeetha, R. Krishankumar, K. S. Ravichandran, A. Gandomi
Increasing carbon emissions, and thus a growing carbon footprint, are among the main causes of the imbalance in environmental sustainability, to which transportation is a primary contributor. Transportation is a core function of logistics distribution and the supply chain. In this paper, a hybrid gain-ant colony optimization and fruit fly optimization algorithm for the green vehicle routing problem is proposed to efficiently plan shortest paths with reduced total fuel consumption. The proposed algorithm was simulated on the Erdogan and Miller-Hooks dataset and compared with best-known solutions and existing methods.
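To make the ant-colony half of such a hybrid concrete, the sketch below shows the standard probabilistic next-customer choice, biased by pheromone and by the reciprocal of a fuel-cost matrix in place of plain distance. The function name, parameters, and data are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of an ACO transition rule with fuel cost as the heuristic:
# next customer chosen with probability proportional to
# pheromone^alpha * (1 / fuel_cost)^beta.
import random

def choose_next(current, unvisited, pheromone, fuel, alpha=1.0, beta=2.0):
    weights = [(j, pheromone[current][j] ** alpha * (1.0 / fuel[current][j]) ** beta)
               for j in unvisited]
    total = sum(w for _, w in weights)
    r, acc = random.uniform(0, total), 0.0
    for j, w in weights:  # roulette-wheel selection
        acc += w
        if acc >= r:
            return j
    return weights[-1][0]

pher = [[1.0] * 3 for _ in range(3)]          # uniform initial pheromone
fuel = [[0, 5, 9], [5, 0, 4], [9, 4, 0]]      # illustrative fuel-cost matrix
print(choose_next(0, [1, 2], pher, fuel))
```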
Citations: 1
Enhanced RecycleNet for Efficient Waste Classification
Pub Date: 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068455
Bhagawat Adhikari, R. Ranabhat, Mohammad Mizanur Rahman, R. Kashef
Segregation of recyclable waste items is one of the crucial aspects of smart cities and their industrial applications. CNN-based machine learning models are widely used to predict and classify image datasets. Traditional deep learning models train quickly on image datasets, but their classification accuracy is usually too low. Densely connected CNN architectures are widely used to improve accuracy in image-based waste classification. Despite the remarkable accuracy of such densely connected models, they often suffer from high computational complexity during the training phase. To overcome this complexity, DenseNet121 was developed; its unique dense-block architecture reduces training time. RecycleNet is a modification of DenseNet121 in which the skip connections in the dense-block architecture are changed to reduce computational complexity. In this paper, we propose a model called Enhanced RecycleNet, in which the skip connections between dense blocks are reduced to one-third of those in the DenseNet121 model. This architecture improved the model's performance by 46.3% and decreased the number of trainable parameters from 7 million to about 2.4 million.
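The sketch below illustrates the general idea of pruning dense-block skip connections in PyTorch: each layer sees only the block input and the immediately preceding output instead of every earlier feature map. This is one plausible pruning pattern chosen for illustration, not the exact Enhanced RecycleNet architecture.

```python
# Hedged sketch of a dense block with pruned skip connections.
import torch
import torch.nn as nn

class PrunedDenseBlock(nn.Module):
    def __init__(self, in_ch, growth=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            # layer 0 sees the block input; later layers see the block
            # input concatenated with the previous layer's output only
            ch_in = in_ch if i == 0 else in_ch + growth
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch_in),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch_in, growth, kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        prev = None
        for layer in self.layers:
            inp = x if prev is None else torch.cat([x, prev], dim=1)
            prev = layer(inp)
        return prev

block = PrunedDenseBlock(in_ch=64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 32, 32, 32])
```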
Citations: 0
Optimizing Speed and Accuracy Trade-off in Machine Learning Models via Stochastic Gradient Descent Approximation
Pub Date: 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068476
Jasper Kyle Catapang
Stochastic gradient descent (SGD) is a widely used optimization algorithm for training machine learning models. However, due to its slow convergence and high variance, SGD can be difficult to use in practice. In this paper, the author proposes using the 4th-order Runge-Kutta-Nyström (RKN) method to approximate the gradient function in SGD, replacing the Newton boosting in XGBoost and the SGD in multilayer perceptrons (MLPs), respectively. The new variants are called ASTRA-Boost and ASTRA perceptron, where ASTRA stands for "Accuracy-Speed Trade-off Reduction via Approximation". Specifically, the ASTRA models, through the 4th-order Runge-Kutta-Nyström method, converge faster than an MLP with SGD and also produce lower-variance outputs, all without compromising model accuracy or overall performance.
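The underlying idea can be sketched by viewing plain SGD as an Euler step on the gradient flow dw/dt = -∇L(w) and substituting a 4th-order Runge-Kutta step. The sketch below uses classical RK4 for clarity; the paper's Runge-Kutta-Nyström coefficients are not reproduced here.

```python
# Hedged sketch: a 4th-order Runge-Kutta step in place of the Euler step of SGD.
import numpy as np

def rk4_sgd_step(w, grad, lr):
    k1 = grad(w)
    k2 = grad(w - 0.5 * lr * k1)
    k3 = grad(w - 0.5 * lr * k2)
    k4 = grad(w - lr * k3)
    return w - (lr / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Toy usage: minimize L(w) = ||w||^2 / 2, whose gradient is w.
w = np.array([3.0, -2.0])
for _ in range(50):
    w = rk4_sgd_step(w, grad=lambda v: v, lr=0.1)
print(w)  # close to the minimum at the origin
```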
Citations: 1
Priority-First Search and Mining Popular Packages
Pub Date: 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068470
Yangjun Chen, Bobin Chen
In the package design problem, we are given a set of queries (referred to as a query log), each a bit string indicating a customer's favourite activities or items, and are required to design a package of activities (or items) that satisfies as many customers as possible. It is a typical data mining problem. In this paper, we address this problem and propose an efficient algorithm based on a new tree search strategy, the so-called priority-first search, in which the tree search is controlled by a priority queue instead of a stack or queue data structure. Extensive experiments show that our method is promising.
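A minimal sketch of such a priority-first search: nodes fix a prefix of item decisions, the priority is an optimistic bound (queries not yet contradicted by the prefix), and Python's heapq serves as the priority queue. The satisfaction rule and the package-size budget are simplifying assumptions, not the paper's exact formulation.

```python
# Hedged sketch of priority-first search for package design.
import heapq

def best_package(queries, n_items, k):
    """Choose at most k items; maximize customers whose wanted items are all included."""
    best, best_count = None, -1
    heap = [(-len(queries), ())]  # (-optimistic bound, decided prefix)
    while heap:
        neg_bound, prefix = heapq.heappop(heap)
        if -neg_bound <= best_count:
            break  # best remaining node cannot beat the incumbent: done
        if len(prefix) == n_items:
            best, best_count = prefix, -neg_bound  # bound is exact here
            continue
        for bit in (1, 0):
            child = prefix + (bit,)
            if sum(child) > k:
                continue  # package budget exceeded
            # bound: queries still compatible with the decided prefix
            bound = sum(all(c >= q for c, q in zip(child, query))
                        for query in queries)
            heapq.heappush(heap, (-bound, child))
    return best, best_count

queries = [(1, 0, 1, 0), (1, 1, 0, 0), (0, 0, 1, 1)]  # illustrative query log
print(best_package(queries, n_items=4, k=2))
```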
Citations: 0
Texture Analysis on Digital Microscopic Leather Images For Species Identification
Pub Date: 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068472
Anjli Varghese, M. Jawahar, A. Prince, A. Gandomi
This paper describes the relevance of texture analysis for leather images. The aim is to improve prediction accuracy by quantifying the morphological and statistical behavior of the leather images. Hence, the present work proposes combining multi-resolution discrete wavelet transform (DWT) and local binary pattern (LBP) texture operators. The hybrid texture features (DWT + LBP) offer better species-specific feature discrimination. This work adopts a multi-layer perceptron (MLP) model to evaluate the discriminatory power of the texture features. The proposed work extracts, analyzes, and learns each species' distinct texture features from novel digital microscopic leather image data. The experimental results show a significant improvement in species prediction, with 99.58% accuracy. Texture analysis thus elevates the ability to interpret leather images per species and is a necessary key to learning the characteristics of permissible leather species so as to prevent trade in non-permissible leather and its products.
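A small sketch of what a hybrid DWT + LBP descriptor can look like, using PyWavelets for a one-level 2-D wavelet decomposition and scikit-image for a uniform LBP histogram. The wavelet choice, LBP radius, and feature summary are assumptions; the paper's exact configuration may differ.

```python
# Hedged sketch of hybrid DWT + LBP feature extraction.
import numpy as np
import pywt
from skimage.feature import local_binary_pattern

def dwt_lbp_features(gray_image, wavelet="haar", P=8, R=1):
    # DWT part: mean absolute response of approximation and detail sub-bands
    cA, (cH, cV, cD) = pywt.dwt2(gray_image, wavelet)
    dwt_feats = [np.mean(np.abs(band)) for band in (cA, cH, cV, cD)]
    # LBP part: normalized histogram of uniform patterns (values 0..P+1)
    lbp = local_binary_pattern(gray_image, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return np.concatenate([dwt_feats, hist])

# Stand-in for a grayscale microscopic leather image.
img = (np.random.rand(64, 64) * 255).astype(np.uint8)
print(dwt_lbp_features(img).shape)  # (4 + 10,) = (14,)
```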
Citations: 1
Naïve Bayes with Negation Handling for Sentiment Analysis of Twitter Data
Pub Date: 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068474
Lobna H. Kamal, Gerard McKee, N. A. Othman
This paper proposes an enhanced negation handling technique for sentiment analysis of Twitter data using the Naïve Bayes algorithm and Part-of-Speech (POS) tagging. Negation handling detects negated content in text and can thus improve sentiment prediction. The proposed technique focuses on detecting direct negation words such as "not" and "no", and implicitly negated content such as "could have been" and "should have been". The paper compares the proposed negation handling technique with an existing one. The Sentiment140 dataset is used in the experiments. On a dataset of 1,000,000 tweets, Naïve Bayes with the proposed negation handling achieved an accuracy of 77.57%, while Naïve Bayes with the existing negation handling achieved 76.93% and standard Naïve Bayes achieved 76.12%. Of these 1,000,000 tweets, 197,381 contained one or more negations. On these negated tweets alone, the proposed technique showed an improvement over the existing technique and standard Naïve Bayes, with accuracies of 76.51%, 75.98%, and 75.09%, respectively. The improvements and shortcomings of the proposed technique are discussed.
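A common simplification of direct-negation handling, sketched below: tokens following a negation cue receive a "NOT_" prefix until the next punctuation mark, so that negated words become distinct features for the Naïve Bayes classifier. This sketch omits the paper's POS tagging and implicit-negation rules.

```python
# Minimal sketch of direct-negation marking before feature extraction.
import re

NEGATIONS = {"not", "no", "never"}

def mark_negation(text):
    out, negating = [], False
    for token in re.findall(r"\w+|[.,!?;]", text.lower()):
        if token in NEGATIONS:
            negating = True   # start a negation scope
            out.append(token)
        elif token in ".,!?;":
            negating = False  # punctuation ends the scope
            out.append(token)
        else:
            out.append("NOT_" + token if negating else token)
    return " ".join(out)

print(mark_negation("The movie was not good, but the cast was great."))
# -> the movie was not NOT_good , but the cast was great .
```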
Citations: 1
Agent-Based Document Expansion for Information Retrieval Based on Topic Modeling of Local Information
Pub Date: 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068457
Oliver Strauß, Damian Kutzias, H. Kett
With the advent of data ecosystems, finding information in distributed and federated catalogs and marketplaces becomes more and more important. One of the problems in data search, and search in general, is the mismatch between the terminology of users and that of the searched items, be they dataset metadata or web pages. This paper proposes an agent-based approach to document expansion (ADE). The idea is to represent documents with agents that exploit local information collected from user searches and relevant signals to improve the representation of the document in a search index and, subsequently, the search performance of the system. The agents collect terms from relevant queries, perform topic modeling on these terms, and publish variants expanded with the topic terms to the search index. We find that the approach achieves a good improvement in search performance and is a valuable tool because it places no burden on the information retrieval pipeline and is complementary to other document expansion and information retrieval approaches.
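A minimal sketch of the expansion step as described: fit a small topic model over the queries associated with a document and append the top topic terms to the variant that gets indexed. The queries, model size, and term counts are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: expand a document with topic terms mined from its queries.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

doc = "dataset of urban air quality measurements"
relevant_queries = [            # hypothetical queries that led to this document
    "air pollution data city",
    "urban smog measurements",
    "city air quality sensor data",
]

vec = CountVectorizer()
X = vec.fit_transform(relevant_queries)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
top_terms = set()
for topic in lda.components_:
    top_terms.update(terms[i] for i in topic.argsort()[-3:])  # top 3 terms per topic

expanded_doc = doc + " " + " ".join(sorted(top_terms))
print(expanded_doc)  # this expanded variant is what gets indexed
```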
Citations: 1
Heart Disease Prediction Using Hybrid Machine Learning Model Based on Decision Tree and Neural Network
Pub Date: 2022-11-26 DOI: 10.1109/ISCMI56532.2022.10068473
Mostafa Bakhshi, S. L. Mirtaheri, S. Greco
Cardiovascular disease is the leading cause of death in the world. Nowadays, a tremendous amount of data is collected on heart disease. Investigating these data and obtaining insight through data mining can improve detection and prevention rates, especially in the early stages. So far, much research has been performed on data mining models for diagnosis. In this paper, we present a model for the diagnosis of heart disease that uses a feature-based approach as a preprocessing step. The proposed solution includes four main steps: preprocessing the data, selecting effective features, clustering with the K-Means algorithm, and applying a hybrid model of a decision tree and a neural network to determine the disease. For selecting effective features, we use three methods: the Pearson correlation coefficient, information gain, and component analysis. The evaluation results confirm that the proposed hybrid model outperforms existing methods, achieving an accuracy of 0.97.
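A hedged sketch of such a pipeline on synthetic data appears below: Pearson-based feature selection, a K-Means cluster id appended as an extra feature, and a decision tree combined with a neural network via soft voting. Soft voting is one plausible reading of "hybrid"; the paper's exact combination scheme may differ.

```python
# Hedged sketch of the described pipeline on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 1) keep the features with the highest |Pearson r| against the label
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
keep = np.argsort(-np.abs(r))[:8]
X = X[:, keep]

# 2) append the K-Means cluster id as an extra feature
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X = np.column_stack([X, clusters])

# 3) hybrid decision tree + neural network via soft voting
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = VotingClassifier(
    [("tree", DecisionTreeClassifier(random_state=0)),
     ("mlp", MLPClassifier(max_iter=1000, random_state=0))],
    voting="soft")
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```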
Citations: 0