
2022 25th International Conference on Computer and Information Technology (ICCIT): Latest Publications

Rate Insight: A Comparative Study on Different Machine Learning and Deep Learning Approaches for Product Review Rating Prediction in Bengali Language
Pub Date : 2022-12-17 DOI: 10.1109/ICCIT57492.2022.10055515
R. Chowdhury, Farhad Uz Zaman, Arman Sharker, Mashfiq Rahman, F. Shah
In this contemporary era of digital marketing, e-commerce has emerged as one of the most preferred methods for day-to-day shopping. Since the COVID-19 pandemic, online shopping behavior has shifted permanently toward little or no human-to-human interaction. As a result, it is becoming more difficult for e-commerce enterprises to observe and evaluate market trends, particularly through consumer behavior analysis. Extensive analysis of product reviews, aimed at identifying behavioral patterns and discrepancies between customer reviews and ratings, is therefore a substantial research field. Owing to the lack of benchmark corpora and language processing tools, predicting review ratings in Bengali remains difficult. This paper thoroughly analyzes approaches to product review rating prediction for Bengali text reviews, using our own dataset collected from the e-commerce website DarazBD. We acquired product reviews labeled with ratings from "1" to "5", treated as five sentiment classes. Notably, we built a well-balanced dataset with our automated scraping system and spent significant time and effort maintaining quality standards through human annotation. Among the machine learning models explored (logistic regression, random forest, multinomial naïve Bayes, and support vector machine), SVM achieved the best classification accuracy of 78.63%. We then combined Word2Vec, FastText, and GloVe embeddings with three deep neural network (DNN) architectures (CNN, Bi-LSTM, and a CNN+Bi-LSTM combination); CNN+Bi-LSTM gave the highest accuracy among the DNN architectures, at 75.25%.
Citations: 0
Designing a Bangla Parser using TRIE Based on Deterministic Finite Automata
Pub Date : 2022-12-17 DOI: 10.1109/ICCIT57492.2022.10056008
K. Hasan, Md. Sakhawat Hossain, Md. Abdulla Al-Sun, Md. Mostafizur Rahman
We describe a new method of parsing the Bangla language based on Deterministic Finite Automata (DFA) and implement the parser using a TRIE data structure; hence we call it the TRIE parser. The TRIE parser parses sentences faster than other important parsing schemes because it needs no formal rules, no parameters, and no Context-Free Grammars (CFG). The scheme stores Bangla grammar symbols, or Parts of Speech (POS), as states of the DFA and processes a sentence by following the DFA's transitions. If the sequence of POS symbols reaches a final state, parsing succeeds; otherwise it fails. The parser stores the grammar rules in compressed form, so it consumes very little space and can be implemented in a lightweight fashion in main memory. Compared with two other parsers, the proposed TRIE parser outperforms them in processing time as the number of sentences in the input paragraph increases. Figures and examples are used to explain the proposed TRIE parser.
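The abstract does not include code. As a minimal sketch of the idea, assuming each grammar pattern is a sequence of POS tags and that words are tagged by a simple lookup, the trie below treats every node as a DFA state and every POS tag as a transition; a sentence is accepted only if its tag sequence ends in an accepting state. The tag names, the three patterns, and the toy lexicon `LEXICON` are hypothetical illustrations, not the authors' grammar.

```python
class TrieNode:
    """One DFA state: outgoing transitions keyed by POS tag."""

    __slots__ = ("children", "accepting")

    def __init__(self):
        self.children = {}       # POS tag -> next state
        self.accepting = False   # True if some grammar pattern ends here


class TrieParser:
    def __init__(self, patterns):
        self.root = TrieNode()   # start state
        for tags in patterns:
            self.add(tags)

    def add(self, tags):
        """Insert one grammar pattern; shared prefixes reuse existing states."""
        node = self.root
        for tag in tags:
            node = node.children.setdefault(tag, TrieNode())
        node.accepting = True

    def accepts(self, tags):
        """Run the DFA over a POS-tag sequence."""
        node = self.root
        for tag in tags:
            node = node.children.get(tag)
            if node is None:     # no transition defined: reject
                return False
        return node.accepting    # succeed only in a final state


# Hypothetical toy grammar (Bangla is typically subject-object-verb) and lexicon.
PARSER = TrieParser([("PRON", "N", "V"), ("PRON", "V"), ("N", "V")])
LEXICON = {"আমি": "PRON", "ভাত": "N", "খাই": "V"}  # "I", "rice", "eat"


def parse(sentence, parser=PARSER):
    tags = [LEXICON.get(word) for word in sentence.split()]
    if None in tags:             # a word the toy lexicon cannot tag
        return False
    return parser.accepts(tags)


print(parse("আমি ভাত খাই"))  # True
```

Patterns that share a prefix (here `PRON`) share trie states, which is one plausible reading of the "compressed form" the abstract mentions; the paper may encode its grammar differently.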
Citations: 0
Aspect-Based Sentiment Analysis of Bangla Comments on Entertainment Domain
Pub Date : 2022-12-17 DOI: 10.1109/ICCIT57492.2022.10055705
N. Sultana, R. Sultana, Risul Islam Rasel, M. M. Hoque
Low-resource natural language processing is receiving more attention nowadays. Aspect-Based Sentiment Analysis (ABSA) in a high-resource language such as English has become less challenging because of sufficient datasets and experimentation tools. However, ABSA in low-resource languages such as Bangla remains quite hard, so many researchers are investing their time and knowledge in low-resource natural language processing. In this paper, we propose a Bangla ABSA model built with Bangla natural language processing. We collected 4012 Bangla text comments related to cricket, drama, movies, and music from YouTube. We applied several prominent supervised machine learning techniques, such as Support Vector Classifier (SVC), Random Forest (RF), and Linear Regression (LR). We achieved more than 75% accuracy in classifying positive, negative, and neutral sentiments and 80% accuracy in extracting aspects from Bangla texts. Finally, we used publicly available datasets to test the generalizability of the proposed model. Furthermore, we find that our proposed approach surpasses earlier related research.
Citations: 1
BloodComm: A Peer-to-Peer Blockchain-based Community for Blood Donation Network
Pub Date : 2022-12-17 DOI: 10.1109/ICCIT57492.2022.10055757
Chowdhury Mohammad Abdullah, M. Kamal, Fairuz Shaiara, A. Kamal, Md. Azam Hossain
Blood transfusion is an integral part of the healthcare system and plays an important role in ensuring quality care for patients undergoing a variety of medical procedures and treatments. A large portion of this blood comes from voluntary donors. Existing blood donor management systems are unable to offer a reliable audit trail and traceability, so there is a significant risk that patients may receive blood from unreliable sources. In this paper, we propose a system built on Ethereum with the goal of creating a decentralized, transparent, traceable, and secure network of blood donors. The platform uses smart contracts to facilitate peer-to-peer interactions. To encourage donors to donate blood more regularly, the system also offers rewards in the form of tokens. Our source code is available in a public GitHub repository.
Citations: 0
Bangla Fake News Detection using Machine Learning, Deep Learning and Transformer Models
Pub Date : 2022-12-17 DOI: 10.1109/ICCIT57492.2022.10055592
Risul Islam Rasel, Anower Hossen Zihad, N. Sultana, M. M. Hoque
News categorization is one of the primary applications of text classification, especially fake news classification. In recent years, many researchers have done extensive work on fake news detection in resource-rich languages like English. However, due to a lack of resources and language processing tools, research on low-resource languages like Bangla is still limited. In this study, we build a Bangla fake news dataset by combining newly collected fake news data with available secondary datasets. The previously available datasets contained redundant data, which we removed in our experiment, yielding a fake news dataset of 4678 distinct news items. We experimented on this data with multiple machine learning models (LR, SVM, KNN, MNB, AdaBoost, and DT), deep neural networks (LSTM, BiLSTM, CNN, LSTM-CNN, BiLSTM-CNN), and Transformer models (Bangla-BERT, m-BERT), aiming for state-of-the-art results. The best-performing models are CNN, CNN-LSTM, and BiLSTM, with accuracies of 95.9%, 95.5%, and 95.3%, respectively. We also tested our models on the previously existing datasets and obtained a 1.4% to 3.4% improvement in accuracy over previous results. Besides the accuracy improvement, our models show a significant increase in recall on fake news data compared with prior studies.
Citations: 0
Fusion of Shallow and Deep Features for Classifying Skin Lesions
Pub Date : 2022-12-17 DOI: 10.1109/ICCIT57492.2022.10055219
Ishmamur Rahman, M. K. Islam, Abu Nowshed Chy, Muhammad Anwarul Azim
A skin lesion is an unusual change in skin tissue. While it can be caused by harmless skin diseases, the lesion may also be cancerous. Skin cancer is one of the most common and deadly cancers in the world and is mainly caused by exposure to ultraviolet radiation from the sun. Because it is difficult to visually differentiate harmless from cancerous skin lesions, people are less likely to seek medical attention immediately, yet early diagnosis is crucial for effective treatment. Clinical and dermoscopy-based diagnosis of cancerous skin lesions is costly, painful, and sometimes inaccurate. Many previous studies have classified skin lesions using image processing together with machine learning and deep learning models, reporting fairly good results. In this research, we propose a novel method that focuses on extracting important features and fusing multiple features to improve the classification of malignant skin cells with traditional machine learning models, despite an imbalanced data distribution. We used HAM10000, the ISIC 2018 challenge dataset. After preprocessing, we extracted shallow and deep features from the images. The shallow features consisted of position-wise color features and Scale-Invariant Feature Transform (SIFT) features. The deep features were extracted with MobileNetV3, a transfer learning model pre-trained on ImageNet. These features were combined to form a more representative feature for the data. We tuned the parameters of five machine learning classifiers to perform binary classification on the processed data. The best accuracy, 81%, was obtained by a Support Vector Machine, with an F1-score of 68%. The second-best results were achieved by a Random Forest classifier, with an accuracy of 80% and an F1-score of 67%.
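The abstract names the feature types but not how they are computed or combined. A minimal sketch, assuming "position-wise color features" means the mean RGB value of each cell in a fixed grid and that fusion is plain concatenation after per-block L2 normalization (both are assumptions, not details from the paper), could look like this. SIFT descriptors and MobileNetV3 embeddings are not computed here, so the deep block is a placeholder list.

```python
import math


def grid_color_features(image, grid=2):
    """Mean (R, G, B) per grid cell, concatenated cell by cell.

    image: list of rows, each row a list of (r, g, b) tuples;
    height and width must both be at least `grid` so no cell is empty.
    """
    h, w = len(image), len(image[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            ys = range(gy * h // grid, (gy + 1) * h // grid)
            xs = range(gx * w // grid, (gx + 1) * w // grid)
            pixels = [image[y][x] for y in ys for x in xs]
            for c in range(3):
                feats.append(sum(p[c] for p in pixels) / len(pixels))
    return feats


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(*blocks):
    """Early fusion: L2-normalize each feature block, then concatenate."""
    fused = []
    for block in blocks:
        fused.extend(l2_normalize(block))
    return fused


# Usage with a synthetic 4x4 all-red image and a placeholder "deep" vector.
red = [[(255, 0, 0)] * 4 for _ in range(4)]
color = grid_color_features(red, grid=2)   # 2x2 cells x 3 channels = 12 values
fused = fuse(color, [3.0, 4.0])            # placeholder for a deep feature block
print(len(fused))  # 14
```

Normalizing each block before concatenation keeps a block with large raw values (such as color means in the 0 to 255 range) from dominating distance- or margin-based classifiers like the SVM.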
Citations: 0
Semantic Clustering of Bangla Natural Word using Different Word Embedding Techniques
Pub Date : 2022-12-17 DOI: 10.1109/ICCIT57492.2022.10054703
Aroni Saha Prapty, K. Hasan
Natural language processing (NLP) applies computational techniques, grounded in computer science, computational linguistics, and artificial intelligence, to communication between humans and computers through human natural language. Among the revolutionary techniques in the progression of NLP, word embedding has brought major changes to fields such as computational linguistics and statistical inference. Semantic clustering can be interpreted as grouping objects that are semantically similar. The main focus of this work is to apply different word embedding techniques to the semantic clustering of natural Bangla words. N-gram models were applied earlier in this field, but dynamic word clustering models are currently popular owing to advances in deep learning, because they speed up memory retrieval and decrease processing time. We discuss the effectiveness of the Word2Vec, TF-IDF, FastText, and GloVe word embedding models and appraise their performance based on the models' accuracy and competence.
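To make "semantic clustering" concrete: given one vector per word (from Word2Vec, FastText, GloVe, or TF-IDF), words can be grouped by cosine similarity. The sketch below uses spherical k-means, one common choice; the abstract does not say which clustering algorithm the authors used, and the two-dimensional vectors are made-up toy values, not trained embeddings.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine_kmeans(vectors, k, iters=20):
    """Spherical k-means: assign each unit vector to the most cosine-similar centroid."""
    words = list(vectors)
    points = {w: normalize(vectors[w]) for w in words}
    centroids = [points[w] for w in words[:k]]  # deterministic init: first k words
    assign = {}
    for _ in range(iters):
        new_assign = {
            w: max(range(k), key=lambda c: sum(a * b for a, b in zip(points[w], centroids[c])))
            for w in words
        }
        if new_assign == assign:  # converged: no word changed cluster
            break
        assign = new_assign
        for c in range(k):
            members = [points[w] for w in words if assign[w] == c]
            if members:  # re-center on the normalized mean direction
                centroids[c] = normalize([sum(col) / len(members) for col in zip(*members)])
    clusters = [[] for _ in range(k)]
    for w in words:
        clusters[assign[w]].append(w)
    return clusters


# Toy vectors: "mango", "dog", "banana", "cat" (values invented for illustration).
toy = {"আম": [0.9, 0.1], "কুকুর": [0.1, 0.9], "কলা": [0.8, 0.2], "বিড়াল": [0.2, 0.8]}
clusters = cosine_kmeans(toy, k=2)
print(clusters)  # [['আম', 'কলা'], ['কুকুর', 'বিড়াল']]
```

With real embeddings the vectors have hundreds of dimensions and the choice of initialization matters; the first-k initialization here only keeps the toy example deterministic.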
Citations: 0
Progressive Recommendation by Incremental Tensor Factorization
Pub Date : 2022-12-17 DOI: 10.1109/ICCIT57492.2022.10054697
Dipannita Biswas, K. M. Azharul Hasan, Zaima Zarnaz
In our fast-changing data world, there are several circumstances in which constantly updated multidimensional tensor data must be analyzed in real time to yield quick recommendations. Incremental tensor decomposition methods are powerful tools for analyzing and predicting fast-growing multidimensional datasets. In this research, we provide a robust model that overcomes the limitations of the most popular incremental tensor decomposition methods and yields high-accuracy predictions for enormous datasets with fast execution time. Testing our model on datasets, we found that it creates a tensor summary that properly reflects both the new and old data and performs better than traditional static methods.
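The abstract does not describe the model itself. For readers unfamiliar with incremental tensor factorization, a minimal sketch of one generic approach (not the authors' method) is a rank-R CP (CANDECOMP/PARAFAC) model over (user, item, context) triples. It predicts an entry as the sum over k of U[u][k] * I[i][k] * C[c][k], and when a new entry arrives it updates only the three affected factor rows by stochastic gradient descent instead of re-decomposing the whole tensor. The class name `OnlineCP`, the rank, learning rate, and regularization values are illustrative assumptions.

```python
import random


class OnlineCP:
    """Rank-R CP model over (user, item, context), updated one observation at a time."""

    def __init__(self, rank=4, lr=0.05, reg=0.01, seed=0):
        self.rank, self.lr, self.reg = rank, lr, reg
        self.rng = random.Random(seed)
        # One factor table per tensor mode; rows are created lazily for new indices,
        # so the tensor can grow along any mode without refactoring old data.
        self.factors = [{}, {}, {}]

    def _row(self, mode, idx):
        table = self.factors[mode]
        if idx not in table:
            table[idx] = [self.rng.uniform(0.1, 0.5) for _ in range(self.rank)]
        return table[idx]

    def predict(self, u, i, c):
        # Note: predicting an unseen index also initializes its factor row.
        U, I, C = self._row(0, u), self._row(1, i), self._row(2, c)
        return sum(U[k] * I[k] * C[k] for k in range(self.rank))

    def update(self, u, i, c, value):
        """One SGD step on a single observed entry; returns the error before the step."""
        U, I, C = self._row(0, u), self._row(1, i), self._row(2, c)
        err = value - self.predict(u, i, c)
        for k in range(self.rank):
            # Compute all three gradients from the old values, then apply them together.
            gu = err * I[k] * C[k] - self.reg * U[k]
            gi = err * U[k] * C[k] - self.reg * I[k]
            gc = err * U[k] * I[k] - self.reg * C[k]
            U[k] += self.lr * gu
            I[k] += self.lr * gi
            C[k] += self.lr * gc
        return err


model = OnlineCP()
first_err = model.update(0, 0, 0, 4.0)  # a new (user, item, context) rating arrives
for _ in range(200):                     # replay it to show the error shrinking
    model.update(0, 0, 0, 4.0)
print(abs(4.0 - model.predict(0, 0, 0)) < abs(first_err))  # True
```

Because each update touches only R values per mode, its cost does not depend on how large the accumulated tensor has become, which is the property that makes incremental schemes attractive for streaming recommendation.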
Citations: 0
Largest Shift String Matching Algorithm: Blend of Berry Ravindran, Zhu-Takaoka and Back & Forth Matching Algorithm
Pub Date : 2022-12-17 DOI: 10.1109/ICCIT57492.2022.10055575
M. Hasan, Prince Mahmud, Mst. Merina Khatun, Md. Hasibur Rahman, Md. Tarequl Islam
With the exponential growth of biological databases, finding an exact string in these databases is a highly challenging and essential task. Various string pattern searching strategies have played an essential role in solving string search problems. The main aim of a string pattern matching algorithm is to reduce running time, which depends on the number of attempts and character comparisons. Our paper proposes a new hybrid strategy, the Largest Shift Algorithm (LSA), to solve these string pattern matching problems more effectively. The proposed hybrid algorithm combines the most advantageous characteristics of Berry Ravindran, Zhu-Takaoka, and a customized Back & Forth Matching (BFM) algorithm. These three algorithms were chosen because they performed better in tests counting attempts and character comparisons. To analyze the performance of LSA, we compared it against three distinct algorithms discussed in the literature: BRR, maximum-shift, and quick-search. We used English text, DNA, and protein sequences as test data because of their different natures. The proposed algorithm outperforms the previous algorithms in runtime, total attempts, and character comparisons.
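The abstract does not give LSA's combined shift rule, so the sketch below illustrates only one of its three ingredients: the Berry-Ravindran bad-character shift, which inspects the two text characters immediately to the right of the current window and can skip up to m + 2 positions. It also counts attempts and character comparisons, the two cost measures the abstract uses. This is not the LSA algorithm itself; the function names are hypothetical.

```python
def br_shift_table(pattern):
    """Shifts for adjacent pattern pairs; the rightmost occurrence of a pair wins."""
    m = len(pattern)
    table = {}
    for i in range(m - 1):
        table[(pattern[i], pattern[i + 1])] = m - i
    return table


def br_shift(table, pattern, a, b):
    """Berry-Ravindran shift from the two text characters a, b just right of the window."""
    m = len(pattern)
    if a == pattern[-1]:
        return 1  # a lines up with the last pattern character
    if (a, b) in table:
        return table[(a, b)]  # align the pair (a, b) with its rightmost copy in the pattern
    if b == pattern[0]:
        return m + 1  # only b can start the next occurrence
    return m + 2  # neither character can be part of an occurrence


def berry_ravindran_search(pattern, text):
    """Return (match positions, attempts, character comparisons)."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    matches, attempts, comparisons = [], 0, 0
    table = br_shift_table(pattern)
    j = 0
    while j <= n - m:
        attempts += 1
        k = 0
        # Left-to-right comparison of the current window against the pattern.
        while k < m:
            comparisons += 1
            if text[j + k] != pattern[k]:
                break
            k += 1
        if k == m:
            matches.append(j)
        # Characters past the end of the text are treated as absent (None).
        a = text[j + m] if j + m < n else None
        b = text[j + m + 1] if j + m + 1 < n else None
        j += br_shift(table, pattern, a, b)
    return matches, attempts, comparisons
```

For example, `berry_ravindran_search("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG")` reports the single occurrence at index 5 together with the two cost counters. A hybrid such as LSA would replace `br_shift` with a rule that picks the larger of several safe shifts; that combination is not reproduced here.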
Citations: 0
Estimating Aerodynamic Data via Supervised Learning
Pub Date : 2022-12-17 DOI: 10.1109/ICCIT57492.2022.10054896
Azizul Haque, Tanzim Hossain, M. N. Murshed, K. I. B. Iqbal, Mohammad Monir Uddin
Supervised learning extracts a relationship between input and output from a training dataset. We consider four models (Support Vector Machine, Random Forest, Gradient Boost, and K-Nearest Neighbor) and apply them to airfoil data in two different cases. First, given data about several different airfoil configurations, we aim to predict the aerodynamic coefficients of a new airfoil at different angles of attack. Second, we investigate how the coefficients can be estimated for a specific airfoil when the Reynolds number changes dramatically. We find that Random Forest and Gradient Boost show promising performance in both scenarios.
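The abstract names the four regressors but not the dataset or features. The sketch below, assuming scikit-learn is available, shows one way to set up the first scenario (predicting coefficients for an airfoil not seen in training) by holding out an entire airfoil rather than random rows. The data are synthetic stand-ins generated from the thin-airfoil lift relation C_L = 2*pi*(alpha - alpha_L0), with alpha_L0 = -2f for a parabolic camber line of maximum camber ratio f; they are not the paper's data, and the features (camber, angle of attack), hyperparameters and function names are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_airfoil_data(cambers, alphas_deg, noise=0.01, seed=0):
    """Toy stand-in data: thin-airfoil lift C_L = 2*pi*(alpha + 2*camber), plus small noise."""
    rng = np.random.default_rng(seed)
    rows, targets = [], []
    for f in cambers:
        for a_deg in alphas_deg:
            cl = 2 * np.pi * (np.deg2rad(a_deg) + 2 * f)
            rows.append([f, a_deg])  # features: camber ratio, angle of attack in degrees
            targets.append(cl + noise * rng.standard_normal())
    return np.array(rows, dtype=float), np.array(targets)


# The four model families named in the abstract; settings here are illustrative only.
MODELS = {
    "SVM": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "Gradient Boost": GradientBoostingRegressor(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}


def evaluate_unseen_airfoil(models, train_cambers, test_camber, alphas_deg):
    """Train on some airfoils, score MAE on a held-out airfoil across all angles of attack."""
    X_train, y_train = make_synthetic_airfoil_data(train_cambers, alphas_deg, seed=0)
    X_test, y_test = make_synthetic_airfoil_data([test_camber], alphas_deg, seed=1)
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores
```

A possible call is `evaluate_unseen_airfoil(MODELS, [0.0, 0.01, 0.02, 0.04, 0.05, 0.06], 0.03, np.arange(-4, 11))`, which returns one mean absolute error per model for the held-out camber. The same grouped split could serve the second scenario by holding out a Reynolds-number range instead of an airfoil; that variant is not shown.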
Citations: 0