Malaysian Journal of Computer Science: Latest Articles

IMPROVING MULTI-LABEL TEXT CLASSIFICATION USING WEIGHTED INFORMATION GAIN AND CO-TRAINED MULTINOMIAL NAÏVE BAYES CLASSIFIER
IF 0.6; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2022-01-31; DOI: 10.22452/mjcs.vol35no1.2
W. Kaur, Vimala Balakrishnan, Kok-Seng Wong
In recent years, electronic text processing systems have generated vast amounts of structured and unstructured data, leaving users to sift through irrelevant information. Studies therefore continue to seek improvements to the classification process that yield more accurate results for users. This paper examines a weighted information gain method that re-assigns new weights to wrongly classified features to improve classification. The method focuses on the weights of word-frequency bins, on the assumption that each time a bin is iterated it provides information about the target word feature. Consequently, the more often a bin is iterated and re-weighted, the more important it becomes, which eventually improves classification. The proposed algorithm was trained and tested on a corpus extracted from dedicated Facebook pages related to diabetes. The features selected by weighted information gain are then fed into a co-trained Multinomial Naïve Bayes classification algorithm that captures the labels' dependencies. Because the dataset is multi-label, the algorithm incorporates class-value dependencies before converting the string vectors, which minimises the sparse distribution between features and thus produces more accurate results. The results of this study show an improvement in classification to 61%.
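As a point of reference for the feature-selection step, the following is a minimal sketch of plain (unweighted) information gain over word-frequency bins. The abstract does not give the paper's re-weighting rule, so none is implemented here; the function names, bin names, and labels are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(bins, labels):
    """Standard information gain of a binned feature with respect to the labels."""
    groups = defaultdict(list)
    for b, y in zip(bins, labels):
        groups[b].append(y)
    total = len(labels)
    # Expected entropy of the labels once the bin is known.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: one frequency bin per document, one label per document.
labels = ["diet", "diet", "insulin", "insulin"]
gain_informative = information_gain(["low", "low", "high", "high"], labels)
gain_uninformative = information_gain(["low", "high", "low", "high"], labels)
print(gain_informative, gain_uninformative)
```

A bin that separates the labels perfectly receives the full label entropy as its gain, while a bin that is independent of the labels receives zero; a weighted variant would adjust these scores per bin.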
Citations: 3
AN EFFICIENT SENTIMENT ANALYSIS BASED DEEP LEARNING CLASSIFICATION MODEL TO EVALUATE TREATMENT QUALITY
IF 0.6; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2022-01-31; DOI: 10.22452/mjcs.vol35no1.1
Samer Abdulateef Waheeb, Naseer Ahmed Khan, Xuequn Shang
Automatically extracting information from unstructured medical documents, such as patient discharge summaries in health care centers, is considered a major challenge. Sentiment analysis of medical records has gained significant attention worldwide as a way to understand the behaviors of both clinicians and patients. However, sentiment analysis of discharge summaries still does not provide a clear picture of the information available in them. This study proposes a novel machine learning-based unsupervised sentiment analysis technique that classifies discharge summaries using TF-IDF, Word2Vec, GloVe, FastText, and BERT as deep learning approaches, combined with statistical methods and clustering. The proposed model is an unsupervised sentiment framework that provides understanding of and insight into clinical features that are not captured in electronic health data records. Moreover, it is a hybrid sentiment model consisting of a clustering technique and vector space models for selecting distinctive terms. The main intensity of the measured sentiment is captured using the polarity of positive and negative terms in the discharge summary. SentiWordNet is combined with our approach to build a sentiment lexicon dataset (polarity assignment). Experiments show that the suggested method achieves 93% accuracy and significantly outperforms other state-of-the-art approaches that use sentiment analysis to examine treatment quality in discharge summaries.
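The idea of measuring sentiment from the polarity of positive and negative terms can be sketched as below. This is only an illustration under stated assumptions: the tiny lexicon, its scores, and the `polarity` helper are hypothetical and stand in for a real resource such as SentiWordNet; the paper's clustering and embedding steps are not reproduced.

```python
def polarity(text, lexicon):
    """Average polarity of the lexicon terms found in the text (0.0 if none match)."""
    tokens = text.lower().split()
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical lexicon: +1.0 for positive terms, -1.0 for negative terms.
lexicon = {"stable": 1.0, "improved": 1.0, "pain": -1.0, "worsened": -1.0}

score = polarity("Patient improved and remained stable with mild pain", lexicon)
label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
print(score, label)
```

Averaging over matched terms keeps the score in the range [-1, 1] regardless of summary length, which is one simple design choice among several.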
Citations: 5
AUTOMATED ARABIC ESSAY SCORING BASED ON HYBRID STEMMING WITH WORDNET
IF 0.6; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2021-12-31; DOI: 10.22452/mjcs.sp2021no2.4
Mohammad Alobed, Abdallah M M Altrad, Zainab Binti Abu Bakar, N. Zamin
Schools, universities, and other educational institutions have been forced to close their doors because of the coronavirus outbreak. E-learning has become an option, and the need to integrate it into the educational process has long been discussed. E-learning uses a variety of evaluation methods, one of which is the essay. This research introduces a new model for Arabic Automated Essay Grading (AAEG), developed to reduce human bias, mistakes, and costs while saving time. However, AAEG is still in its infancy. The model relies on a new hybrid stemming method with Arabic WordNet (AWN). The primary goal of stemming is to reduce inflectional forms of words to their roots. The hybrid method combines different techniques: the Extended Light Stemmer, ISRI, and table lookup in AWN. The data used in this study consist of 3050 words with their roots, retrieved from AWN and then stemmed using several algorithms (Light10, ISRI, Hybrid...). The evaluation metrics were accuracy, precision, recall, and F1-score. In a comparison of the stemming algorithms, the hybrid method gave the best results, so AAEG is expected to improve with hybrid stemming.
Citations: 2
IDENTIFYING THE ETHICAL ISSUES IN TWITTER: A KNOWLEDGE ACQUISITION FOR ONTOLOGY
IF 0.6; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2021-12-31; DOI: 10.22452/mjcs.sp2021no2.7
Mohamad Hafizuddin Mohamed Najid, Z. Zulkifli, R. Othman, Rohaiza Rokis, A. A. Salahuddin
Social media is an open platform for communicating, sharing, and exchanging information freely. This uncontrolled exchange of information has both negative and positive impacts on others’ lives. This study therefore aims to identify ethical issues in this information in line with Ibn Khaldun’s ethical considerations. Among social networking sites, Twitter has been identified as one of the most popular microblogging platforms. Using a simple algorithm written in R and 43 keywords based on Ibn Khaldun’s thoughts, 1075 public tweets were extracted from Twitter as a sample of ethical issues. Sentiment analysis in Parallel Dots was performed on the collected tweets, and 700 of the tweets were found to be positive statements, 229 neutral, and 146 negative. After validating these sentiments, the study proposes the ethical issues identified from the tweets as a domain for developing ontology relationships with Ibn Khaldun’s thoughts. Further study could use wider data from various sources beyond the limits of this study. A semantic database could thus serve as a guideline for SNS ethical issues based on Ibn Khaldun’s thoughts.
Citations: 0
MAPPING DEFORESTATION IN PERMANENT FOREST RESERVE OF PENINSULAR MALAYSIA WITH MULTI-TEMPORAL SAR IMAGERY AND U-NET BASED SEMANTIC SEGMENTATION
IF 0.6; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2021-12-31; DOI: 10.22452/mjcs.sp2021no2.2
Muhammad Azzam A. Wahab, Ely Salwana Mat Surin, Norshita Mat Nayan, Hameedur Rahman
Deforestation is the long-term or permanent conversion of forest land to other uses, such as agriculture, mining, and urban development. It has catastrophic consequences for the environment, including the loss of biodiversity, disruption of clean water supplies, and the acceleration of climate change. According to statistics, deforestation in developing countries is proceeding at an alarming rate, including in Malaysia, where plantation activities are the primary cause of forest loss. Recent anecdotal studies have demonstrated the effectiveness of the deep learning-based (DL) approach in producing deforestation maps. However, few studies have concentrated on the DL approach for synthetic aperture radar (SAR) imagery, owing to the complexity of the method's computational concepts. SAR imagery can be challenging to interpret, but its all-weather, all-day capability can be critical in forest monitoring compared with optical imagery. Thus, in this study, we propose to map deforestation areas in Permanent Forest Reserve (HSK) using multi-temporal Sentinel-1 SAR data. A deep learning-based U-Net was employed to classify the SAR imagery as forest or non-forest because of its semantic segmentation capabilities. The experimental results showed that the proposed technique achieved an intersection over union (IoU) of 0.993 and an overall accuracy (OA) of 0.980. We also explain the entire procedure from beginning to end as simply as possible so that beginners can follow it. In brief, the findings of this study have the potential to improve monitoring of damaged HSK areas, prioritize the restoration of affected forest areas, and protect forest lands from illegal deforestation activities.
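For readers unfamiliar with the two reported metrics, here is a minimal sketch of how intersection over union and overall accuracy are commonly computed for a binary forest/non-forest mask. The masks, the function name `iou_and_accuracy`, and the choice of 1 as the forest class are assumptions for illustration; the abstract does not state how the paper aggregates its scores.

```python
def iou_and_accuracy(pred, truth, positive=1):
    """Return (IoU of the positive class, overall accuracy) for two 2-D masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == positive and t == positive:
                tp += 1
            elif p == positive:
                fp += 1
            elif t == positive:
                fn += 1
            else:
                tn += 1
    union = tp + fp + fn
    iou = tp / union if union else 1.0  # both masks empty: treat as perfect overlap
    overall_accuracy = (tp + tn) / (tp + fp + fn + tn)
    return iou, overall_accuracy


# Hypothetical 2x4 masks: 1 = forest, 0 = non-forest.
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 0, 0, 0]]

iou, oa = iou_and_accuracy(pred, truth)
print(iou, oa)
```

IoU ignores true negatives, so it is stricter on the class of interest, whereas overall accuracy counts every correctly labelled pixel.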
Citations: 0
AN EXPERIMENTAL EVALUATION OF DEEP NEURAL NETWORK MODEL PERFORMANCE FOR THE RECOGNITION OF CONTRADICTORY MEDICAL RESEARCH CLAIMS USING SMALL AND MEDIUM-SIZED CORPORA
IF 0.6; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2021-12-31; DOI: 10.22452/mjcs.sp2021no2.5
Fatin Syafiqah Yazi, Wan-Tze Vong, V. Raman, Patrick Hang Hui Then, Mukulraj J Lunia
Corpora come in various shapes and sizes and play an essential role in facilitating Natural Language Processing (NLP) tasks. However, few corpora are specialized for Evidence-Based Medicine (EBM) tasks. This study aims to discover how the size of a corpus influences the performance of our Deep Neural Network (DNN) model developed for contradiction detection in medical literature. We explored the potential of the EBM Summarizer corpus by Mollá and Santiago-Martínez, a medium-sized corpus, for use with our contradiction detection model. Dataset preparation involved filtering out open-ended questions, duplicate claims, and vague claims. As a result, two datasets were created, with the claim input represented by sniptext in one dataset and longtext in the other. Experiments were conducted with varying numbers of hidden layers and units using the different datasets. The performance of the DNN model was recorded and compared with the results obtained using a small-sized corpus. The DNN model's performance did not improve even after it was trained on the larger dataset derived from the medium-sized corpus. Possible factors include the limitations of the DNN model itself and the quality of the datasets.
Citations: 1
SEMANTIC GRAPH KNOWLEDGE REPRESENTATION FOR AL-QURAN VERSES BASED ON WORD DEPENDENCIES
IF 0.6; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2021-12-31; DOI: 10.22452/mjcs.sp2021no2.9
Muhammad Muhtadi Mohamad Khazani, H. Mohamed, Tengku Mohd Tengku Sembok, Nurhafizah Moziyana Mohd Yusop, Sharyar Wani, Yonis Gulzar, Mohd Hazali Mohamed Halip, Syahaneim Marzukh, Zahri Yunos
Semantic approaches present an efficient, detailed, and easily understandable representation of knowledge from documents. Al-Quran contains a vast amount of knowledge that requires appropriate knowledge extraction. A semantic-based approach can help in designing an efficient and explainable knowledge representation model for Al-Quran. This research proposes a semantic-graph knowledge representation model for verses of Al-Quran based on word dependencies. These features are used in the proposed knowledge representation model, allowing semantic graph matching to improve the accuracy of Al-Quran search applications. The proposed model is essentially a formalism for generating a semantic graph representation of Quranic verses, which can be applied to knowledge base construction for other applications such as information retrieval systems. A set of rules called Semantic Dependency Triple Rules is defined and mapped into the semantic graph representing the verse's logic. The rules translate word dependencies and other NLP metadata into a triple form that holds logical information. The proposed model has been tested with an English translation of Al-Quran on a document retrieval prototype. The basic system has been enhanced with anaphoric pronoun correction, which improved retrieval performance. The results have been compared with a closely related system and evaluated on document retrieval accuracy using Precision, Recall, and F-score. The proposed model achieved 65%, 60%, and 62.4% on these measures, respectively. It also improved on the overall accuracy of the previous system by 43.8%.
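The three reported measures are mutually consistent if the F-score is the usual balanced F1 (the harmonic mean of precision and recall), which the short check below assumes. The `f_score` helper and its `beta` parameter are illustrative, not taken from the paper.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Precision and recall as reported in the abstract.
f1 = f_score(0.65, 0.60)
print(round(f1 * 100, 1))  # 62.4
```

With precision 0.65 and recall 0.60, the result is 2 × 0.39 / 1.25 = 0.624, matching the stated 62.4%.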
Citations: 2
MELODY TRAINING WITH SEGMENT-BASED TILT CONTOUR FOR QURANIC TARANNUM
IF 0.6 Zone 4 (Computer Science) Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-12-31 DOI: 10.22452/mjcs.sp2021no2.1
Haslizatul Mohamed Hanum, Luqmanul Hakim Md Abas, Aiman Syamil Aziz, Zainab Abu Bakar, Norizan Mat Diah, W. F. Wan Ahmad, Nazlena Mohamad Ali, N. Zamin
Tarannum, or melodic recitation of Quranic verses, employs the softness of the voice in reading the holy verses of the Quran. Melody training technology allows users to practise repetitively while also providing feedback on their performance. This paper describes an application that captures the pattern of tarannum melodies (from Quranic recitations) and provides feedback to the user. Recordings of Quranic verses are collected from an expert reciting Bayati tarannum. The samples are pre-processed into segmented tarannum verse-contours using pitch sequences. Using the k-Nearest Neighbor (kNN) classifier, the melody patterns are trained on 20 samples. Input vectors are formed by computing the melody verse-contour representation using mean, standard deviation, and slope values and combining them with an identified Tilt-based contour label. A tarannum training prototype is built to test the similarity between a user’s recitation and the trained patterns. To identify similarity between a pair of verse-contours, the application employs a shape-based contour similarity algorithm. The proposed application also provides feedback in the form of a grade and a percentage of accuracy, as determined by a melody curve similarity algorithm. As a result, the current samples have an overall shape-based weighted score of 66%. Some individual samples are classified with a similarity score as high as 80%. The study provides an alternative interactive session for people who want to learn tarannum, as well as a preliminary step toward understanding the melodic patterns of tarannum. The application provides a repetitive training experience and encourages users to improve their recitations in order to achieve the highest possible score.
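The feature step described in this abstract (mean, standard deviation, and slope of a pitch contour, classified with kNN) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the toy pitch sequences, the labels `rising` and `falling`, and the function names are hypothetical, the Tilt-based contour label is not computed, and features are left unscaled for brevity.

```python
import math
import statistics
from collections import Counter


def contour_features(pitch):
    """Return (mean, population std, least-squares slope) of a pitch sequence."""
    n = len(pitch)
    mean = statistics.fmean(pitch)
    std = statistics.pstdev(pitch)
    x_mean = (n - 1) / 2  # mean of the sample indices 0..n-1
    num = sum((i - x_mean) * (p - mean) for i, p in enumerate(pitch))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return (mean, std, num / den)


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy contours (relative pitch per frame), not data from the paper.
toy_contours = [
    ([0, 1, 2, 3, 4], "rising"),
    ([1, 2, 3, 4, 6], "rising"),
    ([0, 2, 3, 5, 6], "rising"),
    ([4, 3, 2, 1, 0], "falling"),
    ([6, 4, 3, 2, 1], "falling"),
    ([6, 5, 3, 2, 0], "falling"),
]
train = [(contour_features(seq), label) for seq, label in toy_contours]
prediction = knn_predict(train, contour_features([1, 2, 4, 5, 6]), k=3)
```

In practice the three features would likely need scaling before the distance computation, since a mean measured in hertz would otherwise dominate the slope.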
Citations: 0
HYBRID DISCRETE WAVELET TRANSFORM AND TEXTURE ANALYSIS METHODS FOR FEATURE EXTRACTION AND CLASSIFICATION OF BREAST DYNAMIC THERMOGRAM SEQUENCES
IF 0.6 Zone 4 (Computer Science) Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-12-31 DOI: 10.22452/mjcs.sp2021no2.8
Khaleel Al-Rababah, M. Mustaffa, S. Doraisamy, F. Khalid, Luís Filipe de Pina Júnior
Breast cancer is a common cancer among women that causes thousands of casualties every year. A cancerous tumor raises the temperature near the region of the tumor, and the heat generated is transferred to the skin surface, so the tumor area is warmer than the healthy area. Detecting breast cancer in its early stages can save women’s lives and lower the cost burden. Thermography is an imaging technique used for breast cancer detection. Dynamic thermography generates infrared images over a fixed time, measured in minutes, to detect the difference between the normal and cancerous areas in the images. In this research, we propose a methodology to deal with the changes of temperature in a patient's breasts by defining a set of efficient features resulting from the extraction and reduction of coefficients obtained from breast thermogram images, followed by classification. Texture feature methods (Histogram of Oriented Gradients (HOG) and Discrete Curvelet transform) are applied separately to the HH (high-high) and HL (high-low) sub-band images of the Discrete Wavelet Transform (DWT). HOG-based features and Curvelet features are extracted by reducing the coefficient vectors returned by the two methods. Finally, a Support Vector Machine (SVM) binary classifier is used to classify the images as either normal or abnormal. The proposed work achieved an Accuracy of 98.2%, Sensitivity of 97.7%, and Specificity of 98.2% in empirical studies using a dynamic breast thermogram dataset.
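To make the sub-band step concrete, the sketch below computes a single-level 2-D Haar wavelet decomposition and a simple energy value for the HH and HL sub-bands. It is a hedged illustration only: the abstract does not state which wavelet was used, so Haar is an assumption, sub-band naming conventions differ between libraries (here the first letter refers to filtering along the horizontal direction), and the energy feature stands in for the HOG and Curvelet features, which are not reproduced. The function names and the toy image are hypothetical.

```python
def haar_dwt2(image):
    """Single-level 2-D Haar transform of a list-of-lists image with even dimensions."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    bands = {"LL": [], "HL": [], "LH": [], "HH": []}
    for r in range(0, rows, 2):
        out = {name: [] for name in bands}
        for c in range(0, cols, 2):
            # One 2x2 block: top row (a, b), bottom row (d, e).
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            out["LL"].append((a + b + d + e) / 2)  # low-pass in both directions
            out["HL"].append((a - b + d - e) / 2)  # high-pass horizontally
            out["LH"].append((a + b - d - e) / 2)  # high-pass vertically
            out["HH"].append((a - b - d + e) / 2)  # high-pass in both directions
        for name in bands:
            bands[name].append(out[name])
    return bands


def band_energy(band):
    """Sum of squared coefficients, used here as a stand-in texture feature."""
    return sum(v * v for row in band for v in row)


# Hypothetical toy image with vertical stripes (intensity changes only horizontally).
stripes = [[1, 0, 1, 0] for _ in range(4)]
bands = haar_dwt2(stripes)
features = [band_energy(bands["HH"]), band_energy(bands["HL"])]
```

For the striped image all detail energy lands in the horizontally high-passed band, which is the kind of directional information the detail sub-bands are meant to isolate before texture features are computed.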
Citations: 0
IMPLEMENTATION OF HYPERPARAMETER OPTIMISATION AND OVER-SAMPLING IN DETECTING CYBERBULLYING USING MACHINE LEARNING APPROACH
IF 0.6 Zone 4 (Computer Science) Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-12-31 DOI: 10.22452/mjcs.sp2021no2.6
Wan Noor Anira Wan Ali, M. Mohd, F. Fauzi, Kiyoaki Shirai, Muhammad Junaidi Mahamad Noor
Online social networks have become a necessity for everyone around the world. In particular, online social networks have enabled us to connect to one another regardless of time, for as long as we have social media and social networking as platforms for broadcasting information and communicating, respectively. However, this evolution has also made it possible for people to commit various cybercrimes, such as cyberbullying. To address this issue, machine learning can be utilised to counter cyberbullying in online social networks. Thus, this study proposed a framework with a set of features consisting of word and character term frequency–inverse document frequency, word embedding using Word2vec, and six types of list terms: profane words, proper nouns, negation words, ‘allness’ terms, diminisher words and intensifier words. These features were divided into four groups before being fed into a linear support vector classifier to train our model on an ASKfm data set with hyperparameter tuning and over-sampling. Results indicated that the proposed framework provided significant outcomes, with our trained model achieving a highest area under the curve of 99.24% and an F-measure of 97.38%.
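Two of the ingredients named in this abstract, TF-IDF weighting over word or character n-gram tokens and over-sampling of the minority class, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not say which over-sampling technique was used, so random duplication is assumed here, the TF-IDF variant is the basic `tf * log(N / df)` form, and the function names and toy posts are hypothetical. The Word2vec, list-term, and classifier stages are not shown.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf(docs_tokens):
    """Per-document {token: tf * log(N / df)} weights for tokenised documents."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each token once per document
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        vectors.append({
            tok: (cnt / len(tokens)) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        })
    return vectors


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    target = max(len(group) for group in groups.values())
    out_samples, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical toy posts; label 1 marks the (minority) bullying class.
posts = ["you are great", "you are awful", "have a nice day", "nice day today"]
labels = [0, 1, 0, 0]
word_vectors = tfidf([p.split() for p in posts])
char_vectors = tfidf([char_ngrams(p) for p in posts])
balanced_posts, balanced_labels = random_oversample(posts, labels)
```

Over-sampling would normally be applied to the training split only, so that duplicated posts do not leak into the evaluation data.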
Citations: 2