
2022 International Conference on Data Science and Its Applications (ICoDSA): Latest Publications

Separating Hate Speech from Abusive Language on Indonesian Twitter
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862850
Muhammad Amien Ibrahim, Noviyanti Tri Maretta Sagala, S. Arifin, R. Nariswari, N. Murnaka, P. W. Prasetyo
Social media is an effective tool for connecting with people and distributing information. However, many people use social media to spread hate speech and abusive language. In contrast to hate speech, abusive language is frequently used in jest, with no intent to offend individuals or groups, even though it may contain profanity. As a result, the distinction between hate speech and abusive language is often blurred. Individuals who spread hate speech may be prosecuted, since hate speech has legal implications. Previous research has focused on binary classification of hate speech and normal tweets. This study aims to classify hate speech, abusive language, and normal messages on Indonesian Twitter. Several machine learning models, including logistic regression and BERT models, are used for the text classification task. Model performance is assessed using the F1-score. The results show that BERT models outperform the other models in terms of F1-score, with the BERT-indobenchmark model, which was pretrained on social media text data, achieving the highest F1-score of 85.59. This also indicates that pretraining BERT on social media data improves the classifier significantly. A classification model that can distinguish hate speech from abusive language would help individuals prevent the spread of hate speech that has legal implications.
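The abstract reports a single F1-score for a three-class task but does not say how the per-class scores were combined. As a minimal sketch, assuming macro averaging (an assumption, not something the abstract states), the metric could be computed as follows. The label names and the toy predictions are hypothetical.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1}, treating each label one-vs-rest."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores (macro averaging)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Hypothetical labels and toy predictions, for illustration only.
LABELS = ["hate", "abusive", "normal"]
y_true = ["hate", "hate", "abusive", "abusive", "normal", "normal"]
y_pred = ["hate", "abusive", "abusive", "abusive", "normal", "hate"]

scores = per_class_f1(y_true, y_pred, LABELS)
overall = macro_f1(y_true, y_pred, LABELS)
```

Macro averaging weights the three classes equally, which matters when one class (often the normal tweets) dominates the data; a weighted or micro average would give a different number on the same predictions.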
Citations: 2
Caries Level Classification using K-Nearest Neighbor, Support Vector Machine, and Decision Tree using Zernike Moment Invariant Features
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862879
Y. Jusman, Muhammad Ahdan Fawwaz Nurkholid, Muhammad Fajrul Faiz, Sartika Puspita, Lady Olivia Evellyne, Kahfi Muhammad
Dental caries is the most common disease and is reported to be one of the oldest. There are four ways to prevent it: maintaining oral hygiene, eating healthy food, getting adequate fluoride, and applying fissure sealants. Regular dental check-ups can also reduce the risk of developing the disease. Dentists often fail to detect caries because early enamel lesions that have not yet developed into cavitation are difficult to identify. New techniques have therefore been developed to help detect the disease. This study extracts features with the Zernike moment method and evaluates the classifiers using 10-fold cross-validation, with 90% of the data (1256 images) used for training and 10% (132 images) for testing. The average training accuracies are 94.55%, 84.24%, and 88.46%, and the average training times are 0.74, 1.63, and 0.77 seconds, for K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Decision Tree (DT), respectively. Each model achieves strong classification performance, with AUC values above 0.95.
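The 10-fold cross-validation mentioned in the abstract can be sketched as below. This is a generic illustration rather than the authors' procedure: the function name, the shuffling seed, and the round-robin fold assignment are assumptions. Note that an even 10-fold split of the 1,388 images quoted (1256 + 132) yields test folds of 138 or 139 images, so the 1256/132 figures presumably reflect the authors' own partitioning.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


N_IMAGES = 1256 + 132  # totals quoted in the abstract
splits = list(k_fold_indices(N_IMAGES, k=10))
```

Each image appears in exactly one test fold, so averaging a metric over the ten folds uses every image for evaluation once.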
Citations: 1
Deep Learning CNN Implementation on Packed Malware for Cloud Cross Domain Solution Filters
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862936
Leonardo Aguilera, Doug Jacobson
This research focuses on detecting packed Windows Portable Executable (PE) malware with Deep Learning (DL), using the Convolutional Neural Network (CNN) algorithm. Our primary goal is to improve the use of DL techniques in cybersecurity to strengthen the defenses against cyberattacks on U.S. Department of Defense (DoD) systems. Our hypothesis is that existing Cross Domain Solutions (CDSs) can be upgraded to include built-in DL-CNN algorithms for identifying well-crafted packed malware, and that implementing DL-CNN in the CDS filter software will significantly improve the detection of packed malware. CDSs are strategically positioned between unclassified and classified systems, and with DL-CNN capabilities, the CDS virus detection filter will learn to detect malware on its own, regardless of whether the malware is well-crafted, packed, or encrypted. Using our trained model, we were able to distinguish malicious packed Windows PE executables from benign ones with an average training accuracy of 94 percent and a validation accuracy of 93 percent. Although the DL-CNN algorithm’s results could be improved through further development and tuning with KerasTuner, this research provides a solid foundation. Our experiments were conducted on our lab computer system and in the Amazon SageMaker Studio Lab and Google Colab cloud environments.
Citations: 0
Filter-Based Feature Selection Method for Predicting Students’ Academic Performance
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862883
Dafid, Ermatita
Almost all higher education institutions face the same problem of improving their quality as measured by students' academic performance. The need for early information about poorly performing students has pushed institutions to seek the best solution a prediction model can provide. Data mining offers various algorithms for prediction, but constructing an accurate prediction model remains a challenging task for higher education. Two factors that drive the accuracy of a prediction model are the classifier and feature selection. Each classifier gives its best result when it is matched with the appropriate category of data in a dataset. Some studies have reported excellent results in predicting students' academic performance, but they focus only on the classification technique rather than on the right feature selection. Conversely, other studies have reported excellent gains in prediction accuracy, but they focus only on feature selection techniques rather than on applying the right classifier to the right data. As a result, prediction models have not yet reached their best accuracy. Unlike existing frameworks, which build a model and select features without regard to the categorized data in a dataset, this research proposes suitable filter-based feature selection methods and suitable classifiers based on categorized data. The results will help researchers find the best combination of filter-based feature selection methods and classifiers. Tests with various classification algorithms and feature selection methods show that using an appropriate classifier for specific categorized data, together with proper feature selection, increases the prediction model's accuracy.
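A filter-based method scores each feature against the target independently of any classifier and keeps the top-ranked ones. The abstract does not name the specific filters evaluated, so the sketch below uses absolute Pearson correlation purely for illustration; the feature names and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(features, target, k):
    """Rank features by |correlation with target| and keep the k best."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: six students, target 1 = passed, 0 = failed.
features = {
    "attendance_rate": [0.95, 0.60, 0.80, 0.40, 0.90, 0.55],
    "prior_gpa": [3.6, 2.1, 3.0, 1.9, 3.4, 2.5],
    "student_id_last_digit": [3, 4, 7, 6, 5, 5],  # deliberately uninformative
}
target = [1, 0, 1, 0, 1, 0]

selected = select_top_k(features, target, k=2)
```

Because the score is computed once per feature, a filter like this is cheap and classifier-agnostic; for categorical features a chi-square or mutual-information score would be the more usual choice.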
Citations: 1
ICoDSA 2022 Committee
Pub Date : 2022-07-06 DOI: 10.1109/icodsa55874.2022.9862823
Citations: 0
Social Commerce from Seller and Region Perspective: A Data Mining for Indonesian E-commerce
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862835
Gunawan
As a subset of e-commerce, social commerce is growing fast in many countries. Studies on social commerce have primarily focused on consumer behavior, especially purchase intention. This study takes a different perspective by focusing on e-commerce sellers, aggregated by region within a country. The objects of study are the provinces of Indonesia, the country with the most prominent e-commerce and social commerce among Southeast Asian countries. The general objective is to characterize social commerce firms across regions in Indonesia. The specific objectives are (1) to group provinces based on e-commerce and social commerce-related variables and (2) to profile each group of provinces based on business and e-commerce characteristics. This quantitative study of secondary data adopts a data mining approach to analyze official data from BPS-Statistics Indonesia. The Cross-Industry Standard Process for Data Mining framework was adopted as the methodology and the KNIME Analytics Platform as the computational software. The result classifies provinces into two groups: high and low social commerce. Provinces in the high social commerce group are characterized by younger entrepreneurs, more entrepreneurs with university backgrounds, newer e-commerce establishments, more fashion and beauty products, more resellers, and more revenue from the social media channel. Local governments might use the findings to understand their province's position in the clusters and to make policies that increase the number of social commerce entrepreneurs.
Citations: 0
Boosting Algorithm for Classifying Heart Disease Diagnose
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862861
Patrik Gunti Pratama, Dedy Rahman Wijaya, Heru Nugroho, Rathimala Kannan
The heart is the organ responsible for pumping blood and distributing oxygen throughout the body. Hospitals and doctors currently still check heart disease diagnoses manually, a method that is expensive and time-consuming. In this study, the Gradient Tree Boosting (GTB) algorithm is used to classify patients as having heart disease or not. The aim is to make early information on heart health easier to obtain. The dataset, from the UCI Machine Learning Repository, has 13 features for detecting heart disease and a total of 304 records. The study uses a GTB model with the best four parameters together with feature selection. With a recall score of 0.98, the proposed method succeeded in correctly classifying patients diagnosed with heart disease.
Citations: 0
Named Entity Recognition for Drone Forensic Using BERT and DistilBERT
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862916
Swardiantara Silalahi, T. Ahmad, H. Studiawan
The growing use and popularity of UAVs in many fields opens new opportunities and challenges. Many business sectors are benefiting from the use of UAV devices. Drones are used for a wide range of purposes, from business to crime. Hence, further mechanisms are needed to deal with drone crime and attacks, both administratively and technically. From a technical point of view, a security protocol is needed to keep the drone safe from various logical or physical attacks. When a drone is involved in an incident, a forensic protocol is needed to analyze and investigate the incident, understand the attack behavior, and mitigate the incident risk. Among existing drone forensic research efforts, there have been few attempts to use specific drone artifacts for forensic analysis. This paper therefore investigates the potential of NER (Named Entity Recognition) as an initial step in extracting information from drone flight log data. We use Transformer-based techniques to perform NER and assist the forensic investigation. Pre-trained BERT and DistilBERT models are fine-tuned on the annotated data and achieve F1 scores of 98.63% and 95.9%, respectively.
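A token-classification model such as a fine-tuned BERT typically emits one tag per token, and those tags then have to be grouped into entity spans before an investigator can use them. The sketch below assumes the common BIO tagging scheme, which the abstract does not confirm; the entity types and the log message are hypothetical.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans.

    A stray "I-" tag that does not continue the current entity is ignored.
    """
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the one in progress, if any.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent tag ends the current entity.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical flight-log message and hypothetical entity types.
tokens = ["Compass", "error", "detected", "near", "home", "point"]
tags = ["B-ISSUE", "I-ISSUE", "O", "O", "B-LOCATION", "I-LOCATION"]

entities = bio_to_spans(tokens, tags)
```

Entity-level F1 is usually computed over spans like these rather than over individual tokens, so the grouping step also affects how a score such as the one reported would be measured.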
Citations: 10
Sellybot: Conversational Recommender System Based on Functional Requirements
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862908
Nurani Solechah, Z. Baizal, N. Ikhsan
New models of high-tech products are now released very quickly. Smartphones, for example, come in many brands and types with different specifications. This makes the public hesitant to buy, because they have limited knowledge of which technical specifications suit their needs. It is therefore necessary to develop a recommender system based on product functional requirements. In our prior work, a Conversational Recommender System (CRS) was developed to recommend smartphones based on high-level requirements (product functional requirements) by combining Navigation by Asking (NBA) and Navigation by Proposing (NBP). Users who are unfamiliar with the technical features of the product can thus express their needs more easily. However, that system uses a dialog form, so users are still limited in how flexibly they can express their needs. In this study, we extend that work by building Sellybot, a CRS that interacts with users in natural language. We built Sellybot using the RASA framework. The system is evaluated on accuracy and user satisfaction. The evaluation results show that the system has an accuracy of 84.8%, and in the questionnaire 80.3% of users chose Sellybot, reporting that it felt more flexible to use and gave a better experience.
Citations: 0
Evaluation of The Quality of The School Website using WEBUSE and IPA
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862534
Eric Reynara Karoza, S. Widowati, Arfive Gandhi
A website is a collection of interconnected web pages that share a single domain and can be accessed by the public. Individuals, clubs, enterprises, and organizations can build and maintain websites for a variety of purposes. Websites offer an almost limitless range of uses and can be accessed anywhere and at any time; education is one such use. SMA Harapan 1 Medan is one of the schools that use the internet as a source of information. However, according to the website administrator, the school's website (sma1.harapan.ac.id) continues to have issues with features, appearance, and insufficient information. The WEBUSE (Website Usability Evaluation Tools) method can be used to address these problems. WEBUSE is a questionnaire-based approach for assessing the usability of a website. It was chosen because it categorizes website problems and can assess usability across all types of websites and domains. The evaluation's findings will be examined using Importance Performance Analysis (IPA). IPA is used to determine the level of conformance and how satisfied users are with the website, and to determine which parts need to be fixed or maintained. The results provide insight for formulating improvement strategies, and the analysis findings will be considered when creating the improvement plan.
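The abstract does not give the IPA formulas, so the following is only a minimal sketch of how an IPA step is commonly set up, not the authors' procedure. It assumes each usability category has a mean importance score and a mean performance score on the same scale, takes the conformance level as performance divided by importance, and uses the grand means as the quadrant boundaries. The category names and all scores are hypothetical.

```python
from statistics import mean


def ipa_quadrants(items):
    """Classify items into IPA quadrants.

    items maps a category name to (importance, performance) mean scores.
    Returns name -> (conformance level in percent, quadrant label).
    Assumption: quadrant boundaries are the grand means of both axes.
    """
    imp_cut = mean(imp for imp, _ in items.values())
    perf_cut = mean(perf for _, perf in items.values())

    result = {}
    for name, (imp, perf) in items.items():
        # Conformance level: how well performance matches importance.
        conformance = round(perf / imp * 100, 1)
        if imp >= imp_cut and perf < perf_cut:
            quadrant = "concentrate here"       # important, underperforming
        elif imp >= imp_cut and perf >= perf_cut:
            quadrant = "keep up the good work"  # important, performing well
        elif imp < imp_cut and perf < perf_cut:
            quadrant = "low priority"           # less important, underperforming
        else:
            quadrant = "possible overkill"      # less important, performing well
        result[name] = (conformance, quadrant)
    return result


# Hypothetical questionnaire means (importance, performance), not from the paper.
scores = {
    "content": (4.5, 3.0),
    "navigation": (4.0, 4.2),
    "ui_design": (3.0, 2.5),
    "performance": (3.5, 4.0),
}

for name, (conformance, quadrant) in ipa_quadrants(scores).items():
    print(f"{name}: {conformance}% -> {quadrant}")
```

Under these assumptions, categories that land in "concentrate here" are the ones to fix first, while those in "keep up the good work" are to be maintained.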
Citations: 0