2022 International Conference on Data Science and Its Applications (ICoDSA): Latest Articles

What Affects User Satisfaction of Payroll Information Systems?
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862938
R. V. Priyanka Hafidz, Amelia Setiawan
As digitalization advances, institutions and companies are moving parts of their operations from manual to online systems. Government institutions have likewise begun to implement computerized systems, and payroll is one of the affected functions. For example, pay that was usually handed over in person can now be sent via bank transfer. This study analyzes the effect of information quality, system quality, and information system security on user satisfaction in the payroll section of a government institution, namely the Center for Financial Transaction Reports and Analysis. The data were processed in two ways: first with the Data Analysis function of Microsoft Excel, and second with SEM-PLS analysis to test the three predetermined hypotheses. The hypothesis tests indicate that information quality and information system security significantly affect user satisfaction, while system quality does not substantially affect it. A limitation of this research is the small number of employees in the payroll section of the Center for Financial Transaction Reports and Analysis. Further research could study a more general section with more employees.
Citations: 0
Identification of Influencers and Community of Lazada Using Social Network Analysis
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862849
Nabila Ammara Diliana, Indrawati
Indonesia has more than 200 million internet users in 2022. Given this large user base, there are many digital businesses in Indonesia, among them e-commerce platforms. Understanding users on social media plays an important role in determining a suitable digital marketing strategy, and one way to do so is to study user interaction, or User Generated Content, on social media. User Generated Content is commonly used as one of the data sources for analyzing social media marketing strategy. Previous research used User Generated Content as data to analyze marketing strategy for an educational platform and a tourism platform; both studies analyzed the data with User Generated Content and Social Network Analysis. This study therefore aims to build a network, from tweets on social media, that models the interaction network around Lazada in Indonesia. We use User Generated Content as the source for finding influencers and communities, the two most important elements in deciding on a digital marketing strategy, especially for social media. The results of this study identify the influencers and communities of the social network. The research will be useful for those exploring business opportunities, for policymakers, and for Lazada.
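The abstract does not say which centrality or community measure was used, so the following is only a minimal sketch of the general idea: build a network from mention pairs, rank accounts by how often they are mentioned, and group accounts that are connected to each other. The account names and the `MENTIONS` list are hypothetical, and connected components are a deliberately crude stand-in for a real community detection method.

```python
from collections import Counter, defaultdict

# Hypothetical (author, mentioned_account) pairs extracted from tweets.
MENTIONS = [
    ("ani", "lazada_id"),
    ("budi", "lazada_id"),
    ("citra", "lazada_id"),
    ("citra", "ani"),
    ("dedi", "eka"),
    ("eka", "dedi"),
]

def in_degree(edges):
    """Count how often each account is mentioned by others."""
    return Counter(target for _, target in edges)

def components(edges):
    """Group accounts into weakly connected components (a rough community proxy)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

top_account, top_count = in_degree(MENTIONS).most_common(1)[0]
communities = components(MENTIONS)
print(top_account, top_count)
print(communities)
```

In practice a graph library and a modularity-based method would likely replace both helper functions; the sketch only shows where "influencer" and "community" come from in the data.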
Citations: 0
DDoS Attack Detection System using Neural Network on Internet of Things
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862848
Lulus Wahyu Prasetya Adi, Satria Mandala, Y. Nugraha
Distributed Denial-of-Service (DDoS) is an attack launched over a computer network to make a server unable to provide services to users. DDoS is also an effective way to stop services on Internet of Things systems based on the Message Queuing Telemetry Transport (MQTT) protocol. In such systems, attackers usually target the broker, which manages data traffic between the issuer (publisher) and the customer (subscriber). Several research projects have used machine learning to detect DDoS in the Internet of Things (IoT). However, existing work still generally has low accuracy in detecting DDoS. This study addresses that problem by proposing a machine learning model based on a Neural Network (NN) to detect DDoS, and compares the NN predictions with those of K-Nearest Neighbor (KNN). The method consists of three steps: (1) conducting a literature study, (2) developing both machine learning models, and (3) conducting the analysis. Rigorous experiments were carried out using a dataset from other research and a dataset generated through DDoS simulations in an IoT environment. On the simulated dataset, NN achieves higher accuracy than KNN: 99.99% versus 99.82%.
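To make the KNN baseline concrete, here is a minimal sketch of a k-nearest-neighbour classifier and an accuracy computation. The two features (messages per second and a scaled mean payload size) and all data points are hypothetical; the abstract does not describe the actual features or datasets, and this is not the authors' implementation.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

# Hypothetical features: (messages per second, mean payload bytes / 100).
TRAIN = [
    ((2.0, 1.2), "normal"), ((3.0, 1.0), "normal"), ((2.5, 1.5), "normal"),
    ((90.0, 0.2), "ddos"), ((120.0, 0.1), "ddos"), ((100.0, 0.3), "ddos"),
]
TEST = [((4.0, 1.1), "normal"), ((110.0, 0.2), "ddos")]

score = accuracy(TRAIN, TEST)
print(score)
```

The toy data is trivially separable, so it says nothing about the accuracy figures reported above; it only shows how such a figure is computed.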
Citations: 1
Topic Classification in Indonesian-language Tweets using Fast-Text Feature Expansion with Support Vector Machine (SVM)
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862899
Imaduddin Muhammad Fadhil, Y. Sibaroni
Twitter is a popular social media platform that lets users send text messages of at most 280 characters. This limit encourages heavy use of word variations, which leads to misspelled vocabulary; in addition, more and more tweets are being posted, and their very rapid spread causes information overload. These problems call for the ability to recognize misspelled words and to sort tweets into categories. This study therefore aims to build a topic classification system for tweets that can handle writing errors in words. Feature expansion using a pretrained FastText model can recognize writing errors in a word, because the way FastText builds word vectors learns the internal structure of a word; the resulting features are used in a Support Vector Machine. The best result in this study is an accuracy of 76.88%, obtained with feature expansion at top-1 using the pretrained model and Support Vector Machine classification.
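FastText represents a word by its character n-grams, with `<` and `>` marking the word boundaries, which is why a misspelling still lands close to the intended word. The sketch below illustrates only that subword idea, using plain set overlap instead of learned vectors; the n-gram range of 3 to 6 and the Indonesian example words are assumptions for illustration.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in FastText-style boundary markers."""
    marked = f"<{word}>"
    return {
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    }

def overlap(a, b):
    """Jaccard overlap between the n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)

# "makann" is a hypothetical misspelling of "makan"; "tidur" is unrelated.
print(overlap("makan", "makann"))
print(overlap("makan", "tidur"))
```

A real pipeline would look up nearest neighbours in the pretrained vector space rather than compare raw n-gram sets, but the shared n-grams are what make those vectors similar.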
Citations: 1
ICoDSA 2022 Homepage
Pub Date : 2022-07-06 DOI: 10.1109/icodsa55874.2022.9862860
Citations: 0
Binary Data Correction Simulation Using Convolutional Code on Additive White Gaussian Noise Channel
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862932
H. Nuha, Tafta Zani, Muhammad Fadhly Ridha, Adiwijaya
Access to digital communications in remote areas requires a mechanism to increase the robustness of the transmitted data. Many areas in Indonesia still have difficulty accessing the Internet because the settlements are located far from the signal transmitter. Convolutional codes are a technique for improving the reliability of data transmission. This article presents a convolutional code simulation from an application that we developed in Java. The basic difference between block codes and convolutional codes in design and evaluation is that block codes are based on algebraic or combinatorial techniques, whereas convolutional codes are based on construction techniques. Notable features of the application are demonstrations of encoding, modulation, noise generation on an additive white Gaussian noise channel, and decoding using the Viterbi algorithm. Error correction begins by comparing the bits of the code word with the trellis diagram (Hamming distance), which produces paths whose weights depend on the Hamming distance. The Viterbi algorithm then decodes the codeword into the original message by finding the most probable path (maximum likelihood) based on the Hamming distance at each state. Experiments show that the application successfully demonstrates the system's ability to recover information signals damaged by noise.
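The encode, corrupt, decode loop described above can be sketched in a few lines. This is not the authors' Java application: it assumes a rate-1/2 code with constraint length 3 and generator polynomials (7, 5) in octal, uses hard decisions, and replaces the modulation and noise stages with a single flipped bit so that the result is deterministic.

```python
G = (0b111, 0b101)  # assumed generator polynomials (7, 5) in octal

def parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") % 2

def encode(bits):
    """Rate-1/2 convolutional encoder with a 2-bit state (constraint length 3)."""
    state, out = 0, []
    for b in bits:
        reg = (b << 2) | state              # newest bit in the high position
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1                    # shift: drop the oldest bit
    return out

def viterbi_decode(received):
    """Hard-decision Viterbi decoding with Hamming distance as branch metric."""
    inf = float("inf")
    metrics = [0, inf, inf, inf]            # the encoder starts in state 0
    paths = [[], [], [], []]
    for i in range(0, len(received), 2):
        pair = received[i:i + 2]
        new_metrics, new_paths = [inf] * 4, [None] * 4
        for state in range(4):
            if metrics[state] == inf:
                continue                    # state not reachable yet
            for b in (0, 1):
                reg = (b << 2) | state
                expected = [parity(reg & g) for g in G]
                nxt = reg >> 1
                cost = metrics[state] + sum(e != r for e, r in zip(expected, pair))
                if cost < new_metrics[nxt]:  # keep only the survivor path
                    new_metrics[nxt] = cost
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    best = min(range(4), key=lambda s: metrics[s])
    return paths[best]

message = [1, 0, 1, 1, 0, 0, 1, 0]
codeword = encode(message)
corrupted = list(codeword)
corrupted[3] ^= 1                           # simulate one channel error
decoded = viterbi_decode(corrupted)
print(decoded == message)
```

With an actual AWGN channel, each received sample would first be thresholded back to a bit (or used directly as a soft metric); the trellis search itself stays the same.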
Citations: 2
Separating Hate Speech from Abusive Language on Indonesian Twitter
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862850
Muhammad Amien Ibrahim, Noviyanti Tri Maretta Sagala, S. Arifin, R. Nariswari, N. Murnaka, P. W. Prasetyo
Social media is an effective tool for connecting with people and distributing information. However, many people use social media to spread hate speech and abusive language. In contrast to hate speech, abusive language is frequently used in jokes with no intention of offending individuals or groups, even though it may contain profanities. As a result, the distinction between hate speech and abusive language is often blurred. In many cases, individuals who spread hate speech may be prosecuted, as it has legal implications. Previous research has focused on binary classification of hate speech and normal tweets. This study aims to classify hate speech, abusive language, and normal messages on Indonesian Twitter. Several machine learning models, such as logistic regression and BERT models, are used for the text classification task. Model performance is assessed using the F1-score. The results show that BERT models outperform the other models in terms of F1-score, with the BERT-indobenchmark model, which was pretrained on social media text data, achieving the highest F1-score of 85.59. This also demonstrates that pretraining the BERT model on social media data improves the classification model significantly. A classification model that can distinguish between hate speech and abusive language would help individuals prevent the spread of hate speech that has legal implications.
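For a three-class task like this one, the F1-score has to be aggregated across classes. The abstract does not state which averaging was used, so the sketch below assumes macro averaging (the unweighted mean of the per-class scores); the labels and predictions are made up for illustration.

```python
def f1_for_label(y_true, y_pred, label):
    """F1-score of one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)

# Hypothetical gold labels and model predictions.
y_true = ["hate", "hate", "abusive", "abusive", "normal", "normal"]
y_pred = ["hate", "abusive", "abusive", "abusive", "normal", "normal"]

result = macro_f1(y_true, y_pred)
print(round(result, 4))
```

Here one hate-speech tweet is misread as merely abusive, which lowers recall for "hate" and precision for "abusive"; that is exactly the confusion the study is concerned with.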
Citations: 2
Extractive Text Summarization for Snippet Generation on Indonesian Search Engine using Sentence Transformers
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862886
Komang Uning Sari Devi, Lya Hulliyyatus Suadaa
Search engine results usually show a list of retrieved document titles together with document summaries, called snippets, that give a better preview of the retrieved documents. This research proposes extractive text summarization models to generate snippets. A new dataset is constructed for extractive text summarization using Indonesian thesis documents, in which the target summaries were created manually by selecting important sentences. For snippet generation, we use Lead-3 and TextRank as baselines and propose fine-tuning Sentence Transformers (SBERT). Based on the evaluation results, SBERT generated better summaries than the baselines, with 0.545 ROUGE-1, 0.433 ROUGE-2, and 0.474 ROUGE-L.
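The Lead-3 baseline and the ROUGE-1 metric are both simple enough to sketch directly. The version below makes several assumptions: sentences are split on end punctuation with a regular expression, tokens are lowercased whitespace-separated words, and ROUGE-1 is reported as an F-score (the abstract does not say whether its figures are recall or F-score).

```python
import re
from collections import Counter

def lead_k(document, k=3):
    """Lead-k baseline: take the first k sentences as the summary."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return " ".join(sentences[:k])

def rouge1_f(candidate, reference):
    """ROUGE-1 F-score from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())    # per-word minimum of the two counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

doc = "First point. Second point. Third point. Fourth point."
snippet = lead_k(doc)
print(snippet)
print(rouge1_f("the cat sat", "the cat ran"))
```

Established ROUGE packages add stemming and other normalization, so their scores can differ from this bare version.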
Citations: 1
ICoDSA 2022 Author Index
Pub Date : 2022-07-06 DOI: 10.1109/icodsa55874.2022.9862893
Citations: 0
Quality Control on Interpolation-based Reversible Audio Data Hiding using Bit Threshold
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862889
Yoga Samudra, T. Ahmad
To solve everyday problems, information sometimes needs to be accessible digitally or online to authorized parties. However, accessing some information may require secure data transmission. This can be achieved by using data security techniques, such as cryptography or steganography, to protect the transmitted information. As a data hiding method, steganography works by hiding private information in plain sight inside other public data, mainly multimedia data. Nevertheless, previous research leaves some main concerns, such as the similarity level and hiding capacity of the audio file. In this research, private information is embedded into an audio file used as cover data, so that the receiver can retrieve it later without arousing the suspicion of other parties. The research focuses on providing flexibility in the embedding capacity of the cover data by using an interpolation-based approach as the core sampling technique, and then on enhancing the similarity level by adding bit threshold values that distribute the hiding capacity across samples as evenly as possible. Experimental results show that the proposed method achieves an average similarity level ranging from 103.56 dB with 100 Kb of private data to 93.95 dB with 300 Kb of private data, which is better than some existing studies.
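The abstract gives no formulas, so the following is only a simplified sketch of the general family of interpolation-based reversible hiding, not the authors' scheme. It assumes integer samples, inserts one interpolated sample between each pair of cover samples, hides payload bits by adding a small value to that interpolated sample, and caps the bits per sample with a hypothetical `threshold` parameter. Because the original samples are left untouched, the receiver can recover both the cover and the payload.

```python
def capacity(a, b, threshold):
    """Bits hidden between two samples: floor(log2(|b - a|)), capped by threshold."""
    diff = abs(b - a)
    bits = diff.bit_length() - 1 if diff > 0 else 0
    return min(bits, threshold)

def embed(cover, bits, threshold=2):
    """Interleave cover samples with interpolated samples that carry the payload."""
    stego, pos = [], 0
    for a, b in zip(cover, cover[1:]):
        chunk = bits[pos:pos + capacity(a, b, threshold)]
        pos += len(chunk)
        value = int("".join(map(str, chunk)), 2) if chunk else 0
        stego += [a, (a + b) // 2 + value]
    stego.append(cover[-1])
    return stego, pos                       # pos = number of bits actually hidden

def extract(stego, n_bits, threshold=2):
    """Recover the original cover samples and the first n_bits payload bits."""
    cover = stego[0::2]                     # original samples sit at even indices
    bits = []
    for i, (a, b) in enumerate(zip(cover, cover[1:])):
        n = min(capacity(a, b, threshold), n_bits - len(bits))
        if n > 0:
            value = stego[2 * i + 1] - (a + b) // 2
            bits += [int(ch) for ch in format(value, f"0{n}b")]
    return cover, bits

cover = [10, 18, 20, 4]
payload = [1, 0, 1, 1, 0]
stego, hidden = embed(cover, payload)
recovered_cover, recovered_bits = extract(stego, hidden)
print(stego)
print(recovered_bits)
```

A lower threshold hides fewer bits per sample and keeps each interpolated sample closer to its unmodified value, which is the capacity versus similarity trade-off the abstract describes.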