
Latest publications: 2022 International Conference on Data Science and Its Applications (ICoDSA)

Potentials of Clinical Pathway Analysis Using Process Mining on the Indonesia National Health Insurance Data Samples: an Exploratory Data Analysis
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862851
A. Kurniati, G. Wisudiawan, G. Kusuma
Clinical pathway analysis is an important form of analysis in the healthcare domain. This approach learns from historical data on patients' pathways through clinical treatment and finds patterns that can be used for further purposes, including treatment recommendations and precision medicine. Indonesia has an opportunity for clinical pathway analysis using the national health insurance data samples provided by the Social Security Administrator for Health (BPJS Kesehatan). The data samples are representative of the Indonesian population and potentially useful for initial explorations of clinical pathways in Indonesia. This study applied exploratory data analysis using process mining for clinical pathway analysis. Process mining is a promising approach for learning from time-stamped datasets to find sequential clinical pathway patterns. We examine the data samples carefully to define the minimum components required for process mining of clinical pathways and provide samples of the resulting analysis. The contributions of this study are twofold: to promote clinical pathway analysis for improving health services using real data from the BPJS Kesehatan system, and to propose a method for clinical pathway analysis based on process mining. The results of this study are a disease trajectory visualization through a process model, together with statistics evaluating the performance of the results using process mining techniques.
Citations: 2
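A core step in this kind of process mining is building a directly-follows graph from a timestamped event log. The sketch below, with a hypothetical toy log standing in for the BPJS Kesehatan samples (not the paper's actual data or code), shows the idea:

```python
from collections import Counter, defaultdict

# Toy event log: (case_id, activity, timestamp) triples.
# The activities are hypothetical stand-ins for clinical treatments.
event_log = [
    ("p1", "registration", 1), ("p1", "consultation", 2), ("p1", "lab_test", 3),
    ("p2", "registration", 1), ("p2", "consultation", 2), ("p2", "pharmacy", 3),
    ("p3", "registration", 1), ("p3", "lab_test", 2), ("p3", "consultation", 3),
]

def directly_follows(log):
    """Count how often activity a is directly followed by activity b per case."""
    traces = defaultdict(list)
    for case, activity, ts in sorted(log, key=lambda e: (e[0], e[2])):
        traces[case].append(activity)
    dfg = Counter()
    for acts in traces.values():
        for a, b in zip(acts, acts[1:]):
            dfg[(a, b)] += 1
    return dfg

dfg = directly_follows(event_log)
# ("registration", "consultation") is directly followed in cases p1 and p2.
```

Discovery algorithms then turn such counts into a process model; edge frequencies and timestamp gaps supply the performance statistics the abstract mentions.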
Sparse Code Multiple Access Decoding Using Message-Passing Algorithm
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862831
Lathifa Rizqi Andhary, H. Nuha, T. Haryanti
Before long, 4G will be replaced by 5G. However, 5G still uses Orthogonal Frequency Division Multiple Access (OFDMA) schemes, and OFDMA has shortcomings in terms of massive connectivity, so a new scheme is needed to replace it for 5G. This paper therefore discusses the performance evaluation of Sparse Code Multiple Access (SCMA) for 5G. A message-passing algorithm can be used to decode SCMA messages so that the system can deal with large amounts of traffic; with large traffic volumes handled, 5G can accommodate a large number of users. The message-passing algorithm can meet 5G specifications, so it is suitable for 5G use, and the performance of SCMA will confirm its usability for the 5G system. The message-passing algorithm can detect user information in SCMA, at increased complexity, and the bit error rate of the algorithm can be measured. This paper uses the message-passing algorithm to test the bit error rate on six types of codebooks. Among all the codebooks, the codebook with the smallest bit error rate is the best choice for SCMA.
Citations: 1
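A full SCMA message-passing decoder is beyond a short sketch, but the evaluation metric the paper uses, bit error rate, is typically estimated by Monte Carlo simulation. The minimal sketch below does this for a plain BPSK link over an AWGN channel (an illustrative stand-in, not the paper's SCMA setup or codebooks):

```python
import random

def ber_simulation(n_bits=10000, noise_std=0.5, seed=42):
    """Estimate bit error rate for BPSK over an AWGN channel by Monte Carlo."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_bits):
        bit = rng.randint(0, 1)
        symbol = 1.0 if bit else -1.0        # BPSK mapping: 0 -> -1, 1 -> +1
        received = symbol + rng.gauss(0.0, noise_std)
        decided = 1 if received > 0 else 0   # hard decision at the receiver
        errors += (decided != bit)
    return errors / n_bits

low_noise = ber_simulation(noise_std=0.3)
high_noise = ber_simulation(noise_std=1.0)
# More channel noise yields a higher estimated BER.
```

Comparing codebooks, as the paper does, amounts to running the same harness per codebook and picking the one with the lowest estimated BER.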
Handwriting Analysis for Personality Trait Features Identification using CNN
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862910
Derry Alamsyah, Samsuryadi, Wijang Widhiarsho, S. Hasan
Handwriting analysis is an approach to obtaining information from handwriting. This information is extremely useful, for instance in personality trait identification, and comes from features extracted from the handwriting, such as size, slant, pressure, and so forth. In this research, handwriting analysis uses the AND dataset, which provides handwriting samples along with feature labels, whereas most public datasets provide none. Using the potential of Convolutional Neural Networks (CNN) to capture and recognize global features, 15 models were built in this research, one per feature, divided into three groups by the number of feature types. After building a simple CNN architecture with only two convolution layers, the overall results show fair performance, with a highest accuracy of 80.88%. Furthermore, the three best-recognized features are "entry stroke ‘A’", "size", and "slant", where the last two are naturally global features. However, the handwriting image data cannot be oversampled, which can lead to biased results, so the imbalanced data became a problem in this research and reduced model performance.
Citations: 1
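The building block of the two-layer architecture above is the convolution operation itself. As a minimal, framework-free sketch (the kernel and image here are illustrative, not the paper's trained weights), a single "valid" convolution pass looks like this:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as CNN libraries compute it)."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            )
    return out

# A vertical-edge kernel responds where stroke intensity changes left-to-right,
# the kind of local feature an early convolution layer picks up from handwriting.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1], [-1, 1]]
feature_map = conv2d(image, edge_kernel)
```

Stacking two such layers with nonlinearities and pooling, then a classifier head, gives the simple architecture the abstract describes.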
Noisy Seismic Traces Classification Using Principal Component Analysis
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862872
H. Nuha, Abdi T. Abdalla
Seismic exploration sends energy or sound waves into the earth and records the wave reflections to reveal essential subsurface rock information, including type, size, shape, and depth. Seismic data acquisition typically produces a significant data volume, and seismic files within a survey may include useless noisy traces that increase the file size. Noisy traces have some noticeable features which can be exploited to aid the denoising process. In this work, features were formulated based on Principal Component Analysis (PCA) to automatically distinguish good traces from noisy ones; PCA projects the seismic trace features onto a lower-dimensional space with only two components. To classify and detect noisy traces, we first select the dataset and generate Gaussian noise, add the noise to the selected dataset, and normalize the traces before extracting features with a threshold algorithm, a histogram algorithm, and a zero-crossing algorithm; finally, we apply PCA to obtain the projected data. Two types of artificial noise were generated in this work. It is shown that PCA is able to separate the two types of noisy seismic traces, although the PCA projections show that at high noise contamination the method is unable to separate noisy from clean seismic traces.
Citations: 1
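Two of the per-trace features named above, the zero-crossing count and the threshold-exceedance count, can be sketched directly; the synthetic sine trace here is an illustrative stand-in for a real seismic reflection, not the paper's data:

```python
import math
import random

def zero_crossings(trace):
    """Count sign changes: a feature that grows sharply on noisy traces."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a * b < 0)

def threshold_count(trace, thresh):
    """Count samples whose magnitude exceeds a threshold."""
    return sum(1 for x in trace if abs(x) > thresh)

rng = random.Random(0)
n = 500
clean = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]  # smooth 5-cycle sine
noisy = [x + rng.gauss(0.0, 1.0) for x in clean]               # heavy Gaussian noise

# A noise-dominated trace changes sign far more often than a smooth reflection,
# so these feature values separate the two classes before any PCA projection.
```

PCA then projects such feature vectors onto their first two principal components, where good and noisy traces form distinguishable clusters, except, as the abstract notes, under heavy contamination.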
Cryptocurrency Trading based on Heuristic Guided Approach with Feature Engineering
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862934
Cagri Karahan, Ş. Öğüdücü
In recent years, machine learning and deep learning techniques have been used frequently in algorithmic trading, which means trading Forex, stock markets, commodities, and many other markets with the help of computers, using systems built on various technical analysis indicators. The BTC/USD market allows buying and selling, and people aim to profit from trading in the Bitcoin market. Reinforcement Learning (RL), a sub-field of machine learning, is also helpful in achieving such goals: RL addresses the problem of a computational agent learning to make decisions by trial and error. For our application, the aim is to make as much profit as possible. This study focuses on developing a novel tool to automate currency trading such as BTC/USD in a simulated market with maximum profit and minimum loss. An RL technique with a modified version of the Collective Decision Optimization Algorithm is used to implement the proposed model, and feature engineering is performed to create features that improve the result.
Citations: 0
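The RL agent and the modified Collective Decision Optimization Algorithm are beyond a short sketch, but the profit signal such an agent maximizes in a simulated market can be shown. The backtest below is a minimal long-only sketch with made-up prices and signals, not the paper's trading environment:

```python
def backtest(prices, signals):
    """Profit of a long-only strategy: 'buy' opens a position, 'sell' closes it.

    signals[i] applies at prices[i]; an open position is closed at the last price.
    """
    cash, entry = 0.0, None
    for price, sig in zip(prices, signals):
        if sig == "buy" and entry is None:
            entry = price
        elif sig == "sell" and entry is not None:
            cash += price - entry
            entry = None
    if entry is not None:              # force-close at the final price
        cash += prices[-1] - entry
    return cash

prices = [100.0, 95.0, 105.0, 110.0, 102.0]
signals = ["hold", "buy", "hold", "sell", "hold"]
profit = backtest(prices, signals)    # buys at 95, sells at 110
```

In an RL formulation, this realized profit (or a per-step increment of it) serves as the reward that the agent's trial-and-error learning seeks to maximize.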
Study on Optimization of Data-Driven Anomaly Detection
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862914
Yiqing Zhou, Rui Liao, Yong-hong Chen
In this paper, based on the original data and the sensor values at different moments, the box plot method is used to process the data and separate normal values from outliers. Two types of outliers were distinguished based on the persistence of the outliers over the longitudinal time of the data and the linkage between lateral sensors, and a clustering algorithm was used to reclassify the data. Then, persistence and linkage were calculated within each class; the sum of persistence and linkage divided by the maximum possible number of anomalies gives the risk coefficient, and a threshold is then defined to distinguish risk-specific from non-risk anomalies. Next, a comprehensive evaluation model of anomaly degree was established through quantitative scoring, principal component analysis, and 0-1 programming. Finally, this quantitative evaluation method is assessed objectively.
Citations: 1
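The box plot split between normal values and outliers uses Tukey's fences on the interquartile range. A minimal sketch (using the median-split quartile convention, one of several; the sensor readings are made up):

```python
def quartiles(values):
    """Median-split quartiles (Tukey's convention)."""
    s = sorted(values)

    def median(xs):
        m = len(xs) // 2
        return xs[m] if len(xs) % 2 else (xs[m - 1] + xs[m]) / 2

    half = len(s) // 2
    return median(s[:half]), median(s), median(s[-half:])

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR], the box-plot fences."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

readings = [10, 12, 11, 13, 12, 11, 10, 95, 12, 11]  # hypothetical sensor values
outliers = iqr_outliers(readings)                     # flags the spike at 95
```

Points flagged this way are the raw material for the paper's next steps: checking each outlier's persistence over time and its linkage with neighboring sensors.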
Telkom University Slogan Analysis on YouTube Using Naïve Bayes
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862818
Rahma Fadhila Moenggah, Donni Richasdy, Mahendra Dwifebri Purbolaksono
YouTube is often used by public and private universities as branding aimed at first-year students. YouTube lets users interact by giving likes or dislikes, adding views to a video, and responding through comment pages, and these comments can be analyzed as public feedback on the branding. In this branding, many alumni and college students discuss Telkom University as the best private university in content uploaded to YouTube, which can trigger the public to give positive, negative, or neutral comments about Telkom University. In this research, sentiment analysis focuses on the branding slogan "Number 1 Best Private University" to find out the perspectives and opinions of the public, which the university can use as evaluation material to improve its reputation. The dataset is taken from user opinions on YouTube about content discussing Telkom University's branding slogan, using Term Frequency–Inverse Document Frequency (TF-IDF) feature extraction and Naïve Bayes as the classifier. The final results of this research show that a 90:10 normalized split combined with unigram-bigram tokens and Naïve Bayes with alpha 0.6 yields the best performance, with an average accuracy of 85.27%, precision of 91.41%, recall of 62.45%, and F1-score of 64.78%.
Citations: 0
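The TF-IDF weighting step named above can be sketched in a few lines. This is a minimal version using the plain `tf * log(N/df)` formula (library implementations such as scikit-learn apply smoothing variants), with made-up comment snippets rather than the paper's dataset:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight each term by its in-document frequency times its corpus rarity."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

docs = [
    "telkom university best private university".split(),
    "best campus in bandung".split(),
    "telkom university campus tour".split(),
]
w = tf_idf(docs)
# "private" occurs in only one document, so it outweighs the common "university".
```

The resulting weight vectors are what the Naïve Bayes classifier consumes to separate positive, negative, and neutral comments.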
ICoDSA 2022 TOC
Pub Date : 2022-07-06 DOI: 10.1109/icodsa55874.2022.9862819
Citations: 0
Sentiment Analysis of Floods on Twitter Social Media Using the Naive Bayes Classifier Method with the N-Gram Feature
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862827
Akbar Ridwan, H. Nuha, Ramanti Dharayani
Indonesia has the 6th largest population affected by floods in the world, with 640,000 people affected every year; many areas of Indonesia often experience floods due to high-intensity rainfall and the tropical climate. Recently, a flood hit South Kalimantan on January 14, 2021, and a number of netizens expressed their opinions about the disaster through Twitter. In this study, the authors classify netizen views regarding the flood disaster so that netizens are aware of the incident and can help prevent flood causes. We divide the tweets into relevant and irrelevant categories using the Naïve Bayes classifier, and implement N-gram features to determine the most efficient configuration for classification. We use Naïve Bayes because it assumes all variables are independent, and we weight the text data using N-grams; these weights feed a Naïve Bayes classification model that calculates the probabilities. The Naïve Bayes method can thus be applied to classifying natural flood disasters. Tweets classified using bigrams give higher accuracy than unigrams or trigrams. Based on this study, the government can plan future mitigation actions.
Citations: 2
Inductive Miner Implementation to Improve Healthcare Efficiency on Indonesia National Health Insurance Data
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862837
Mochammad Ivan Adhyaksa Pradana, A. Kurniati, G. Wisudiawan
Process Mining is a method to collect data about business processes and produce insights from those processes. It can be applied in many sectors, including healthcare. One of the government's programs to provide health services for citizens is run by the Indonesia Health Social Security Agency (Badan Penyelenggara Jaminan Sosial Kesehatan / BPJS Kesehatan). Currently, the services provided in this program are still unsatisfactory, with waiting time as a main concern. We analyze BPJS Kesehatan data samples using the Inductive Miner algorithm to mine event logs of treatments, frequent treatments, and health facility usage, with a focus on respiratory disease. Preprocessing steps were needed to prepare the event logs. The produced process models are then evaluated on their fitness, precision, generalization, and simplicity. We then replay the models against the event logs for performance analysis. We tested different variants of the Inductive Miner and found that the Inductive Miner Infrequent variant achieves the highest average score. We identify eight treatment procedures whose efficiency can be improved. We also find that the most frequently used health facility is the Public Health Center, followed by the First Clinic and the Hospital. The results are analyzed from the perspectives of previously completed treatment, recurring treatment, and the facility usage process. The Inductive Miner is a good algorithm that can produce an accurate process model and support suggestions for improving the healthcare process.
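The analysis above starts from event logs of patient treatments. The sketch below illustrates that preprocessing on a toy event log with hypothetical activity names (not the BPJS Kesehatan data): records are grouped into per-case traces, then trace-variant frequencies and a directly-follows graph are derived, the basic inputs that discovery algorithms such as the Inductive Miner build on.

```python
# Event-log preprocessing sketch: traces, variants, directly-follows graph.
from collections import Counter, defaultdict

# Each record: (case_id, timestamp, activity) for one patient pathway (toy data)
records = [
    ("p1", 1, "Register"), ("p1", 2, "Examination"), ("p1", 3, "Medication"),
    ("p2", 1, "Register"), ("p2", 2, "Examination"), ("p2", 3, "Referral"),
    ("p3", 1, "Register"), ("p3", 2, "Examination"), ("p3", 3, "Medication"),
]

# Group events into one trace per case, ordered by timestamp
traces = defaultdict(list)
for case, ts, act in sorted(records, key=lambda r: (r[0], r[1])):
    traces[case].append(act)

# Trace variants: how often each distinct pathway occurs
variants = Counter(tuple(t) for t in traces.values())

# Directly-follows relation: counts of activity a immediately followed by b
dfg = Counter()
for t in traces.values():
    for a, b in zip(t, t[1:]):
        dfg[(a, b)] += 1

print(variants.most_common(1))  # most frequent pathway and its count
print(dfg[("Register", "Examination")])  # → 3
```

In a real pipeline these structures would be handed to a process-mining library's Inductive Miner implementation, which cuts the directly-follows graph recursively to build a process tree; the quality dimensions the abstract names (fitness, precision, generalization, simplicity) are then computed by replaying the log against the discovered model.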
Citations: 4
Journal
2022 International Conference on Data Science and Its Applications (ICoDSA)