Muhammad Fikri Hasani, Y. Heryadi, Yulyani Arifin, Lukas, W. Suparta
Title: Density Based Spatial Clustering of Applications with Noise and Sentence Bert Embedding for Indonesian Utterance Clustering
Venue: 2023 International Conference on Computer Science, Information Technology and Engineering (ICCoSITE)
DOI: 10.1109/ICCoSITE57641.2023.10127683
Published: 2023-02-16
Citations: 24
Abstract
Task-oriented chatbots are a class of chatbots that perform specific tasks with defined goals. One part of building a task-oriented chatbot is intent classification, which is a text classification task. As with text classification in general, the dataset must be labelled before classification can be carried out. Clustering is an established method to speed up and support utterance analysis. Density-based clustering is a family of clustering methods that can identify cluster patterns in arbitrary data, and DBSCAN is one of its algorithms. This research used 10,000 client utterances from WhatsApp-based e-commerce conversations, with Sentence-BERT as a state-of-the-art sentence embedding. The best result was a silhouette score of 0.327, obtained with eps of 0.1 and MinPts of 95. However, the cluster results show that sentences labelled as noise can be clustered further. Text preprocessing, text augmentation, and sentence embedding techniques can be explored to improve clustering performance.
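To make the roles of eps, MinPts, the noise label, and the silhouette score concrete, here is a minimal pure-Python sketch of DBSCAN followed by a silhouette computation. It is not the authors' implementation. Several things are assumptions made for illustration: the toy 2-D points stand in for Sentence-BERT embeddings, Euclidean distance is used because the abstract does not name a metric, noise points are skipped when averaging the silhouette, and the `eps` and `min_pts` values are chosen for the tiny dataset rather than taken from the reported 0.1 and 95.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, keep growing
                queue.extend(j_neighbors)
    return labels


def silhouette_score(points, labels):
    """Mean silhouette over clustered points; noise is skipped (an assumption)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(points[i], points[j])
                    for j in members if j != i) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(math.dist(points[i], points[j]) for j in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy stand-in for sentence embeddings: two tight groups and one outlier.
points = [
    (0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),
    (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (5.1, 5.1),
    (10.0, 0.0),
]
labels = dbscan(points, eps=0.2, min_pts=3)
score = silhouette_score(points, labels)
print(labels, round(score, 3))
```

In this sketch a larger `min_pts` or a smaller `eps` pushes more points into the noise label, which is the same trade-off behind the abstract's remark that sentences labelled as noise could be clustered further.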