{"title":"最优路径森林的无监督对话行为分类","authors":"L. C. Ribeiro, J. Papa","doi":"10.1109/SIBGRAPI.2018.00010","DOIUrl":null,"url":null,"abstract":"Dialogue Act classification is a relevant problem for the Natural Language Processing field either as a standalone task or when used as input for downstream applications. Despite its importance, most of the existing approaches rely on supervised techniques, which depend on annotated samples, making it difficult to take advantage of the increasing amount of data available in different domains. In this paper, we briefly review the most commonly used datasets to evaluate Dialogue Act classification approaches and introduce the Optimum-Path Forest (OPF) classifier to this task. Instead of using its original strategy to determine the corresponding class for each cluster, we use a modified version based on majority voting, named M-OPF, which yields good results when compared to k-means and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), according to accuracy and V-measure. We also show that M-OPF, and consequently OPF, are less sensitive to hyper-parameter tuning when compared to HDBSCAN.","PeriodicalId":208985,"journal":{"name":"2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Unsupervised Dialogue Act Classification with Optimum-Path Forest\",\"authors\":\"L. C. Ribeiro, J. Papa\",\"doi\":\"10.1109/SIBGRAPI.2018.00010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Dialogue Act classification is a relevant problem for the Natural Language Processing field either as a standalone task or when used as input for downstream applications. Despite its importance, most of the existing approaches rely on supervised techniques, which depend on annotated samples, making it difficult to take advantage of the increasing amount of data available in different domains. In this paper, we briefly review the most commonly used datasets to evaluate Dialogue Act classification approaches and introduce the Optimum-Path Forest (OPF) classifier to this task. Instead of using its original strategy to determine the corresponding class for each cluster, we use a modified version based on majority voting, named M-OPF, which yields good results when compared to k-means and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), according to accuracy and V-measure. We also show that M-OPF, and consequently OPF, are less sensitive to hyper-parameter tuning when compared to HDBSCAN.\",\"PeriodicalId\":208985,\"journal\":{\"name\":\"2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)\",\"volume\":\"118 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SIBGRAPI.2018.00010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIBGRAPI.2018.00010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Unsupervised Dialogue Act Classification with Optimum-Path Forest
Dialogue Act classification is a relevant problem for the Natural Language Processing field either as a standalone task or when used as input for downstream applications. Despite its importance, most of the existing approaches rely on supervised techniques, which depend on annotated samples, making it difficult to take advantage of the increasing amount of data available in different domains. In this paper, we briefly review the most commonly used datasets to evaluate Dialogue Act classification approaches and introduce the Optimum-Path Forest (OPF) classifier to this task. Instead of using its original strategy to determine the corresponding class for each cluster, we use a modified version based on majority voting, named M-OPF, which yields good results when compared to k-means and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), according to accuracy and V-measure. We also show that M-OPF, and consequently OPF, are less sensitive to hyper-parameter tuning when compared to HDBSCAN.