Authors: Pankaj Dikshit, B. Chandra, M. Gupta
Published in: 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1630-1633, 2021-12-01
DOI: 10.1109/ICMLA52953.2021.00260 (https://doi.org/10.1109/ICMLA52953.2021.00260)
Automating Questions and Answers of Good and Services Tax system using clustering and embeddings of queries
Goods and Services Tax (GST) was introduced in India in 2017 as a major tax reform. Users posed a large number of queries, and each response had to be written manually, which was a very tedious task. There was a pressing need to automate this question-answering process efficiently. Embeddings from models such as BERT and RoBERTa were used to convert the questions into vectors suitable for clustering. K-means and hierarchical clustering were applied to the question embeddings with two distance measures, Euclidean and cosine. For each query, three candidate answers were provided first, and in the next step the best of these was selected as the answer for each test question. A dataset covering two months (October and November 2019) was used for automating the process. A high success rate in predicting the answers to the questions was achieved.
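The abstract does not describe how the candidate answers are drawn from the clusters, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that question embeddings have already been computed (the tiny 3-dimensional vectors below are hypothetical stand-ins for BERT or RoBERTa outputs), clusters them with a plain k-means using Euclidean distance, assigns a test query to its nearest centroid, and ranks the questions in that cluster by cosine similarity to return three candidate answers, the first of which is taken as the best. All names, vectors, and answer strings are illustrative assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def kmeans(points, k, iters=20):
    """Lloyd's k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def top_answers(query_vec, points, answers, centroids, labels, n=3):
    """Return up to n answers from the query's cluster, most similar first."""
    cluster = min(
        range(len(centroids)), key=lambda c: euclidean(query_vec, centroids[c])
    )
    candidates = [i for i, lab in enumerate(labels) if lab == cluster]
    candidates.sort(key=lambda i: cosine(query_vec, points[i]), reverse=True)
    return [answers[i] for i in candidates[:n]]


# Hypothetical toy embeddings: even indices mimic one topic, odd indices another.
embeddings = [
    [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.9, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.8, 0.0, 0.1],
    [0.0, 0.8, 0.2],
]
# Placeholder answers, one per historical question (not real GST guidance).
answers = [f"answer_{i}" for i in range(len(embeddings))]

centroids, labels = kmeans(embeddings, k=2)
query = [0.9, 0.18, 0.0]  # hypothetical embedding of a new test question
candidates = top_answers(query, embeddings, answers, centroids, labels)
best = candidates[0]
print(candidates, best)
```

In this sketch the clustering step narrows the search to one group of similar questions, and the similarity ranking inside that group yields the three choices and the final answer. Swapping Euclidean for cosine distance in the clustering step, or k-means for hierarchical clustering, would correspond to the other configurations the abstract mentions.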