Weilin Sun, V. Lu, Aaron Truong, Hermione Bossolina, Yuan Lu
2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 622–627, December 2021. DOI: 10.1109/ICMLA52953.2021.00104
Purrai: A Deep Neural Network based Approach to Interpret Domestic Cat Language
Being able to understand and communicate with domestic cats has always fascinated humans, yet it is considered a difficult task even for phonetics experts. In this paper, we present our approach to this problem: Purrai, a neural-network-based machine learning platform for interpreting cat language. Our framework consists of two parts. First, we build a comprehensively constructed cat voice dataset that is 3.7x larger than any existing publicly available dataset [1]. To improve accuracy, we use several techniques to ensure labeling quality, including rule-based labeling, cross-validation, cosine distance, and outlier detection. Second, we design a two-stage neural network to interpret what cats express across multiple sounds, which we call sentences. The first stage is a modification of Google's VGGish architecture [2] [3], a Convolutional Neural Network (CNN) that classifies nine primary cat sounds. The second stage takes the probability outputs of a sequence of sound classifications from the first stage and determines the emotional meaning of a cat sentence. Our first-stage architecture achieves top-1 and top-2 accuracies of 74.1% and 92.1%, better than the 64.9% and 83.4% of the state-of-the-art approach [4]. Our sentence-based AI model achieves an accuracy of 81.1% for emotion prediction.
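The two-stage pipeline described in the abstract can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the nine-way sound classifier is a random stand-in for the modified VGGish CNN, and the emotion labels and the averaging-based second stage are placeholder assumptions introduced purely to show how per-clip probability vectors flow from stage one into a sentence-level prediction.

```python
import numpy as np

N_SOUND_CLASSES = 9  # nine primary cat sounds (per the paper)
EMOTIONS = ["content", "hungry", "distressed"]  # placeholder labels, not from the paper

def stage1_classify(clip: np.ndarray) -> np.ndarray:
    """Stage 1 stub: map one audio clip to a probability
    distribution over the nine sound classes. In the paper this
    is a modified VGGish CNN; here it is a seeded random stand-in."""
    rng = np.random.default_rng(abs(hash(clip.tobytes())) % (2**32))
    logits = rng.normal(size=N_SOUND_CLASSES)
    exp = np.exp(logits - logits.max())  # softmax over the logits
    return exp / exp.sum()

def stage2_emotion(prob_seq: np.ndarray) -> str:
    """Stage 2 stub: take the sequence of stage-1 probability
    vectors for one 'sentence' and predict an emotion. The paper
    uses a learned model; this placeholder averages the
    distributions and maps the dominant sound class to an emotion."""
    mean_probs = prob_seq.mean(axis=0)         # shape (9,)
    dominant = int(mean_probs.argmax())
    return EMOTIONS[dominant % len(EMOTIONS)]  # toy many-to-few mapping

# A "sentence" of three clips (fake 1-second, 16 kHz audio buffers).
sentence = [np.zeros(16000, dtype=np.float32) + i for i in range(3)]
probs = np.stack([stage1_classify(clip) for clip in sentence])  # shape (3, 9)
emotion = stage2_emotion(probs)
print(probs.shape, emotion)
```

The key design point the abstract implies is the clean interface between stages: stage one emits a probability vector per sound, so stage two can reason over a variable-length sequence of distributions rather than raw audio.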