Improving image quality is one of the key tasks in Optical Coherence Tomography (OCT) imaging. Low contrast and speckle noise are two major factors affecting the accuracy of OCT measurement. In this paper, an effective speckle reduction and structure enhancement method is proposed based on variational image decomposition (VID) and multi-scale Retinex (MSR). Specifically, we propose a new variational image decomposition model, BL-G-BM3D, to decompose the OCT image into a background part, a structure part, and noise. The structure part is then enhanced by MSR, and the background part is used to generate a filter mask via the fuzzy c-means clustering algorithm. Experimental results show that the proposed method performs well in speckle reduction and structure enhancement, achieving higher SNR, CNR, and ENL and better fine-detail retention than the shearlet transform and BM3D methods.
{"title":"Effective Speckle reduction and structure enhancement method for retinal OCT image based on VID and Retinex","authors":"Biyuan Li, Yu Wang, Jun Zhang","doi":"10.1145/3517077.3517084","DOIUrl":"https://doi.org/10.1145/3517077.3517084","url":null,"abstract":"Improving the quality of images is one of the key tasks in Optical Coherence Tomography (OCT) imaging technology. Low contrast and speckle noise are two major factors affecting the accuracy of OCT measurement. In this paper, an effective speckle reduction and structure enhancement method is proposed based on variational image decomposition (VID) and multi-scale Retinex (MSR). To be specific, we propose a new variational image decomposition model BL-G-BM3D to decompose the OCT image into background part, structure part and noise. Then the structure part is enhanced by MSR and the background part is used to generate a filter mask by fuzzy c-means clustering algorithm. Experimental results show that the proposed method performs well in speckle reduction and structure enhancement, with better quality metrics of the SNR, CNR, and ENL and better fine detail retention than shearlet transform method and BM3D method.","PeriodicalId":233686,"journal":{"name":"2022 7th International Conference on Multimedia and Image Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129770648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As a diagnostic criterion for cancer, histopathology image analysis is critical for the subsequent treatment of patients. At present, diagnosis depends mainly on manual inspection, which is imprecise and low in accuracy. To address this problem, we propose a novel screening framework combining image preprocessing and AI approaches for the automatic detection of lymph node metastasis in colorectal cancer. We first calculate the Histogram of Oriented Gradients (HOG) and Gray Level Co-occurrence Matrix (GLCM) of high-resolution digital images converted from pathological sections. Statistical analysis shows that a Support Vector Machine (SVM) can be used to automatically identify cancerous areas. We further introduce a deep learning model, a Convolutional Neural Network (CNN), into our framework, taking the preprocessed images as inputs. The screening results demonstrate that the CNN achieves the highest overlap with the manually annotated areas, at 93.09%, while the SVM achieves an accuracy of 83.75%. The combination of image preprocessing and deep learning can effectively improve the efficiency of lymph node metastasis screening in colorectal cancer and is of great significance for the further development of Computer-Aided Diagnosis (CAD) systems.
{"title":"A Novel Screening Framework for Lymph Node Metastasis in Colorectal Cancer Based on Deep Learning Approaches","authors":"Yeming Liu, Fulong Li, Haitao Yu, Zhiyong Zhang, Huiyan Li, Chunxiao Han","doi":"10.1145/3517077.3517082","DOIUrl":"https://doi.org/10.1145/3517077.3517082","url":null,"abstract":"As a diagnostic criterion for cancer, histopathology image analysis is quite critical for the subsequent therapeutic treatment of patients. Nowadays, the diagnosis is mainly depended on manually which is less precise and low-accuracy. To address the problem, we propose a novel screening framework combined image preprocess and AI approaches for the automatic detection of lymph node metastasis of colorectal cancer. First calculates the Histogram of Oriented Gradient (HOG) and Gray Level Cooccurrence Matrix (GLCM) of high-resolution digital images transformed from pathological sections. Statistical analysis show that Support Vector Machine (SVM) can be used to automatically identify cancerous areas. We further introduce deep learning models Convolutional Neural Network (CNN) into our framework, taking preprocessed images as inputs. The screening results demonstrate that the highest overlapping ratio can be achieved compared with manually annotation areas is 93.09% got by CNN, while another approaches SVM get an accuracy of 83.75%. The combination of image preprocess and deep learning can effectively improve the efficiency of lymph node metastasis screening in colorectal cancer and has great significance for the further development of Computer Aided Diagnosis (CAD) systems.","PeriodicalId":233686,"journal":{"name":"2022 7th International Conference on Multimedia and Image Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130200632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motion-onset visual evoked potential (mVEP) has gradually been applied in brain-computer interface (BCI) systems due to its large amplitude and small inter-subject variability. In this paper, three feature extraction algorithms, the downsampling-and-stacked-averaging algorithm, common spatial patterns (CSP), and filter bank common spatial patterns (FBCSP), were used to extract mVEP features. The experimental results show that the average classification accuracies of the CSP and FBCSP algorithms in mVEP-BCI are 89.0% and 91.2% respectively, which are 3.8% and 6% higher than that of the downsampling-and-stacked-averaging algorithm. This indicates that both CSP and FBCSP are suitable for mVEP-based brain-computer interface systems, and that FBCSP is particularly effective in the feature extraction stage.
{"title":"Feature extraction of Motion-onset visual evoked potential based on CSP and FBCSP","authors":"Xinglin He, Li Zhao, Tongning Meng, Zhiwen Zhang","doi":"10.1145/3517077.3517101","DOIUrl":"https://doi.org/10.1145/3517077.3517101","url":null,"abstract":"Motion-onset visual evoked potential (mVEP) has been gradually applied in brain computer interface systems due to its maximum amplitude and minimum difference between subjects. In this paper, three feature extraction algorithms including downsampling stack average algorithm, common spatial pattern (CSP) and filter bank common spatial pattern (FBCSP) were used to extract the features of mVEP, and the experimental results show that the average classification accuracy of CSP algorithm and FBCSP algorithm in mVEP-BCI is 89.0% and 91.2% respectively, which is 3.8% and 6% higher than that of the downsampling stack average algorithm. And indicating that the CSP algorithm and the FBCSP algorithm are suitable for exercise initiation visual evoked potential brain-computer interface system and the FBCSP algorithm is in the system The feature extraction process can play a more obvious effect.","PeriodicalId":233686,"journal":{"name":"2022 7th International Conference on Multimedia and Image Processing","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116512013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fatigue driving is one of the important factors causing traffic accidents. To address this problem, this paper proposes a classification model based on a conventional convolutional neural network (CNN) to distinguish vigilance states. First, the raw electroencephalogram (EEG) signals were converted into two-dimensional spectrograms by the short-time Fourier transform (STFT). Then, the CNN model was used for automatic feature extraction and classification from these spectrograms. Finally, the performance of the trained CNN model was evaluated. The average area under the ROC curve (AUC) was 1, the sensitivity was 91.4%, the average false prediction rate (FPR) was 0.02/h, and the accuracy was as high as 97%. These evaluation results verify the effectiveness of the CNN model.
{"title":"Fatigue Driving Vigilance Detection Using Convolutional Neural Networks and Scalp EEG Signals","authors":"Y. Fang, Chunxiao Han, Jing Liu, Fengjuan Guo, Yingmei Qin, Y. Che","doi":"10.1145/3517077.3517099","DOIUrl":"https://doi.org/10.1145/3517077.3517099","url":null,"abstract":"Fatigue driving is one of the important factors that cause traffic accidents. To solve this problem, this paper proposes a classification model based on the traditional convolutional neural network (CNN) to distinguish the vigilance state. First, the raw electroencephalogram (EEG) signals were converted into two-dimensional spectrograms by the short-time Fourier transform (STFT). Then, the CNN model was used for automatic features extraction and classification from these spectrograms. Finally, the performance of the trained CNN model was evaluated. The average of area under ROC Curve (AUC) was 1, the sensitivity was 91.4%, the average false prediction rate (FPR) was 0.02/h, and the accuracy rate was as high as 97%. The effectiveness of the CNN model was verified by the evaluation results.","PeriodicalId":233686,"journal":{"name":"2022 7th International Conference on Multimedia and Image Processing","volume":"157 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125210173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A common method for detecting capsule leakage is to place oil-blotting paper on the capsule and observe whether the paper is still clean after a set time. This method is inexpensive but time-consuming. This paper proposes a method for detecting capsule leakage based on a linear array (line-scan) camera. First, capsule images are captured by the linear array camera and processed on a computer. Second, the Adaptive Histogram Equalization (AHE) and Sobel operator algorithms are used to sharpen the images and highlight the leaking regions. Finally, the leakage positions are determined by comparing the gray-value differences between areas of the images. A large number of experiments show that, for real-time detection, the error rate of capsule leakage detection is reduced from 10% to 1.5% when the line-scan camera captures images of capsules illuminated by a 638 nm laser and the images are processed by the above algorithms. Meanwhile, for the same number of comparison experiments, the detection task can be completed seven days earlier. Therefore, the capsule detection method proposed in this paper can greatly improve detection accuracy and efficiency.
{"title":"Research on Capsule Leakage Detection Based on Linear Array Camera","authors":"L. Li, Genghuang Yang, Baoli Wang","doi":"10.1145/3517077.3517094","DOIUrl":"https://doi.org/10.1145/3517077.3517094","url":null,"abstract":"The common detection method used for detecting capsule is to put oil blotting paper on it and to observe whether the paper is clean after the conventional time. This method could cost low payment but need spend more time. A method to detect capsule whether the leakage occurs based on linear array camera is proposed in this paper. Firstly, the capsule images are taken by linear array camera and imaged processing in computer. Secondly, Adaptive Histogram Equalization (AHE) algorithm and Sobel Operator (SO) algorithm are used to sharpen the obtained images to highlight the position of the leakage parts. Finally, the leakage positions are determined by comparing the gray value difference of each area of the images. It is proved by a large number of experiments that, in the context of real-time detection, the error rate of capsule leakage detection is reduced from 10% to 1.5% if it takes the line scan camera to capture the images of a capsule illuminated by a laser with a wavelength of 638nm and the images to process by the above algorithm. Meanwhile, under the same number of comparison experiments, the detection task can be complete seven days in advance. Therefore, the capsule detection method proposed in this paper can greatly improve the accuracy and efficiency.","PeriodicalId":233686,"journal":{"name":"2022 7th International Conference on Multimedia and Image Processing","volume":"295 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114270141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In CT scanning, multi-angle projection data must be acquired through a large number of projections, which exposes the scanned subject to a high radiation dose. To address this problem, CT image reconstruction from sparse projection data has been proposed as a new type of solution. Previous research based on compressed sensing with nonlinear sparsifying transforms can obtain good-quality reconstructed images even when the projection data are sparse. However, the heavy time cost of the image reconstruction is a practical problem that urgently needs to be solved. This study optimizes the nonlinear filtering step in the regularization term of the original scheme and proposes a novel method that replaces the original nonlinear filter with a low-pass frequency-domain filter. This strategy exploits the favorable properties of low-pass frequency-domain filtering in image processing: high efficiency and low time complexity for image smoothing. The simulation results show that, in compressed-sensing CT image reconstruction, the low-pass frequency-domain filtering of the new scheme can greatly reduce the time required to reconstruct sparse projection data while preserving image quality.
{"title":"Frequency Domain Filtering Based Compressed Sensing Applied on Sparse-angle CT Image Reconstruction","authors":"Jian Dong, Hao Chen, Xiaoxia Yang","doi":"10.1145/3517077.3517089","DOIUrl":"https://doi.org/10.1145/3517077.3517089","url":null,"abstract":"In the process of CT scanning, multi-angle projection data needs to be obtained from a large number of projection actions, which makes the scanned individual bear the risk of high radiation exposure. In order to solve such problems, the use of sparse projection data for CT image reconstruction is proposed as a new type of solution. The previous research can obtain good quality reconstructed images when the projection data is sparse by using the CT reconstruction technology based on the nonlinear sparsity transformation of compressed sensing. However, the heavy time loading of the image reconstruction is a practical problem that needs to be solved urgently. This study optimizes the non-linear filtering process of the regularization term of the original scheme, and proposes a novel method which replaces the original non-linear filter with a low-pass frequency domain filter. This strategy effectively utilizes the properties of low-pass frequency domain filtering in image processing. The excellent properties include high efficiency and low time complexity for image smoothing. The simulation experiment results show that in the process of CT image reconstruction using compressed sensing algorithm, the low-pass frequency domain filtering of the new scheme can greatly reduce the required time in the reconstruction of sparse projection data, and the image quality is feasibly guaranteed.","PeriodicalId":233686,"journal":{"name":"2022 7th International Conference on Multimedia and Image Processing","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134022204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To investigate the changes in local brain regions and the differences between functional and structural networks in patients with Alzheimer's disease (AD), coherence-based functional networks and structural networks were constructed from the EEG signals and MRI images of AD patients and normal controls, respectively. The brain was then divided into five regions (frontal, parietal, occipital, temporal, and central), and seven network topological features were extracted from each region. One-way ANOVA on these features showed that the EEG and MRI networks of the AD brain gave consistent results: a number of features differed significantly between the groups, and the two groups differed significantly in the frontal lobe region. To further analyze the abnormal topological changes of brain structural and functional networks, single features and combinations of features from the brain regions were used as inputs to a Naive Bayes classifier. The classification results showed that, compared with single features, the combination of EEG and MRI network features significantly improved the classification accuracy, with best accuracies of 0.9565 and 0.9621, respectively. This method can effectively distinguish the AD group from the control group and provides effective support for the study of the AD brain.
{"title":"Graph Theoretical Analysis Of Complex Networks In The Alzheimer Brain Using Navie-Bayes Classifier: An EEG And MRI Study","authors":"Ruofan Wang, Y. Yin, Haodong Wang, Lianshuan Shi","doi":"10.1145/3517077.3517079","DOIUrl":"https://doi.org/10.1145/3517077.3517079","url":null,"abstract":"In order to investigate the changes of local brain regions and the differences of functional network and structural network in patients with Alzheimer's disease, the coherent functional network and structural network were constructed by using EEG signals and MRI images of patients with Alzheimer's disease and normal controls respectively. Then the brain was divided into five brain regions (frontal, parietal, occipital, temporal and central), and seven network topological features were extracted from each brain region. ANOVA1 statistical analysis of these features showed that EEG network and MRI network of AD brain had the same results, that is, there were significant differences in the number of features, and the two groups had significant differences in the frontal lobe region. In order to further analyze the abnormal topological changes of brain structure and functional networks, the single feature and the combination of features of brain regions were used as the input of Naive Bayes classifier. The classification results showed that compared with single feature EEG and MRI network feature combination, the classification accuracy was significantly improved, and the best accuracy was 0.9565 and 0.9621, respectively. This method can effectively distinguish AD group from control group and provide effective support for the study of AD brain.","PeriodicalId":233686,"journal":{"name":"2022 7th International Conference on Multimedia and Image Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129808086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently reported deep audio-visual models have shown promising results in solving the cocktail party problem and are attracting new studies. Audio-visual datasets are an important basis for these studies. Here we investigate the AVSpeech dataset [1], a popular dataset launched by the Google team, for training deep audio-visual models for multi-talker speech separation. Our goal is to derive a special kind of video, called purity video, from the dataset. A piece of purity video contains continuous image frames in which the same person's face is visible throughout. A natural question is how to extract as many purity videos as possible from the AVSpeech dataset. This paper presents the tools and methods we utilized, the problems we encountered, and the purity videos we obtained. Our main contributions are as follows: 1) we propose a solution for extracting a derived subset of the AVSpeech dataset that is of high quality and larger than the existing publicly available training sets; 2) we implemented this solution, performed experiments on the AVSpeech dataset, and obtained insightful results; 3) we also evaluated the proposed solution on our manually labeled dataset, called VTData. Experiments show that our solution is effective and robust. We hope this work can help the community exploit the AVSpeech dataset for other video understanding tasks.
{"title":"A preliminary study of challenges in extracting purity videos from the AV Speech Benchmark","authors":"Haoran Yan, Huijun Lu, Dunbo Cai, Tao Hang, Ling Qian","doi":"10.1145/3517077.3517091","DOIUrl":"https://doi.org/10.1145/3517077.3517091","url":null,"abstract":"Recently reported deep audiovisual models have shown promising results on solving the cocktail party problem and are attracting new studies. Audiovisual datasets are an important basis for these studies. Here we investigate the AVSpeech dataset[1], a popular dataset that was launched by the Google team, for training deep audio-visual models for multi-talker speech separation. Our goal is to derive a special kind of video, called purity video, from the dataset. A piece of purity video contains continuous image frames of the same person with a face within a time. A natural question is how we can extract purity videos, as many as possible, from the AVSpeech dataset. This paper presents the tools and methods we utilized, problems we encountered, and the purity video we obtained. Our main contributions are as follows: 1) We propose a solution to extract a derivation subset of the AVSpeech dataset that is of high quality and more than the existing training sets publicly available. 2) We implemented the above solution to perform experiments on the AVSpeech dataset and got insightful results; 3) We also evaluated our proposed solution on our manually labeled dataset called VTData. Experiments show that our solution is effective and robust. We hope this work can help the community in exploiting the AVSpeech dataset for other video understanding tasks.","PeriodicalId":233686,"journal":{"name":"2022 7th International Conference on Multimedia and Image Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123507108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To avoid the limitations of hand-crafted feature extraction, a CNN model is adopted to extract image features through big-data-driven adaptive learning, which improves feature accuracy. To avoid the loss of spatial information, an improved CNN model based on up-sampling is proposed, consisting of six layers of stacked small convolutions. The multi-layer design not only expands the receptive field but also reduces the number of training parameters and improves running speed. A fusion method for multi-focus images based on the improved CNN model is proposed. The improved CNN divides the input image into focused and unfocused regions and forms a decision map. According to the decision map optimized by GFF, the focused regions are integrated by a pixel-by-pixel weighted fusion strategy to obtain the fused image. Experimental results show that the fusion results of the proposed method are clear in detail, complete in structure, free of contrast distortion, and free of artifacts. The method effectively avoids grayscale discontinuity, artifacts, and other problems, and outperforms the classical methods selected for comparison.
{"title":"Multi-Focus Image Fusion Based on Improved CNN","authors":"Lixia Zhang","doi":"10.1145/3517077.3517093","DOIUrl":"https://doi.org/10.1145/3517077.3517093","url":null,"abstract":"In order to avoid the limitations of artificial feature extraction, the CNN model is adopted to extract image features by big data-driven adaptive learning, which improves the accuracy of the features. For avoiding the loss of spatial information, an improved CNN model based on up-sampling is proposed, which consists of six layers of superimposed small convolution. The multi-layer design not only expands the receptive field, but also reduces the number of training parameters, and improves the running speed. The fusion method based on improved CNN model is proposed for multi-focus images. The improved CNN model divides the input image into focus region and non-focus region, and form the decision map. According to the decision map optimized by GFF, the focus regions are intergraded by pixel-by-pixel weighted fusion strategy to obtain fusion image. Experimental results show that the fusion results of the proposed method are clear in detail, complete in structure, no distortion in contrast, and no artifacts in the picture. It effectively avoids grayscale discontinuity, artifacts and other problems, and is better than classical methods we selected.","PeriodicalId":233686,"journal":{"name":"2022 7th International Conference on Multimedia and Image Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114474257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the rapid development of information technology, student attendance has shifted from paper-based to machine-based methods such as taking photos, scanning QR codes, and positioning. These methods require turning on the camera to take photos, which is somewhat inefficient, or enabling location services, which many people consider an infringement of personal privacy. We therefore need a more efficient attendance method that does not infringe on personal privacy. Voice, a signal that can be acquired quickly and carries multiple kinds of information, can be used for classroom attendance. A speaker recognition corpus is the basis of speaker recognition research, and a diversified, large-scale, high-quality corpus plays an important role in improving the performance of a speaker recognition system. At present, although there are many standardized corpora, few target student attendance scenarios. This paper therefore studies the speaker's speech feature parameters and selects appropriate Chinese phrases to establish a speaker corpus.
{"title":"Establishment of Speaker Recognition Corpus for Intelligent Attendance System","authors":"Shuxi Chen, Yiyang Sun","doi":"10.1145/3517077.3517118","DOIUrl":"https://doi.org/10.1145/3517077.3517118","url":null,"abstract":"With the rapid development of information technology, student attendance has changed from paper attendance to machine attendance, such as taking photos, scanning QR codes, positioning, etc. These attendance needs to turn on the camera to take photos, which is slightly inefficient, or turn on the positioning service. However, many people think that turning on the positioning service will infringe on personal privacy. Therefore, we need to consider a more efficient Attendance method that does not infringe on personal privacy. Voice, as a signal that can quickly obtain and contain a variety of information, can be used for class students' attendance. Speaker recognition corpus is the basis of speech speaker recognition research. Diversified, large-scale and high-quality speaker recognition corpus plays an important role in improving the performance of speaker recognition system. At present, although there are many standardized corpora, there are few corpora for student attendance scenes. Therefore, this topic studies the speaker's speech feature parameters, and selects the appropriate Chinese phrases to establish the speaker's corpus.","PeriodicalId":233686,"journal":{"name":"2022 7th International Conference on Multimedia and Image Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123967966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}