Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659535
Title: A Prediction Model for End-of-Utterance Based on Prosodic Features and Phrase-Dependency in Spontaneous Japanese
Authors: Y. Ishimoto, Takehiro Teraoka, M. Enomoto
This study aims to identify clues for predicting the end of an utterance in spontaneous Japanese speech. In casual everyday conversation, participants must predict the ends of a speaker's utterances in order to take turns smoothly, with only small gaps or overlaps. Syntactic and prosodic factors are considered to project the end of an utterance, and participants exploit these factors for prediction. In this paper, we focus on the dependency structure among bunsetsu-phrases as a syntactic feature, and on F0, intensity, and mora duration of bunsetsu-phrases as prosodic features. We investigated the relationship between these features and the position of a bunsetsu-phrase within an utterance. The results showed that no single feature is an authoritative clue for determining bunsetsu-phrase position. We then constructed a Bayesian hierarchical model to estimate bunsetsu-phrase position from the syntactic and prosodic features. The model indicated that the usefulness of the prosodic features varies across speakers, suggesting that a different combination of syntactic and prosodic features is relevant for predicting the ends of each speaker's utterances.
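To make the modeling idea concrete, a minimal sketch follows that approximates the paper's speaker-dependent cue weighting with one logistic regression per speaker; the paper itself fits a Bayesian hierarchical model with pooling across speakers, and the features and data below are hypothetical.

```python
# Minimal sketch (not the authors' model): per-speaker logistic regressions
# stand in for the paper's Bayesian hierarchical model, so that each speaker
# gets its own weighting of syntactic and prosodic cues. Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
models = {}

for spk in ["A", "B"]:
    # Hypothetical features per bunsetsu-phrase:
    # [depends-on-following-phrase (0/1), F0 slope, intensity, mora duration]
    X = rng.normal(size=(200, 4))
    # Label: 1 if the phrase is utterance-final, 0 otherwise (synthetic here).
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) < 0).astype(int)
    models[spk] = LogisticRegression().fit(X, y)

# The per-speaker coefficients play the role of speaker-specific cue weights.
for spk, m in models.items():
    print(spk, np.round(m.coef_, 2))
```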
{"title":"A Prediction Model for End-of-Utterance Based on Prosodic Features and Phrase-Dependency in Spontaneous Japanese","authors":"Y. Ishimoto, Takehiro Teraoka, M. Enomoto","doi":"10.23919/APSIPA.2018.8659535","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659535","url":null,"abstract":"This study aims to reveal a clue for predicting end-of-utterance in spontaneous Japanese speech. In casual everyday conversation, participants must predict the ends of utterances of a speaker to perform smooth turn-taking with small gaps or overlaps. Syntactic and prosodic factors are considered to project the end of utterance of speech, and participants utilize these factors to predict the end-of-utterance. In this paper, we focused on the dependency structure among bunsetsu-phrases as a syntactic feature and F0, intensity, and mora duration for bunsetsu-phrases as prosodic features. We investigated the relationship between the position of a bunsetsu-phrase in an utterance and these features. The results showed that a single feature cannot be an authoritative clue that determines the position of bunsetsu-phrases. Next, we constructed a Bayesian hierarchical model to estimate the bunsetsu-phrase position from the syntactic and prosodic features. The results of the model indicated that prosodic features vary in usefulness according to speakers. This suggests that the different combinations of syntactic and prosodic features for each speaker are relevant to predict the ends of utterances.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116277861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659735
Title: Low-Frequency Character Clustering for End-to-End ASR System
Authors: Hitoshi Ito, Aiko Hagiwara, Manon Ichiki, Takeshi S. Kobayakawa, T. Mishima, Shoei Sato, A. Kobayashi
We developed a label design and restoration method for end-to-end automatic speech recognition based on connectionist temporal classification (CTC). In an end-to-end speech recognition system with thousands of output labels, such as words or characters, it is difficult to train a robust model because of data sparsity. With our proposed method, characters with little training data are estimated from the context of a language model rather than from acoustic features. Our method involves two steps. First, we train acoustic models using 70 class labels in place of thousands of low-frequency labels. Second, the class labels are restored to the original labels using a weighted finite-state transducer and an n-gram language model. We applied the proposed method to a Japanese end-to-end automatic speech recognition system with over 3,000 character labels. Experimental results indicate that our method yields a relative word error rate improvement of up to 15.5% over a conventional CTC-based method and is comparable to state-of-the-art hybrid DNN methods.
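A minimal sketch of the first step, assuming a simple frequency-threshold criterion: characters below a count threshold are mapped onto a small set of class labels. The paper uses 70 classes; the round-robin assignment below is a placeholder for their clustering, and the WFST/n-gram restoration step is not shown.

```python
# Minimal sketch: replace rare characters with cluster labels before CTC
# training. The round-robin assignment is a stand-in for the paper's clustering.
from collections import Counter

def build_label_map(transcripts, min_count=10, n_classes=70):
    counts = Counter(ch for line in transcripts for ch in line)
    label_map = {ch: ch for ch, c in counts.items() if c >= min_count}
    rare = sorted(ch for ch, c in counts.items() if c < min_count)
    for i, ch in enumerate(rare):
        label_map[ch] = f"<C{i % n_classes}>"  # cluster label for a rare character
    return label_map

transcripts = ["きょうはいいてんきです", "てんきよほうです"]
label_map = build_label_map(transcripts, min_count=2)
print([label_map[ch] for ch in transcripts[0]])
```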
{"title":"Low-Frequency Character Clustering for End-to-End ASR System","authors":"Hitoshi Ito, Aiko Hagiwara, Manon Ichiki, Takeshi S. Kobayakawa, T. Mishima, Shoei Sato, A. Kobayashi","doi":"10.23919/APSIPA.2018.8659735","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659735","url":null,"abstract":"We developed a label-designing and restoration method for end-to-end automatic speech recognition based on connectionist temporal classification (CTC). With an end-to-end speech-recognition system including thousands of output labels such as words or characters, it is difficult to train a robust model because of data sparsity. With our proposed method, characters with less training data are estimated using the context of a language model rather than the acoustic features. Our method involves two steps. First, we train acoustic models using 70 class labels instead of thousands of low-frequency labels. Second, the class labels are restored to the original labels by using a weighted finite state transducer and n-gram language model. We applied the proposed method to a Japanese end-to-end automatic speech-recognition system including labels of over 3,000 characters. Experimental results indicate that the word error rate relatively improved with our method by a maximum of 15.5% compared with a conventional CTC-based method and is comparable to state-of-the-art hybrid DNN methods.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116867622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659744
Title: Optimizing the Performance of Halftoning-Based Block Truncation Coding
Authors: Zi-Xin Xu, Y. Chan, D. Lun
Block Truncation Coding (BTC) is an effective lossy image coding technique that offers both high efficiency and low complexity, especially when halftoning techniques are employed to shape the noise spectrum of its output. However, because of its block-based nature, blocking artifacts are common in its output, and post-processing schemes are generally applied to alleviate the problem. Recently, a halftoning-based BTC algorithm was proposed that removes the cause of blocking artifacts. In this paper, the performance of that algorithm is optimized with respect to a given objective measure through an added optimization step. The idea can be adapted to other halftoning methods and other objective measures to suit different needs in different circumstances.
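For reference, a minimal sketch of classic (non-halftoning) BTC on a single 4x4 grayscale block, the baseline that the halftoning variants build on; the paper's optimized halftoning-based algorithm is not reproduced here.

```python
# Classic BTC on one block: a 1-bit-per-pixel bitmap plus two reconstruction
# levels chosen so the decoded block preserves the original mean and variance.
import numpy as np

def btc_encode(block):
    n = block.size
    m, s = block.mean(), block.std()
    bitmap = block >= m                       # 1 bit per pixel
    q = int(bitmap.sum())
    if q in (0, n):                           # flat block: a single level suffices
        return bitmap, m, m
    low = m - s * np.sqrt(q / (n - q))        # levels preserving mean/variance
    high = m + s * np.sqrt((n - q) / q)
    return bitmap, low, high

def btc_decode(bitmap, low, high):
    return np.where(bitmap, high, low)

block = np.array([[121, 114, 56, 47], [37, 200, 247, 255],
                  [16, 0, 12, 169], [43, 5, 7, 251]], dtype=float)
bitmap, low, high = btc_encode(block)
print(np.round(btc_decode(bitmap, low, high)))
```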
{"title":"Optimizing the Performance of Halftoning-Based Block Truncation Coding","authors":"Zi-Xin Xu, Y. Chan, D. Lun","doi":"10.23919/APSIPA.2018.8659744","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659744","url":null,"abstract":"Block Truncation Coding (BTC) is an effective lossy image coding technique that enjoys both high efficiency and low complexity especially when halftoning techniques are employed to shape the noise spectrum of its output. However, due to its block-based nature, blocking artifacts are commonly found in the coding outputs. Post-processing schemes are generally applied to soften the problem. Recently, a halftoning-based BTC algorithm was proposed to solve this problem by eliminating the cause of blocking artifacts. In this paper, through an optimization step, the performance of the algorithm is optimized in terms of a given objective measure. The idea can be adopted to work with other halftoning methods to optimize other measures for suiting different needs in different circumstances.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124052371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659643
Title: Data Hiding in MP4 Video Container based on Subtitle Track
Authors: ChuanSheng Chan, Koksheik Wong, Imdad MaungMaung
This paper proposes a data hiding method for the MP4 container format. Specifically, the synchronization between the subtitle and audio-video tracks is exploited to hide data: the track's time scale is first scaled, and sample-duration pairs are then modified to encode the payload. The proposed method hides data reversibly when the payload is relatively small, and switches to an irreversible mode to offer higher capacity. Although the synchronization between the audio-video and subtitle tracks is manipulated, the resulting delay or advance in subtitle display is imperceptible, and the file size of the processed MP4 file is completely preserved. Subjective evaluations verify the basic performance of the proposed method.
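A toy illustration of the general mechanism, under an assumed parity scheme that is not the authors' exact algorithm: the time scale is doubled so that every sample duration doubles, and one payload bit is then hidden in each duration's parity, bounding the subtitle timing error by half of the original time unit.

```python
# Hypothetical embedding scheme inspired by the abstract's description; the
# paper's actual time-scale and sample-duration manipulation may differ.
def embed(timescale, durations, bits):
    new_timescale = timescale * 2                    # doubling keeps real timing
    new_durations = [d * 2 + b for d, b in zip(durations, bits)]
    return new_timescale, new_durations

def extract(durations):
    return [d % 2 for d in durations]                # read bits back from parity

ts, durs = 1000, [2400, 3100, 2750]                  # subtitle sample durations
ts2, durs2 = embed(ts, durs, bits=[1, 0, 1])
print(ts2, durs2, extract(durs2))                    # 2000 [4801, 6200, 5501] [1, 0, 1]
```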
{"title":"Data Hiding in MP4 Video Container based on Subtitle Track","authors":"ChuanSheng Chan, Koksheik Wong, Imdad MaungMaung","doi":"10.23919/APSIPA.2018.8659643","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659643","url":null,"abstract":"This paper proposes a data hiding method in MP4 container format. Specifically, the synchronization between subtitle and audio-video tracks is exploited to hide data. The time scale is first scaled, and the sample duration pair is modified to hide data. The proposed method is able to hide data reversibly when the payload size is relative small, and it switches to the irreversible mode to offer higher payload. Although synchronization between audio-video and subtitle tracks are manipulated, the delay or ahead in displaying subtitle is imperceptible. The filesize of the processed MP4 file is also completely preserved. Subjective evaluations are carried out to verify the basic performance of the proposed method.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125779782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659774
Title: Ensemble Deep Learning Based Cooperative Spectrum Sensing with Stacking Fusion Center
Authors: Hang Liu, Xu Zhu, T. Fujii
In this paper, an ensemble learning (EL) framework is adopted for cooperative spectrum sensing (CSS) in an orthogonal frequency division multiplexing (OFDM) based cognitive radio system. Each secondary user (SU) is treated as a base learner whose local spectrum sensing estimates the probability that the primary user (PU) is active or inactive. A convolutional neural network with a simple architecture is applied at each SU, given its strength in image recognition and the limited computational ability of each SU, with the cyclic spectral correlation feature as the input data. For supervised learning, a bagging strategy is used to build the training database. For the global decision, the fusion center employs stacked generalization to combine the SUs' preliminary classifications of the PU status. Our method shows significant advantages over conventional CSS methods in terms of detection probability and false alarm probability.
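A minimal sketch of the stacking idea using scikit-learn with synthetic data: per-SU base learners produce local probabilities of PU activity, and the fusion center trains a meta-learner on those outputs. The paper's base learners are CNNs over cyclic spectral correlation features; simple logistic regressions stand in for them here.

```python
# Stacked generalization for cooperative sensing, on synthetic observations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, n_su = 1000, 4
pu_active = rng.integers(0, 2, size=n)               # ground-truth PU status

# Each SU observes the PU through an independent noisy channel.
su_obs = pu_active[:, None] + rng.normal(scale=1.5, size=(n, n_su))

# Base learners: one classifier per SU on its own observation (bagging would
# resample the training set per learner; omitted for brevity).
base = [LogisticRegression().fit(su_obs[:500, [i]], pu_active[:500])
        for i in range(n_su)]
meta_in = np.column_stack([m.predict_proba(su_obs[:, [i]])[:, 1]
                           for i, m in enumerate(base)])

# Fusion center: a meta-learner stacked on the base-learner outputs.
fusion = LogisticRegression().fit(meta_in[:500], pu_active[:500])
print("test accuracy:", fusion.score(meta_in[500:], pu_active[500:]))
```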
{"title":"Ensemble Deep Learning Based Cooperative Spectrum Sensing with Stacking Fusion Center","authors":"Hang Liu, Xu Zhu, T. Fujii","doi":"10.23919/APSIPA.2018.8659774","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659774","url":null,"abstract":"In this paper, an ensemble learning (EL) framework is adopted for cooperative spectrum sensing (CSS) in an orthogonal frequency division multiplexing (OFDM) signal based cognitive radio system. Each secondary user (SU) is accordingly considered as a base learner, where the local spectrum sensing is for investigating the probability of PU being inactive or active. The convolution neural networks with simple architecture are applied given its strength in image recognition as well as the limited computation ability of each SU, meanwhile, the cyclic spectral correlation feature is introduced as the input data. Here, as for the supervised learning, the bagging strategy is helped to establish the training database. For the global decision, the fusion center employs the stacked generalization for further combination learning the SU output of classification pre-prediction of the PU status. Our method shows significant advantages over conventional CSS methods in term of the detection probability or false alarm probability performance.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125900290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659493
Title: Multichannel NMF with Reduced Computational Complexity for Speech Recognition
Authors: T. Izumi, Takanobu Uramoto, Shingo Uenohara, K. Furuya, Ryo Aihara, Toshiyuki Hanazawa, Y. Okato
In this study, we propose a method that reduces the number of computational iterations of multichannel nonnegative matrix factorization (MNMF) for speech recognition. The proposed method initializes the MNMF algorithm with a pre-estimated spatial correlation matrix, reducing the number of iterations the update algorithm requires; here, mask-based enhancement via the expectation-maximization (EM) algorithm is used to estimate the spatial correlation matrix. As a further method, we propose a complexity reduction technique that decimates the updates of the spatial correlation matrix H. The experimental results indicate that our methods reduce the computational complexity of MNMF while maintaining the performance of conventional MNMF.
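A minimal sketch of the decimation idea, shown on plain single-channel NMF for brevity (the paper applies it to the spatial correlation matrix H in multichannel NMF, which is far costlier per update): one factor is updated only on every k-th iteration.

```python
# Decimated multiplicative updates for Euclidean NMF: W is updated every
# iteration, H only every k-th iteration, cutting per-iteration cost.
import numpy as np

def nmf_decimated(V, rank=4, n_iter=100, k=5, eps=1e-9):
    F, T = V.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((F, rank)), rng.random((rank, T))
    for it in range(n_iter):
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # standard multiplicative update
        if it % k == 0:                        # decimated update of H
            H *= (W.T @ V) / (W.T @ W @ H + eps)
    return W, H

V = np.random.default_rng(1).random((64, 128))  # nonnegative "spectrogram"
W, H = nmf_decimated(V)
print("reconstruction error:", np.linalg.norm(V - W @ H))
```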
{"title":"Multichannel NMF with Reduced Computational Complexity for Speech Recognition","authors":"T. Izumi, Takanobu Uramoto, Shingo Uenohara, K. Furuya, Ryo Aihara, Toshiyuki Hanazawa, Y. Okato","doi":"10.23919/APSIPA.2018.8659493","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659493","url":null,"abstract":"In this study, we propose efficient the number of computational iteration method of MNMF for speech recognition. The proposed method initializes and estimates the MNMF algorithm with respect to the estimated spatial correlation matrix reducing the number of iteration of update algorithm. This time, mask emphasis via Expectation Maximization algorithm is used for estimation of a spatial correlation matrix. As another method, we propose a computational complexity reduction method via decimating update of the spatial correlation matrixH. The experimental result indicates that our method reduced the computational complexity of MNMF. It shows that the performance of the conventional MNMF was maintained and the computational complexity could be reduced.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128243373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659547
Title: A Digital Modeling Technique for Distortion Effect Based on a Machine Learning Approach
Authors: Yuto Matsunaga, N. Aoki, Y. Dobashi, Tsuyoshi Yamamoto
This paper describes experimental results on modeling distortion-effect stomp boxes with a machine learning approach. Our proposed technique models a distortion stomp box as a neural network consisting of CNN and LSTM components. In this approach, the CNN models the linear component, which appears in the pre- and post-filters of the stomp box, while the LSTM models the nonlinear component, which appears in its distortion stage. All parameters are estimated by training on the input and output signals of the distortion stomp box. The experimental results indicate that the proposed technique may have the potential to replicate distortion stomp boxes faithfully using the well-trained neural network.
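A minimal PyTorch sketch of the described architecture, with illustrative layer sizes that are not the authors' configuration: 1-D convolutions for the linear pre/post filters and an LSTM for the nonlinear distortion stage.

```python
# Sketch of a CNN+LSTM distortion model: conv layers play the linear filters,
# the LSTM supplies the nonlinearity. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DistortionModel(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.pre = nn.Conv1d(1, hidden, kernel_size=33, padding=16)   # pre-filter
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)         # nonlinearity
        self.post = nn.Conv1d(hidden, 1, kernel_size=33, padding=16)  # post-filter

    def forward(self, x):                    # x: (batch, 1, samples)
        h = self.pre(x)
        h, _ = self.lstm(h.transpose(1, 2))  # LSTM expects (batch, time, channels)
        return self.post(h.transpose(1, 2))

model = DistortionModel()
clean = torch.randn(1, 1, 4096)              # clean guitar input
distorted = model(clean)                     # trained against the stomp box's
print(distorted.shape)                       # recorded output, e.g. with MSE loss
```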
{"title":"A Digital Modeling Technique for Distortion Effect Based on a Machine Learning Approach","authors":"Yuto Matsunaga, N. Aoki, Y. Dobashi, Tsuyoshi Yamamoto","doi":"10.23919/APSIPA.2018.8659547","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659547","url":null,"abstract":"This paper describes an experimental result of modeling stomp boxes of the distortion effect based on a machine learning approach. Our proposed technique models the distortion stomp boxes as a neural network consisting of CNN and LSTM. In this approach, CNN is employed for modeling the linear component that appears in the pre and post filters of the stomp boxes. On the other hand, LSTM is employed for modeling the nonlinear component that appears in the distortion process of the stomp boxes. All the parameters are estimated through the training process using the input and output signals of the distortion stomp boxes. The experimental result indicates that the proposed technique may have a certain potential to replicate the distortion stomp boxes appropriately by using the well-trained neural network.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128697250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659502
Title: Sequential Generation of Singing F0 Contours from Musical Note Sequences Based on WaveNet
Authors: Yusuke Wada, Ryo Nishikimi, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii
This paper describes a method that can generate a continuous F0 contour of a singing voice from a monophonic sequence of musical notes (a musical score) by using a deep autoregressive neural model called WaveNet. Real F0 contours include complicated temporal and frequency fluctuations caused by singing expressions such as vibrato and portamento. Although explicit models such as hidden Markov models (HMMs) have often been used to represent F0 dynamics, it is difficult to generate realistic F0 contours because of the limited representation capability of such models. To overcome this limitation, WaveNet, which was invented for modeling raw waveforms in an unsupervised manner, was recently used to generate singing F0 contours from a musical score with lyrics in a supervised manner. Inspired by this attempt, we investigate the capability of WaveNet for generating singing F0 contours without using lyric information. Our method conditions WaveNet on pitch and contextual features of a musical score. As a loss function more suitable for generating F0 contours, we adopt a modified cross-entropy loss weighted with the square error between target and output F0s on the log-frequency axis. The experimental results show that these techniques improve the quality of generated F0 contours.
{"title":"Sequential Generation of Singing F0 Contours from Musical Note Sequences Based on WaveNet","authors":"Yusuke Wada, Ryo Nishikimi, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii","doi":"10.23919/APSIPA.2018.8659502","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659502","url":null,"abstract":"This paper describes a method that can generate a continuous F0 contour of a singing voice from a monophonic sequence of musical notes (musical score) by using a deep neural autoregressive model called WaveNet. Real F0 contours include complicated temporal and frequency fluctuations caused by singing expressions such as vibrato and portamento. Although explicit models such as hidden Markov models (HMMs) have often used for representing the F0 dynamics, it is difficult to generate realistic F0 contours due to the poor representation capability of such models. To overcome this limitation, WaveNet, which was invented for modeling raw waveforms in an unsupervised manner, was recently used for generating singing F0 contours from a musical score with lyrics in a supervised manner. Inspired by this attempt, we investigate the capability of WaveNet for generating singing F0 contours without using lyric information. Our method conditions WaveNet on pitch and contextual features of a musical score. As a loss function that is more suitable for generating F0 contours, we adopted the modified cross-entropy loss weighted with the square error between target and output F0s on the log-frequency axis. The experimental results show that these techniques improve the quality of generated F0 contours.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130544901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659704
Title: Exploring redundancy of HRTFs for fast training DNN-based HRTF personalization
Authors: Tzu-Yu Chen, Po-Wen Hsiao, T. Chi
A deep neural network (DNN) is constructed to predict the magnitude responses of a user's head-related transfer functions (HRTFs) for a specific direction and a specific ear. Using the CIPIC HRTF database (25 azimuth angles and 50 elevation angles for both ears), we trained 2,500 DNNs to predict the magnitude responses of all of a user's HRTFs. To reduce training time, we propose using the final weights of the trained DNN for a nearby direction as the initial weights of the DNN currently under training, since the magnitude responses of HRTFs change smoothly across nearby directions. Analysis of variance (ANOVA) showed that, in terms of the log-spectral distortion (LSD) measure, the proposed training scheme produces HRTF magnitude responses equivalent to those of the standard scheme with random initial weights, while dramatically reducing training time by more than 95%.
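A minimal PyTorch sketch of the warm-start scheme: the DNN for a new direction is initialized from the trained weights of an adjacent direction instead of random weights, then fine-tuned for far fewer epochs. Network size and data are illustrative, not the paper's setup.

```python
# Warm-starting a per-direction HRTF regressor from a neighboring direction.
import copy
import torch
import torch.nn as nn

def make_dnn(n_in=37, n_out=64):   # e.g. anthropometric features -> |HRTF| bins
    return nn.Sequential(nn.Linear(n_in, 128), nn.ReLU(),
                         nn.Linear(128, n_out))

def train(model, X, Y, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), Y)
        loss.backward()
        opt.step()
    return model

X = torch.randn(40, 37)                               # one row per subject (synthetic)
Y_a, Y_b = torch.randn(40, 64), torch.randn(40, 64)   # two nearby directions

model_a = train(make_dnn(), X, Y_a)           # direction A: random initialization
model_b = copy.deepcopy(model_a)              # direction B: warm start from A
model_b = train(model_b, X, Y_b, epochs=5)    # fine-tune with far fewer epochs
```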
{"title":"Exploring redundancy of HRTFs for fast training DNN-based HRTF personalization","authors":"Tzu-Yu Chen, Po-Wen Hsiao, T. Chi","doi":"10.23919/APSIPA.2018.8659704","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659704","url":null,"abstract":"A deep neural network (DNN) is constructed to predict the magnitude responses of the head-related transfer functions (HRTFs) of users for a specific direction and a specific ear. Using the CIPIC HRTF database (including 25 azimuth angles and 50 elevation angles for both ears), we trained 2500 DNNs to predict magnitude responses of all HRTFs of a user. To reduce training time, we propose to use the final weights of the trained DNN of a nearby direction as the initial weights of the current DNN under training since magnitude responses of the HRTFs are smoothly changing across nearby directions. Analysis of variance (ANOVA) was performed to show that the proposed training scheme produces equivalent magnitude responses of HRTFs as the standard training scheme with random initial weights in terms of the log-spectral distortion (LSD) measure. Meanwhile, the proposed training scheme can dramatically reduce training time by more than 95%.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127887837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659765
Title: Journal Name Extraction from Japanese Scientific News Articles
Authors: M. Kikuchi, Mitsuo Yoshida, Kyoji Umemura
In Japanese scientific news articles, research results are described clearly, but the articles' sources tend to go uncited, which makes it difficult for readers to learn the details of the research. In this paper, we address the task of extracting journal names from Japanese scientific news articles. We hypothesize that a journal name is likely to occur in a specific context. To test this hypothesis, we construct a character-based method that uses only the left and right context features of journal names and extract journal names with it. The extraction results suggest that the distributional hypothesis plays an important role in identifying journal names.
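A minimal sketch of the character-context idea with hypothetical examples: a candidate string is scored by how often its immediate left and right characters match the contexts observed around known journal names. The context width and scoring below are guesses, not the paper's method.

```python
# Score candidate spans by their left/right character contexts.
from collections import Counter

# (left char, journal name, right char) triples from annotated text; the
# examples are hypothetical.
known = [("誌", "科学", "に"), ("誌", "Nature", "に"), ("の", "Cell", "で")]
left_ctx = Counter(l for l, _, r in known)
right_ctx = Counter(r for l, _, r in known)

def context_score(left, right):
    # A candidate bracketed by contexts frequently seen around known journal
    # names scores higher.
    return left_ctx[left] + right_ctx[right]

print(context_score("誌", "に"))   # familiar journal-name context -> 4
print(context_score("は", "を"))   # unseen context -> 0
```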
{"title":"Journal Name Extraction from Japanese Scientific News Articles","authors":"M. Kikuchi, Mitsuo Yoshida, Kyoji Umemura","doi":"10.23919/APSIPA.2018.8659765","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659765","url":null,"abstract":"In Japanese scientific news articles, although the research results are described clearly, the article's sources tend to be uncited. This makes it difficult for readers to know the details of the research. In this paper, we address the task of extracting journal names from Japanese scientific news articles. We hypothesize that a journal name is likely to occur in a specific context. To support the hypothesis, we construct a character-based method and extract journal names using this method. This method only uses the left and right context features of journal names. The results of the journal name extractions suggest that the distribution hypothesis plays an important role in identifying the journal names.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131352225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}