Ryo Masumura, A. Ito, Yu Uno, Masashi Ito, S. Makino
{"title":"文档扩展使用相关的网络文档为口语文档检索","authors":"Ryo Masumura, A. Ito, Yu Uno, Masashi Ito, S. Makino","doi":"10.1109/NLPKE.2010.5587854","DOIUrl":null,"url":null,"abstract":"Recently, automatic indexing of a spoken document using a speech recognizer attracts attention. However, index generation from an automatic transcription has many problems because the automatic transcription has many recognition errors and Out-Of-Vocabulary words. To solve this problem, we propose a document expansion method using Web documents. To obtain important keywords which included in the spoken document but lost by recognition errors, we acquire Web documents relevant to the spoken document. Then, an index of the spoken document is generated by combining an index that generated from the automatic transcription and the Web documents. We propose a method for retrieval of relevant documents, and the experimental result shows that the retrieved Web document contained many OOV words. Next, we propose a method for combining the recognized index and the Web index. The experimental result shows that the index of the spoken document generated by the document expansion was closer to an index from the manual transcription than the index generated by the conventional method. Finally, we conducted a spoken document retrieval experiment, and the document-expansion-based index gave better retrieval precision than the conventional indexing method.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"27 8","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Document expansion using relevant web documents for spoken document retrieval\",\"authors\":\"Ryo Masumura, A. Ito, Yu Uno, Masashi Ito, S. Makino\",\"doi\":\"10.1109/NLPKE.2010.5587854\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, automatic indexing of a spoken document using a speech recognizer attracts attention. However, index generation from an automatic transcription has many problems because the automatic transcription has many recognition errors and Out-Of-Vocabulary words. To solve this problem, we propose a document expansion method using Web documents. To obtain important keywords which included in the spoken document but lost by recognition errors, we acquire Web documents relevant to the spoken document. Then, an index of the spoken document is generated by combining an index that generated from the automatic transcription and the Web documents. We propose a method for retrieval of relevant documents, and the experimental result shows that the retrieved Web document contained many OOV words. Next, we propose a method for combining the recognized index and the Web index. The experimental result shows that the index of the spoken document generated by the document expansion was closer to an index from the manual transcription than the index generated by the conventional method. Finally, we conducted a spoken document retrieval experiment, and the document-expansion-based index gave better retrieval precision than the conventional indexing method.\",\"PeriodicalId\":259975,\"journal\":{\"name\":\"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)\",\"volume\":\"27 8\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NLPKE.2010.5587854\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NLPKE.2010.5587854","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Document expansion using relevant web documents for spoken document retrieval
Recently, automatic indexing of a spoken document using a speech recognizer attracts attention. However, index generation from an automatic transcription has many problems because the automatic transcription has many recognition errors and Out-Of-Vocabulary words. To solve this problem, we propose a document expansion method using Web documents. To obtain important keywords which included in the spoken document but lost by recognition errors, we acquire Web documents relevant to the spoken document. Then, an index of the spoken document is generated by combining an index that generated from the automatic transcription and the Web documents. We propose a method for retrieval of relevant documents, and the experimental result shows that the retrieved Web document contained many OOV words. Next, we propose a method for combining the recognized index and the Web index. The experimental result shows that the index of the spoken document generated by the document expansion was closer to an index from the manual transcription than the index generated by the conventional method. Finally, we conducted a spoken document retrieval experiment, and the document-expansion-based index gave better retrieval precision than the conventional indexing method.