
Latest publications: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)

Recognition of abnormal vibrational responses of signposts using the Two-dimensional Geometric Distance and Wilcoxon test
M. Jinnai, Y. Akashi, S. Nakaya, F. Ren, M. Fukumi
In expressway companies, workers have been striking signposts with wooden hammers and estimating the degree of corrosion by listening to the sound. To automate this, we have been developing software that recognizes an abnormal impact vibrational response due to corrosion. The software extracts sonograms from impact vibrational waves using LPC spectrum analysis, and matches sonogram images between a standard impact vibration and an input impact vibration using the Two-dimensional Geometric Distance. The software then judges whether the input impact vibration is abnormal using the Wilcoxon rank-sum test. We measured the impact vibrations of five normal signposts and five abnormal signposts and carried out automatic recognition experiments; the software recognized every case correctly, verifying the effectiveness of the proposed method.
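The abnormality decision above rests on the Wilcoxon rank-sum test. A minimal pure-Python sketch of that test (two-sided, normal approximation with tied-rank averaging) is shown below; the distance values are hypothetical stand-ins for the paper's per-band geometric distances, not its data:

```python
import math

def rank_sum_test(standard, candidate):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns the z statistic and p-value for the hypothesis that the two
    samples come from the same distribution.
    """
    combined = sorted((v, i) for i, v in enumerate(standard + candidate))
    n1, n2 = len(standard), len(candidate)
    # Assign average ranks to tied values.
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg_rank
        i = j + 1
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical distance values for a normal and a corroded signpost.
normal = [0.11, 0.12, 0.10, 0.13, 0.12]
corroded = [0.35, 0.40, 0.33, 0.38, 0.37]
z, p = rank_sum_test(normal, corroded)
print(p < 0.05)  # a small p-value flags the response as abnormal
```

A production system would use a library routine (e.g. an exact-distribution variant for small samples), but the statistic itself is this simple.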
DOI: 10.1109/NLPKE.2010.5587837 (published 2010-09-30)
Citations: 3
Marine literature categorization based on minimizing the labelled data
Wei Zhang, Qiuhong Wang, Yeheng Deng, R. Du
In marine literature categorization, supervised machine learning methods take a great deal of time because samples must be labelled by hand. We therefore use co-training to reduce the number of labelled samples needed to train the classifier. In this paper, we select features only from the text details and add attribute labels to them, which greatly improves the efficiency of text processing. To build two views, we split the features into two parts, each of which forms an independent view: one view consists of the feature set of the abstract, and the other of the feature sets of the title, keywords, creator, and department. In experiments, the F1 value and error rate of the categorization system reached about 0.863 and 14.26%. These are close to the performance of a supervised classifier (0.902 and 9.13%) trained with more than 1500 labelled samples, whereas the co-training method needed only one positive and one negative labelled sample to train the initial classifier. In addition, we consider incorporating the idea of active learning into the co-training method.
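The co-training loop described above (two independent views, each view's classifier labelling its most confident unlabelled document for the shared pool) can be sketched as follows. The centroid classifier and the toy two-view documents are illustrative stand-ins, not the paper's actual classifiers or marine-literature features:

```python
import math
from collections import Counter

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a if w in b)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

class CentroidClassifier:
    """Toy per-view classifier: cosine similarity to class centroids."""
    def fit(self, texts, labels):
        self.centroids = {}
        for label in set(labels):
            c = Counter()
            for t, l in zip(texts, labels):
                if l == label:
                    c.update(vectorize(t))
            self.centroids[label] = c
        return self

    def predict_conf(self, text):
        v = vectorize(text)
        scored = sorted(((cosine(v, c), l) for l, c in self.centroids.items()), reverse=True)
        best, runner = scored[0], scored[-1]
        return best[1], best[0] - runner[0]   # (label, confidence margin)

def co_train(labelled, unlabelled, rounds=3):
    """Each round, each view labels its single most confident pool document."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        if not pool:
            break
        for view in (0, 1):
            clf = CentroidClassifier().fit([d[view] for d, l in labelled],
                                           [l for d, l in labelled])
            scored = [(clf.predict_conf(d[view]), d) for d in pool]
            (label, conf), doc = max(scored, key=lambda x: x[0][1])
            labelled.append((doc, label))
            pool.remove(doc)
            if not pool:
                break
    return labelled

# One positive and one negative seed, as in the paper's setting.
seed = [
    (("coastal sediment transport observed", "marine survey"), "marine"),
    (("syntax parsing of sentences", "grammar parser"), "other"),
]
unlabelled = [
    ("ocean current sediment study", "marine observation"),
    ("parsing grammar with trees", "syntax toolkit"),
]
result = co_train(seed, unlabelled)
print(len(result))
```

The key property is that the two views are conditionally independent given the label, so each view's confident predictions act as fresh training data for the other.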
DOI: 10.1109/NLPKE.2010.5587847 (published 2010-09-30)
Citations: 0
Anusaaraka: An expert system based machine translation system
Sriram Chaudhury, A. Rao, D. Sharma
Most research in machine translation is about having computers completely bear the load of translating one human language into another. This paper looks at the machine translation problem afresh and observes that there is a need to share the load between man and machine, distinguish reliable knowledge from heuristics, provide a spectrum of outputs to serve different strata of people, and finally make use of existing resources instead of reinventing the wheel. This paper describes a unique approach to developing a machine translation system based on insights into information dynamics from Paninian Grammar Formalism. Anusaaraka is a Language Accessor cum Machine Translation system based on the fundamental premise of sharing the load, producing good-enough results according to the needs of the reader. The system promises faithful representation of the translated text, no loss of information during translation, and graceful degradation (robustness) in case of failure. The layered output provides access to all stages of translation, making the whole process transparent. Thus, Anusaaraka differs from other machine translation systems in two respects: (1) its commitment to faithfulness, providing a layer of 100% faithful output so that a user with some training can “access the source text” faithfully; and (2) a design that lets users contribute to the system and participate in improving its quality. Further, Anusaaraka provides an eclectic combination of the Apertium architecture with a forward-chaining expert system, allowing use of both deep-parser and shallow-parser outputs to analyze the SL text. Existing language resources (parsers, taggers, chunkers) available under the GPL are used instead of being rewritten. Language data and linguistic rules are independent of the core program, making it easy for linguists to modify and experiment with different language phenomena to improve the system.
Users can become contributors by adding new word sense disambiguation (WSD) rules for ambiguous words through a web interface available over the Internet. The system uses the forward chaining of an expert system to infer new language facts from existing language data. This helps tackle the complexity of language translation by applying specific knowledge rather than a specific technique, creating a vast language knowledge base in electronic form. In other words, the expert system facilitates the transformation of a subject matter expert's (SME) knowledge into a computer-processable knowledge base.
DOI: 10.1109/NLPKE.2010.5587789 (published 2010-09-30)
Citations: 30
Extracting opinion sentence by combination of SVM and syntactic templates
Bo Zhang, Yanquan Zhou, Yu Mao
This paper presents a method that combines syntactic structure templates, dependency relations, and an SVM classifier to extract opinion sentences. First, we tag sentences as opinion sentences using manually summarized high-confidence syntactic structure templates and high-precision dependency relation templates obtained by a dependency relation extraction algorithm. The remaining test data are then fed to an SVM classifier trained with a rigorous process of feature selection. The combined method performed well, achieving 92.6% recall with 85.5% precision.
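The two-stage cascade described here (high-confidence templates first, a trained classifier for the remainder) can be sketched as below. The templates, opinion-word list, and the keyword scorer standing in for the trained SVM are all hypothetical illustrations, not the paper's resources:

```python
import re

# Stage 1: high-confidence templates (hypothetical examples, not the paper's).
TEMPLATES = [
    re.compile(r"\bI (think|believe|feel)\b", re.I),
    re.compile(r"\bin my opinion\b", re.I),
]

# Stage 2: a toy opinion-word scorer standing in for the trained SVM.
OPINION_WORDS = {"great", "terrible", "love", "hate", "disappointing"}

def keyword_classifier(sentence):
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    return len(tokens & OPINION_WORDS) > 0

def extract_opinions(sentences):
    opinions, remainder = [], []
    for s in sentences:
        if any(t.search(s) for t in TEMPLATES):   # stage 1: templates
            opinions.append(s)
        else:
            remainder.append(s)                   # stage 2: classifier
    opinions += [s for s in remainder if keyword_classifier(s)]
    return opinions

sents = [
    "I think the camera is excellent.",
    "The box contains a charger and a manual.",
    "The battery life is terrible.",
]
print(extract_opinions(sents))
```

The design point is that templates handle the easy, high-precision cases so the classifier only has to decide the residue, which is how the combination can lift both recall and precision.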
DOI: 10.1109/NLPKE.2010.5587835 (published 2010-09-30)
Citations: 5
Improving emotion recognition from text with fractionation training
Ye Wu, F. Ren
Previous approaches to emotion recognition from text were mostly implemented within keyword-based or learning-based frameworks. However, keyword-based systems cannot recognize emotion in text that contains no emotional keywords, and constructing an emotion lexicon is difficult because of the ambiguity in defining all emotional keywords. Supervised machine learning methods that use no prior knowledge at all also do not perform as well here as they do on some traditional tasks. In this paper, a fractionation training approach is proposed that uses an emotion lexicon extracted from an annotated blog emotion corpus to train SVM classifiers. Experimental results show the effectiveness of the proposed approach, and further experimental design choices improve the classification accuracy as well.
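A simplified sketch of the lexicon-extraction step — harvesting words that co-occur strongly with one emotion label in an annotated corpus — might look like the following. The thresholds, stopword list, and toy corpus are assumptions, not the paper's settings:

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "at", "me", "made", "again"}

def build_lexicon(annotated, min_count=2, purity=0.8):
    """Keep words seen at least min_count times whose occurrences are
    dominated (> purity) by a single emotion label."""
    word_emotions = defaultdict(Counter)
    for text, emotion in annotated:
        for w in set(text.lower().split()):
            word_emotions[w][emotion] += 1
    lexicon = {}
    for w, counts in word_emotions.items():
        if w in STOPWORDS or len(w) < 3:
            continue
        (top, n), = counts.most_common(1)
        if n >= min_count and n / sum(counts.values()) > purity:
            lexicon[w] = top
    return lexicon

# Toy stand-in for an annotated blog emotion corpus.
corpus = [
    ("what a wonderful sunny day", "joy"),
    ("a wonderful gift made me smile", "joy"),
    ("the delay made me furious", "anger"),
    ("furious at the broken promise", "anger"),
    ("sunny weather again", "joy"),
]
lex = build_lexicon(corpus)
print(lex)
```

Lexicon hits can then be counted per emotion to form features for the SVM stage.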
DOI: 10.1109/NLPKE.2010.5587800 (published 2010-09-30)
Citations: 4
Detection of users suspected of using multiple user accounts and manipulating evaluations in a community site
Naoki Ishikawa, Kenji Umemoto, Yasuhiko Watanabe, Yoshihiro Okada, Ryo Nishimura, M. Murata
Some users of community sites abuse anonymity and attempt to manipulate communications on the site. These users and their submissions discourage other users, prevent them from retrieving good communication records, and reduce the credibility of the site. To address this problem, we conducted an experimental study to detect users suspected of using multiple user accounts and manipulating evaluations on a community site. In this study, we used messages from the Yahoo! chiebukuro data for training and evaluation.
DOI: 10.1109/NLPKE.2010.5587765 (published 2010-09-30)
Citations: 7
Using cognitive model to automatically analyze Chinese predicate
Shiqi Li, T. Zhao, Hanjing Li, Shui Liu, Pengyuan Liu
This paper presents a cognitive approach to semantic role labeling in Chinese based on an extension of the Construction-Integration (CI) model. In contrast to machine learning methods, this method can implicitly integrate more contextual and general knowledge into the computation. First, we define a proposition representation as the basic unit for semantic role labeling with the CI model. Contextually appropriate propositions are then strengthened and inappropriate ones inhibited by simulating the spreading activation of the human mind. Finally, experimental results show encouraging performance on the Chinese PropBank (CPB) and two other datasets.
DOI: 10.1109/NLPKE.2010.5587843 (published 2010-09-30)
Citations: 0
Generating English-Persian parallel corpus using an automatic anchor finding sentence aligner
Meisam Vosoughpour Yazdchi, Heshaam Faili
The more we can enlarge a parallel bilingual corpus, the more effective and powerful it becomes. Providing such corpora demands special effort, both in seeking out as many already-translated texts as possible and in designing sentence alignment algorithms with as little time complexity as possible. In this paper, we propose algorithms for aligning the sentences of Persian-English text pairs in linear time and with surprisingly high accuracy. The linear time complexity is achieved through our new language-independent anchor-finding algorithm, which enables us to align a parallel text as large as a whole book in a single attempt and with high accuracy. As far as we know, this project is the first automatic construction of an English-Persian parallel sentence-level corpus.
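The anchor-finding idea can be illustrated as follows: unambiguous anchor pairs partition both texts, and alignment then only has to work within each short segment, which keeps the whole pass linear. The abstract does not specify the anchor criterion, so shared numerals serve here as a hypothetical stand-in, and the greedy 1-1 segment alignment replaces whatever length-based model a real aligner would use:

```python
import re

def find_anchors(src, tgt):
    """Pair sentences that share a rare token (here: a number); such
    pairs act as fixed points partitioning both texts."""
    def keys(sent):
        return set(re.findall(r"\d+", sent))
    tgt_index = {}
    for j, s in enumerate(tgt):
        for k in keys(s):
            tgt_index.setdefault(k, []).append(j)
    anchors, last_j = [], -1
    for i, s in enumerate(src):
        for k in keys(s):
            cands = [j for j in tgt_index.get(k, []) if j > last_j]
            if len(cands) == 1:               # accept unambiguous matches only
                anchors.append((i, cands[0]))
                last_j = cands[0]
                break
    return anchors

def align(src, tgt):
    """Greedy 1-1 alignment inside each anchored segment (a stand-in for
    the length-based alignment a real aligner would apply there)."""
    anchors = find_anchors(src, tgt)
    anchors.append((len(src), len(tgt)))      # sentinel closing the last segment
    pairs, pi, pj = [], 0, 0
    for ai, aj in anchors:
        pairs += list(zip(src[pi:ai], tgt[pj:aj]))
        if ai < len(src):
            pairs.append((src[ai], tgt[aj]))  # the anchor pair itself
        pi, pj = ai + 1, aj + 1
    return pairs

src = ["The plan has 3 phases.", "It starts soon.", "Phase 1 begins in March."]
tgt = ["این برنامه 3 مرحله دارد.", "به زودی آغاز می‌شود.", "مرحله 1 در ماه مارس آغاز می‌شود."]
print(align(src, tgt))
```

Because each source sentence is matched against a precomputed token index, anchor finding itself is linear in the number of sentences.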
DOI: 10.1109/NLPKE.2010.5587769 (published 2010-09-30)
Citations: 3
Descriptive analysis of emotion and feeling in voice
M. Shimura, Fumiaki Monma, S. Mitsuyoshi, M. Shuzo, Taishi Yamamoto, I. Yamada
Recognition of human “emotions” or “feelings” from voice is important for research on human communication. Although there has been much research on emotions and feelings in voice, the definitions of these terms have been inconsistent. We reviewed previous papers in linguistics, brain science, information science, and related fields, and developed specific definitions for these terms. In our paper, “emotion” is defined as an involuntary reaction in the human brain; it has two states, pleasure and displeasure. “Feeling” (e.g., anger, enjoyment, sadness, fear, and distress) is defined as a state that voluntarily results from an emotion. Note that the pleasure-displeasure direction does not always correspond to the feeling. Our objective is therefore to obtain a sufficient amount of voice data and to analyze the relationship between emotions and feelings. In voice recording experiments, we constructed a voice database of about 100 participants expressing various natural feelings. A descriptive analysis showed that the pleasure-displeasure direction did not correspond to the feeling in 5% of the voice data. This result suggests that if an experimental situation is constructed that tends to arouse various feelings, data with less variability can be obtained.
DOI: 10.1109/NLPKE.2010.5587794 (published 2010-09-30)
Citations: 6
MT on and for the Web
C. Boitet, H. Blanchon, Mark Seligman, Valérie Bellynck
A Systran MT server became available on the minitel network in 1984, and on Internet in 1994. Since then we have come to a better understanding of the nature of MT systems by separately analyzing their linguistic, computational, and operational architectures. Also, thanks to the CxAxQ metatheorem, the systems' inherent limits have been clarified, and design choices can now be made in an informed manner according to the translation situations. MT evaluation has also matured: tools based on reference translations are useful for measuring progress; those based on subjective judgments for estimating future usage quality; and task-related objective measures (such as post-editing distances) for measuring operational quality. Moreover, the same technological advances that have led to “Web 2.0” have brought several futuristic predictions to fruition. Free Web MT services have democratized assimilation MT beyond belief. Speech translation research has given rise to usable systems for restricted tasks running on PDAs or on mobile phones connected to servers. New man-machine interface techniques have made interactive disambiguation usable in large-coverage multimodal MT. Increases in computing power have made statistical methods workable, and have led to the possibility of building low-linguistic-quality but still useful MT systems by machine learning from aligned bilingual corpora (SMT, EBMT). In parallel, progress has been made in developing interlingua-based MT systems, using hybrid methods. Unfortunately, many misconceptions about MT have spread among the public, and even among MT researchers, because of ignorance of the past and present of MT R&D. A compensating factor is the willingness of end users to freely contribute to building essential parts of the linguistic knowledge needed to construct MT systems, whether corpus-related or lexical. 
Finally, some developments we anticipated fifteen years ago have not yet materialized, such as online writing tools equipped with interactive disambiguation and, as a corollary, the possibility of transforming source documents into self-explaining documents (SEDs) and of producing corresponding SEDs fully automatically in several target languages. These visions should now be realized, thanks to the evolution of Web programming and multilingual NLP techniques, leading towards a true Semantic Web, “Web 3.0”, which will support ubilingual (ubiquitous multilingual) computing.
{"title":"MT on and for the Web","authors":"C. Boitet, H. Blanchon, Mark Seligman, Valérie Bellynck","doi":"10.1109/NLPKE.2010.5587865","DOIUrl":"https://doi.org/10.1109/NLPKE.2010.5587865","url":null,"abstract":"A Systran MT server became available on the minitel network in 1984, and on Internet in 1994. Since then we have come to a better understanding of the nature of MT systems by separately analyzing their linguistic, computational, and operational architectures. Also, thanks to the CxAxQ metatheorem, the systems' inherent limits have been clarified, and design choices can now be made in an informed manner according to the translation situations. MT evaluation has also matured: tools based on reference translations are useful for measuring progress; those based on subjective judgments for estimating future usage quality; and task-related objective measures (such as post-editing distances) for measuring operational quality. Moreover, the same technological advances that have led to “Web 2.0” have brought several futuristic predictions to fruition. Free Web MT services have democratized assimilation MT beyond belief. Speech translation research has given rise to usable systems for restricted tasks running on PDAs or on mobile phones connected to servers. New man-machine interface techniques have made interactive disambiguation usable in large-coverage multimodal MT. Increases in computing power have made statistical methods workable, and have led to the possibility of building low-linguistic-quality but still useful MT systems by machine learning from aligned bilingual corpora (SMT, EBMT). In parallel, progress has been made in developing interlingua-based MT systems, using hybrid methods. Unfortunately, many misconceptions about MT have spread among the public, and even among MT researchers, because of ignorance of the past and present of MT R&D. 
A compensating factor is the willingness of end users to freely contribute to building essential parts of the linguistic knowledge needed to construct MT systems, whether corpus-related or lexical. Finally, some developments we anticipated fifteen years ago have not yet materialized, such as online writing tools equipped with interactive disambiguation, and as a corollary the possibility of transforming source documents into self-explaining documents (SEDs) and of producing corresponding SEDs fully automatically in several target languages. These visions should now be realized, thanks to the evolution of Web programming and multilingual NLP techniques, leading towards a true Semantic Web, “Web 3.0”, which will support ubilingual (ubiquitous multilingual) computing.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120994756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
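The abstract singles out post-editing distance as a task-related objective measure of operational MT quality. As a rough illustration only (a hypothetical sketch, not the authors' actual tooling or the exact HTER definition), a word-level edit distance between raw MT output and its human post-edited version, normalized by the length of the edited text, captures the idea:

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two sentences."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distances for the empty hypothesis prefix
    for i, hw in enumerate(h, 1):
        curr = [i]
        for j, rw in enumerate(r, 1):
            cost = 0 if hw == rw else 1  # substitution cost
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def post_edit_distance(mt_output, post_edited):
    """Edits per word of the post-edited text, in the spirit of HTER."""
    return edit_distance(mt_output, post_edited) / max(len(post_edited.split()), 1)

print(post_edit_distance("a cat sat", "the cat sat on"))  # 2 edits / 4 words = 0.5
```

A lower score means the translator had to change less, so the system's output was operationally more useful; this complements reference-based automatic metrics and subjective judgments, the other two evaluation families the abstract mentions.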
Journal
Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)