2013 International Conference on Asian Language Processing: Latest Articles

Research and Implementation of the Uyghur-Chinese Personal Name Transliteration Based on Syllabification

2013 International Conference on Asian Language Processing

Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.22

Alim Murat, Azragul Yusup, Yusup Abaydulla

In recent years, many Uyghur-Chinese cross-language applications have emerged, but automatic translation between these two languages still lacks in-depth study. Most traditional Uyghur-Chinese personal name transliteration is rule-based. Unlike phoneme-based transliteration, this paper performs Uyghur-Chinese personal name transliteration on the basis of Uyghur syllabification, within a grapheme-based DOM transliteration framework.
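Syllabification is the step this approach builds on. The Python sketch below is a hypothetical illustration, not the authors' algorithm: it assumes Latin-script (ULY-style) input, treats a, e, ë, i, o, ö, u, ü as vowels and ch, gh, sh, zh as single consonants, and applies a common heuristic in which each syllable has exactly one vowel and a single consonant before a non-initial vowel starts that vowel's syllable. It ignores loanword clusters and does not cover the later mapping of syllables to Chinese characters.

```python
import unicodedata
from typing import List

# Assumed Latin-script (ULY-style) vowel letters and consonant digraphs.
VOWELS = set("aeëioöuü")
DIGRAPHS = ("ch", "gh", "sh", "zh")


def to_phonemes(word: str) -> List[str]:
    """Split a word into letters, keeping consonant digraphs together."""
    word = unicodedata.normalize("NFC", word.lower())
    phonemes, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            phonemes.append(word[i:i + 2])
            i += 2
        else:
            phonemes.append(word[i])
            i += 1
    return phonemes


def syllabify(word: str) -> List[str]:
    """Heuristic syllabification: one vowel per syllable; V.CV and VC.CV splits."""
    ph = to_phonemes(word)
    vowel_pos = [i for i, p in enumerate(ph) if p in VOWELS]
    if not vowel_pos:
        return ["".join(ph)]
    starts = [0]  # leading consonants belong to the first syllable
    for prev, cur in zip(vowel_pos, vowel_pos[1:]):
        # If consonants separate the two vowels, the last one becomes the onset.
        starts.append(cur - 1 if cur - prev > 1 else cur)
    starts.append(len(ph))  # trailing consonants belong to the last syllable
    return ["".join(ph[a:b]) for a, b in zip(starts, starts[1:])]
```

For example, `syllabify("Abdulla")` returns `["ab", "dul", "la"]`, and `syllabify("Azragul")` returns `["az", "ra", "gul"]`.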

Citations: 3

On the Formation and Semantic Meaning of the Title [[V+O]+O] in Mandarin

2013 International Conference on Asian Language Processing

Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.17

Hongwu Xue

This paper studies the features of [[V+O]+O], a sentence pattern used for document titles in Mandarin, and identifies the process and motivation behind its formation. The paper's value lies in offering an approach to document title generation and semantic orientation computing in NLP.

Citations: 0

Pronominal Resolution in Tamil Using Tree CRFs

2013 International Conference on Asian Language Processing

Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.59

R. Ram, S. L. Devi

We describe our work on pronominal resolution in Tamil using Tree CRFs. Pronominal resolution is the task of identifying the referent of a pronoun. In this work we study the Tamil third-person pronouns 'avan', 'aval', 'athu', and 'avar' (he, she, it, and they, respectively). Tamil is a Dravidian language that is morphologically rich and highly agglutinative. Tree CRFs are a machine learning method in which the data is modeled as a graph whose edge weights are used for learning. The learning features are derived from the morphological features of the language. The work is carried out on tourism-domain data from the Web. We obtain 70.8% precision and 66.5% recall, which are encouraging results.
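For reference, the reported precision P = 0.708 and recall R = 0.665 can be summarized with the standard F1 harmonic mean. The figure below is derived here by arithmetic, not reported in the abstract:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.708 \times 0.665}{0.708 + 0.665} = \frac{0.94164}{1.373} \approx 0.686
```

That is, roughly 68.6% F1.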

Citations: 11

On Teaching Strategies of Light Tone in Chinese to Foreign Students

2013 International Conference on Asian Language Processing

Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.34

Mingyue Wang

Light tone occupies an awkward position in teaching Chinese as a second language. Whether light tone should be taught at all, what should be taught, and how it should be taught are recurring points of dispute. As a special phonetic phenomenon in Mandarin Chinese and the Chinese dialects, light tone has certain grammatical functions; more importantly, it reveals the rhythm of Chinese, so it should not be ignored when teaching Chinese to foreign students. This paper analyzes the importance and difficulties of teaching light tone and proposes workable solutions to those difficulties.

Citations: 0

Machine Translation Approach for Vietnamese Diacritic Restoration

2013 International Conference on Asian Language Processing

Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.30

T. Do, Duy Binh Nguyen, Dang-Khoa Mac, D. Tran

Diacritic marks exist in many languages, such as French, German, Slovak, and Vietnamese. For various reasons, however, they are sometimes omitted in writing, which can make a text without diacritics ambiguous for the reader. Automatic diacritic restoration has been addressed for several languages using character-based, word-based, and point-wise approaches, among others. However, these approaches rely heavily on linguistic information and on the size of the training corpus, and some are language-dependent. This paper presents a simple and effective restoration method that treats the problem as machine translation. The method has been applied to Vietnamese and integrated into VIVA (Vietnamese Voice Assistant), an Android application that reads incoming text messages aloud on a mobile phone. Our experiments show that the proposed method restores diacritic marks with 99.0% accuracy.
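The abstract frames restoration as translation from undiacritized to diacritized text. As a rough illustration only, the Python sketch below learns a toy phrase table from diacritized sentences (stripping the marks to create the source side of the parallel data) and decodes greedily from left to right, preferring the longest known phrase. It is a simplified stand-in, not the authors' system: a real phrase-based MT setup would also score candidates with a language model and a proper decoder, and the corpus, `max_n`, and all function names here are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese diacritics: drop combining marks, then map đ/Đ to d/D."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.replace("đ", "d").replace("Đ", "D")


def build_phrase_table(corpus: List[str], max_n: int = 3) -> Dict[Phrase, Phrase]:
    """Learn 'phrase without diacritics -> most frequent diacritized phrase'.

    The source side comes for free: it is the training text with marks stripped.
    """
    counts: Dict[Phrase, Counter] = defaultdict(Counter)
    for sentence in corpus:
        target = unicodedata.normalize("NFC", sentence).lower().split()
        source = [strip_diacritics(w) for w in target]
        for n in range(1, max_n + 1):
            for i in range(len(target) - n + 1):
                counts[tuple(source[i:i + n])][tuple(target[i:i + n])] += 1
    return {src: tgts.most_common(1)[0][0] for src, tgts in counts.items()}


def restore(text: str, table: Dict[Phrase, Phrase], max_n: int = 3) -> str:
    """Monotone greedy decoding: at each position use the longest known phrase."""
    words = text.lower().split()
    out: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_n, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in table:
                out.extend(table[key])
                i += n
                break
        else:
            out.append(words[i])  # unknown word: copy it through unchanged
            i += 1
    return " ".join(out)
```

With `table = build_phrase_table(["tôi học tiếng Việt", "tôi đi học"])`, calling `restore("toi di hoc", table)` yields `"tôi đi học"`. Decoding is monotone because restoring diacritics never reorders words.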

Citations: 10

Relationship Extraction Tactics of Chinese Entity Based on Formal Concept Connectivity Distance

2013 International Conference on Asian Language Processing

Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.12

Chun-ming. Cheng

Because Chinese expressions are highly diverse, traditional algorithms for Chinese entity relationship extraction have several shortcomings. For example, manually labeling the training corpus requires too much work, the generated relationship schemas are usually not very versatile, and it is difficult to select or integrate a high-quality domain ontology for the extraction task. Moreover, these algorithms do not account for the fact that an entity relationship often has different meanings under different topic backgrounds or at different concept granularities. Using statistical methods and linguistic knowledge, this paper crawls, parses, and fills data, builds a relational formal concept lattice from the contexts of Chinese entities, and acquires entity relationship schemas described by relational formal concepts. With these relational schemas and concepts, we compute entry-concept correlations and perform flexible matching of predicate text, obtaining the concept connectivity distance between entities in order to extract non-single and indirect entity relations. Concept granularity in relation extraction becomes more flexible, and relational schemas described by formal concepts are more versatile and robust. The method provides a better semantic description of the extracted relationships and achieves good relation extraction performance.
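A formal concept lattice is built from formal concepts: pairs (extent, intent) in which the extent is exactly the set of objects having every attribute in the intent, and the intent is exactly the set of attributes shared by the extent. The Python sketch below enumerates the formal concepts of a small context. The entity-pair objects and predicate attributes are hypothetical, and the sketch does not implement the paper's connectivity distance, which the abstract does not define.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple

Concept = Tuple[FrozenSet[Hashable], FrozenSet[str]]


def formal_concepts(context: Dict[Hashable, FrozenSet[str]]) -> List[Concept]:
    """Enumerate all (extent, intent) pairs of a formal context.

    Every concept intent is an intersection of object intents, or the full
    attribute set (the intent of the empty extent), so we close the object
    intents under intersection and then derive each intent's extent.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        # Intersect this object's attributes with every intent found so far.
        intents |= {intent & attrs for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Largest extent first (top of the lattice), for readable output.
    concepts.sort(key=lambda c: (-len(c[0]), sorted(c[1])))
    return concepts


if __name__ == "__main__":
    # Hypothetical context: entity pairs described by the predicates linking them.
    toy = {
        ("Person_A", "Company_X"): frozenset({"founded", "chairman_of"}),
        ("Person_B", "Company_Y"): frozenset({"founded", "ceo_of"}),
        ("Person_C", "University_Z"): frozenset({"graduated_from", "professor_at"}),
    }
    for extent, intent in formal_concepts(toy):
        print(sorted(extent), sorted(intent))
```

In this toy context, the concept with intent `{"founded"}` groups both founder pairs, a coarser-grained relation than either `chairman_of` or `ceo_of` alone. This is the kind of granularity choice a concept lattice makes explicit.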

Citations: 0

An Efficient Algorithm of Chinese String Sort in User-Defined Sequence

2013 International Conference on Asian Language Processing

Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.33

Haijun Zhang, Shumin Shi

Existing sorting algorithms make it difficult to sort Chinese strings in a user-defined order. This paper proposes an efficient method for sorting strings by a user-defined character order. Consecutive numbers define the custom order of the characters, and a hash table converts each string into the corresponding array of integers. Using the largest of these numbers as the new radix, the radix sort algorithm then quickly sorts the strings in the user-defined order. Theoretical analysis and experiments show that the algorithm sorts Chinese strings in a user-defined order with linear time and space complexity. It runs faster than the Quick Sort algorithm and extends easily to string sorting in other languages.
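A minimal Python sketch of the idea follows, assuming ranks start at 1 (with 0 reserved to pad shorter strings, so a prefix sorts before its extensions) and least-significant-digit radix sort. The function names and the example order are hypothetical; the paper's exact hashing and padding scheme may differ.

```python
from typing import Dict, List


def build_rank(order: str) -> Dict[str, int]:
    """Hash table from each character to its consecutive number 1..k (0 = padding)."""
    return {ch: i + 1 for i, ch in enumerate(order)}


def custom_order_sort(strings: List[str], order: str) -> List[str]:
    """LSD radix sort of strings under a user-defined character order.

    Runs in O(w * (n + k)) time for n strings of maximum length w over k
    characters. Raises KeyError if a string contains a character not in `order`.
    """
    rank = build_rank(order)
    radix = len(order) + 1  # bucket 0 holds padding
    keys = [[rank[ch] for ch in s] for s in strings]
    width = max((len(k) for k in keys), default=0)
    padded = [k + [0] * (width - len(k)) for k in keys]
    indices = list(range(len(strings)))
    # Stable bucket passes from the last character position to the first.
    for pos in range(width - 1, -1, -1):
        buckets: List[List[int]] = [[] for _ in range(radix)]
        for i in indices:
            buckets[padded[i][pos]].append(i)
        indices = [i for bucket in buckets for i in bucket]
    return [strings[i] for i in indices]
```

For example, `custom_order_sort(["丙甲", "甲", "乙丁", "甲乙"], "甲乙丙丁")` returns `["甲", "甲乙", "乙丁", "丙甲"]`. Each pass is a stable bucket distribution, which is what lets later passes (earlier positions) refine rather than destroy the ordering from earlier passes.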

Citations: 0

Combination of Unsupervised Keyphrase Extraction Algorithms

2013 International Conference on Asian Language Processing

Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.14

Zede Zhu, Miao Li, Lei Chen, Zhenxin Yang, Sheng Chen

Keyphrase extraction plays a significant role in many language processing tasks, such as text summarization, text categorization, and information retrieval. However, no study has combined several approaches to improve keyphrase extraction performance. This paper first implements three representative unsupervised algorithms, TfIdf, Text Rank, and Expand Rank, and then proposes a generalized framework that combines them using serial, parallel, and voting methods, enabling a comprehensive analysis of keyphrase extraction. Experiments on an evaluation dataset of 1,040 abstracts from Chinese theses show that some of the combination approaches perform remarkably well.
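The abstract does not say how the voting method scores candidates. One plausible instance, assumed here rather than taken from the paper, is a Borda count over each algorithm's top-k list, sketched in Python below. The function name is hypothetical, and the example rankings merely stand in for TfIdf, Text Rank, and Expand Rank outputs.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_combine(rankings: Sequence[List[str]], k: int) -> List[str]:
    """Combine ranked keyphrase lists with a Borda count over each list's top k.

    A phrase at 0-based position p in a top-k list earns k - p points.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        for pos, phrase in enumerate(ranking[:k]):
            scores[phrase] += k - pos
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


if __name__ == "__main__":
    tfidf = ["machine translation", "corpus", "word segmentation"]
    textrank = ["corpus", "machine translation", "parsing"]
    expandrank = ["machine translation", "parsing", "corpus"]
    print(borda_combine([tfidf, textrank, expandrank], k=3))
```

Here "machine translation" scores 3 + 2 + 3 = 8 and wins, while "word segmentation", proposed by only one algorithm, drops out. Rewarding agreement across algorithms is the intuition behind voting-based combination.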

Citations: 7

Topic and Its Negation in Chinese Sentences

2013 International Conference on Asian Language Processing

Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.10

Lin He, Qiong Peng

There are two major views on how sentential topics are generated in Chinese. Some topics are moved from a syntactic position; disagreement arises over the so-called dangling topics. One view contends that dangling topics are also moved, being thematically related to a position inside the comment; the other holds that they are base-generated and licensed by the non-empty set that results from intersecting the topic set with the set generated by the semantic variable in the comment. Both views help explain, from the perspectives of syntax, semantics, and pragmatics, how topics are negated under sentential negation in Chinese. We suggest that a topic can be negated when the variable or related element in the comment is co-referential with the topic or has no definite referent.

Citations: 0

Broadcast News Story Clustering via Term and Sentence Matching

2013 International Conference on Asian Language Processing

Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.62

Foong Kuin Yow, T. Tan

In this paper, we propose a rule-based approach that uses term and sentence matching criteria to cluster Malay broadcast news into different stories. The proposed clustering method does not require users to predefine the number of clusters. Clustering has three main stages: sentence segmentation, indexing, and term and sentence matching clustering. The sentences in the transcription are segmented before indexing. Indexing involves tokenization, stop-word removal, stemming, term selection, and term representation. A vector space model (VSM) represents the terms and sentences as vectors. The sentences are then grouped into clusters using term and sentence matching thresholds. The proposed approach achieves significantly better accuracy than the baseline approaches.
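The indexing and threshold-based grouping described above can be sketched roughly as follows. This Python sketch is an assumption-laden illustration, not the authors' rules: a single cosine-similarity threshold over term-frequency vectors stands in for the paper's separate term and sentence matching thresholds, whose exact definitions the abstract does not give; stemming and term selection are omitted; and the stop-word list, threshold value, and function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Tiny illustrative Malay stop-word list (hypothetical; a real system would use a full list).
STOP_WORDS = {"dan", "yang", "di", "ini", "itu", "ke", "dari"}


def index_sentence(sentence: str) -> Counter:
    """Tokenize, drop stop words, and return a term-frequency vector (no stemming)."""
    tokens = re.findall(r"\w+", sentence.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(sentences: List[str], threshold: float = 0.2) -> List[List[int]]:
    """Assign each sentence to the most similar existing story, or start a new one.

    The number of stories is not fixed in advance: a new one is opened whenever
    no existing story reaches the similarity threshold.
    """
    stories: List[Tuple[Counter, List[int]]] = []  # (summed term vector, member indices)
    for i, sentence in enumerate(sentences):
        vec = index_sentence(sentence)
        best, best_sim = None, 0.0
        for story in stories:
            sim = cosine(vec, story[0])
            if sim > best_sim:
                best, best_sim = story, sim
        if best is not None and best_sim >= threshold:
            best[0].update(vec)  # fold the sentence into the story's term vector
            best[1].append(i)
        else:
            stories.append((Counter(vec), [i]))
    return [members for _, members in stories]
```

On four short headlines, two about floods in Kelantan and two about oil prices, this groups sentences 0 and 1 together and sentences 2 and 3 together without being told in advance that there are two stories.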

Citations: 1
