
Latest publications in LDV Forum

Satzlänge: Definitionen, Häufigkeiten, Modelle (Am Beispiel slowenischer Prosatexte) [Sentence Length: Definitions, Frequencies, Models (Using Slovenian Prose Texts as an Example)]
Pub Date: 2022-12-05 DOI: 10.21248/jlcl.20.2005.74
Emmerich Kelih, Peter Grzybek
The present study is intended as a contribution to research on sentence length. After an introductory overview of the analytical options at the level of sentence length, the main focus is a discussion of the application of different sentence definitions. On the basis of a corpus of Slovenian texts, we investigate (a) what influence the application of different (entirely common) sentence definitions has on descriptive characteristics of the frequency distribution, and (b) to what extent the adequacy and goodness of fit of theoretical distribution models depends on it.
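The core of the comparison — that the choice of sentence definition changes the observed length distribution — can be illustrated with a small sketch. This is a minimal illustration, not the authors' procedure: the two regex-based boundary definitions and the toy text are invented, and length is measured in words.

    # Minimal sketch: how two sentence definitions change the length distribution.
    import re
    from statistics import mean, pvariance

    def sentence_lengths(text, boundary_pattern):
        # Split at the given boundary characters, then measure length in words.
        sentences = [s for s in re.split(boundary_pattern, text) if s.strip()]
        return [len(s.split()) for s in sentences]

    text = "Prvi stavek. Drugi stavek; morda del istega. Tretji stavek!"

    for name, pattern in [("def-A: . ! ?", r"[.!?]+"),
                          ("def-B: . ! ? ; :", r"[.!?;:]+")]:
        lengths = sentence_lengths(text, pattern)
        print(name, "lengths:", lengths,
              "mean:", round(mean(lengths), 2),
              "variance:", round(pvariance(lengths), 2))

Even on this toy text the descriptive statistics shift once semicolons count as sentence boundaries, which is the effect the paper quantifies on the Slovenian corpus.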
{"title":"Satzlänge: Definitionen, Häufigkeiten, Modelle (Am Beispiel slowenischer Prosatexte)","authors":"Emmerich Kelih, Peter Grzybek","doi":"10.21248/jlcl.20.2005.74","DOIUrl":"https://doi.org/10.21248/jlcl.20.2005.74","url":null,"abstract":"Die vorliegende Untersuchung versteht sich als ein Beitrag zur Satzlängenforschung. Nach einleitender Darstellung der Analysemöglichkeiten auf der Ebene der Satzlängen, geht es hauptsächlich um die Diskussion der Anwendung von unterschiedlichen Satzdefinitionen. Auf der Basis eines Korpus slowenischer Texte wird der Frage nachgegangen,welchen Einfluss die Anwendung unterschiedlicher (durchaus üblicher) Satzdefinitionenauf (a) deskriptive Kenngrößen der Häufigkeitsverteilung hat, und (b) inwiefern davondie Adäquatheit und Güte theoretischer Verteilungsmodelle abhängt.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115326043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Evaluating the Quality of Automatically Extracted Synonymy Information
Pub Date: 2008-07-01 DOI: 10.21248/jlcl.23.2008.100
A. Kumaran, R. Makin, Vijay Pattisapu, Shaik Sharif, Lucy Vanderwende
Automatic extraction of semantic information, if successful, offers languages with little or poor resources the prospect of creating ontological resources inexpensively, thus providing support for common-sense reasoning applications in those languages. In this paper we explore the automatic extraction of synonymy information from large corpora using two complementary techniques: a generic broad-coverage parser that generates pieces of semantic information, and their synthesis into sets of synonyms using automatic sense disambiguation. To validate the quality of the synonymy information thus extracted, we experiment with English, for which appropriate semantic resources are already available. We cull synonymy information from a large corpus and compare it against the synonymy information available in several standard sources. We present the results of our methodology, both quantitative and qualitative, which indicate that good-quality synonymy information can be extracted automatically from large corpora using the proposed methodology.
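The validation step — comparing culled synonymy information against standard sources — can be sketched as pairwise precision and recall over synonym sets. This is an illustrative reconstruction, not the paper's evaluation code; the tiny gold_synsets list stands in for a real resource such as WordNet.

    # Sketch: score extracted synonym sets against a gold resource by word pairs.
    def pairs(synsets):
        # Expand each synonym set into unordered word pairs.
        out = set()
        for syn in synsets:
            words = sorted(syn)
            out.update((a, b) for i, a in enumerate(words) for b in words[i + 1:])
        return out

    gold_synsets = [{"car", "automobile"}, {"big", "large"}]        # stand-in gold data
    extracted_synsets = [{"car", "automobile", "vehicle"}, {"big", "large"}]

    gold, extracted = pairs(gold_synsets), pairs(extracted_synsets)
    tp = len(gold & extracted)
    print(f"precision={tp / len(extracted):.2f} recall={tp / len(gold):.2f}")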
{"title":"Evaluating the Quality of Automatically Extracted Synonymy Information","authors":"A. Kumaran, R. Makin, Vijay Pattisapu, Shaik Sharif, Lucy Vanderwende","doi":"10.21248/jlcl.23.2008.100","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.100","url":null,"abstract":"Automatic extraction of semantic information, if successful, offers to languages with little or poor resources, the prospects of creating ontological resources inexpensively, thus providing support for common-sense reasoning applications in those languages. In this paper we explore the automatic extraction of synonymy information from large corpora using two complementary techniques: a generic broad-coverage parser for generation of bits of semantic information, and their synthesis into sets of synonyms using automatic sense-disambiguation. To validate the quality of the synonymy information thus extracted, we experiment with English, where appropriate semantic resources are already available. We cull synonymy information from a large corpus and compare it against synonymy information available in several standard sources. We present the results of our methodology, both quantitatively and qualitatively, that indicate good quality synonymy information may be extracted automatically from large corpora using the proposed methodology.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122872030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
OWL ontologies as a resource for discourse parsing
Pub Date: 2008-07-01 DOI: 10.21248/jlcl.23.2008.99
Maja Bärenfänger, M. Hilbert, Henning Lobin, H. Lüngen
In the project SemDok (Generic document structures in linearly organised texts), funded by the German Research Foundation DFG, a discourse parser for a complex text type (scientific articles, by way of example) is being developed. Discourse parsing (henceforth DP) according to Rhetorical Structure Theory (RST) (Mann and Taboada, 2005; Marcu, 2000) deals with automatically assigning a text a tree structure in which discourse segments and the rhetorical relations between them, such as Concession, are marked. To identify the combinable segments, declarative rules are employed which describe linguistic and structural cues and constraints on possible combinations by referring to different XML annotation layers of the input text and to external knowledge bases such as a discourse marker lexicon, a lexico-semantic ontology (later to be combined with a domain ontology), and an ontology of rhetorical relations. In our text-technological environment, the obvious choice of formalism to represent such ontologies is OWL (Smith et al., 2004). In this paper, we describe two OWL ontologies and how the discourse parser consults them to solve certain tasks within DP. The first ontology is a taxonomy of rhetorical relations which was developed in the project. The second is an OWL version of GermaNet, whose model we designed together with our project partners.
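How a parser might consult such an ontology can be sketched with rdflib (assuming the library is installed); the Turtle fragment, namespace URI, and relation names below are invented stand-ins for the project's actual taxonomy of rhetorical relations.

    # Sketch: asking an RDFS/OWL taxonomy whether a relation falls under a class.
    from rdflib import Graph, Namespace, RDFS

    REL = Namespace("http://example.org/rst#")
    g = Graph()
    g.parse(data="""
    @prefix rel: <http://example.org/rst#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    rel:Concession rdfs:subClassOf rel:PresentationalRelation .
    rel:PresentationalRelation rdfs:subClassOf rel:RhetoricalRelation .
    """, format="turtle")

    def is_a(relation, superclass):
        # Walk rdfs:subClassOf transitively to answer taxonomy queries.
        return superclass in set(g.transitive_objects(relation, RDFS.subClassOf))

    print(is_a(REL.Concession, REL.RhetoricalRelation))  # True

A rule in the parser could use such a query to fire for any presentational relation without enumerating every subtype.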
{"title":"OWL ontologies as a resource for discourse parsing","authors":"Maja Bärenfänger, M. Hilbert, Henning Lobin, H. Lüngen","doi":"10.21248/jlcl.23.2008.99","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.99","url":null,"abstract":"In the project SemDok (Generic document structures in linearly organised texts) funded by the German Research Foundation DFG, a discourse parser for a complex type (scientific articles by example), is being developed. Discourse parsing (henceforth DP) according to the Rhetorical Structure Theory (RST) (Mann and Taboada, 2005; Marcu, 2000) deals with automatically assigning a text a tree structure in which discourse segments and rhetorical relations between them are marked, such as Concession. For identifying the combinable segments, declarative rules are employed, which describe linguistic and structural cues and constraints about possible combinations by referring to different XML annotation layers of the input text, and external knowledge bases such as a discourse marker lexicon, a lexico-semantic ontology (later to be combined with a domain ontology), and an ontology of rhetorical relations. In our text-technological environment, the obvious choice of formalism to represent such ontologies is OWL (Smith et al., 2004). In this paper, we describe two OWL ontologies and how they are consulted from the discourse parser to solve certain tasks within DP. The first ontology is a taxononomy of rhetorical relations which was developed in the project. The second one is an OWL version of GermaNet, the model of which we designed together with our project partners.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126240261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Automatic Acquisition of Formal Concepts from Text
Pub Date: 2008-07-01 DOI: 10.21248/jlcl.23.2008.102
Pablo Gamallo, J. Lopes, Alexandre Agustini
This paper describes an unsupervised method for extracting concepts from part-of-speech annotated corpora. The method consists of building two-dimensional clusters of words and their lexico-syntactic contexts. It is based on Formal Concept Analysis (FCA). Each generated cluster is defined as a formal concept, with a set of words describing the extension of the concept and a set of contexts perceived as the intensional attributes (or properties) valid for all the words in the extension. The clustering process relies on two concept operations: abstraction and specification. The former allows us to build a more generic concept by intersecting the intensions of the merged concepts and taking the union of their extensions. By contrast, specification takes the union of the intensions and intersects the extensions. The result is a concept lattice that describes the domain-specific ontology underlying the training corpus.
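The two operations translate directly into set operations on (extension, intension) pairs. The sketch below is a minimal illustration with invented word/context data, not the authors' clustering implementation.

    # Sketch: FCA-style abstraction and specification of formal concepts.
    def abstraction(c1, c2):
        # More generic concept: union of extensions, intersection of intensions.
        (e1, i1), (e2, i2) = c1, c2
        return (e1 | e2, i1 & i2)

    def specification(c1, c2):
        # More specific concept: intersection of extensions, union of intensions.
        (e1, i1), (e2, i2) = c1, c2
        return (e1 & e2, i1 | i2)

    # Toy concepts: words (extension) paired with lexico-syntactic contexts (intension).
    wine = (frozenset({"wine"}), frozenset({"drink_X", "bottle_of_X"}))
    beer = (frozenset({"beer", "wine"}), frozenset({"drink_X"}))

    print(abstraction(wine, beer))    # ({'beer', 'wine'}, {'drink_X'})
    print(specification(wine, beer))  # ({'wine'}, {'drink_X', 'bottle_of_X'})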
{"title":"Automatic Acquisition of Formal Concepts from Text","authors":"Pablo Gamallo, J. Lopes, Alexandre Agustini","doi":"10.21248/jlcl.23.2008.102","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.102","url":null,"abstract":"This paper describes an unsupervised method for extracting concepts from Part-Of-Speech annotated corpora. The method consists in building bidimensional clusters of both words and their lexico-syntactic contexts. The method is based on Formal Concept Analysis (FCA). Each generated cluster is defined as a formal concept with a set of words describing the extension of the concept and a set of contexts perceived as the intensional attributes (or properties) valid for all the words in the extension. The clustering process relies on two concept operations: abstraction and specification. The former allows us to build a more generic concept by intersecting the intensions of the merged concepts and making the union of their extensions. By contrast, specification makes the union of the intensions and intersects the extensions. The result is a concept lattice that describes the domain-specific ontology underlying the training corpus.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132536980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
A hybrid approach to resolve nominal anaphora
Pub Date: 2008-07-01 DOI: 10.21248/jlcl.23.2008.101
Daniela Goecke, Maik Stührenberg, Tonio Wandmacher
In order to resolve nominal anaphora, especially definite description anaphora, various sources of information have to be taken into account. These range from morphosyntactic information to domain knowledge encoded in ontologies. As the acquisition of ontological knowledge is a time-consuming task, existing resources often model only a small set of information. This leads to a knowledge gap that has to be closed: we present a hybrid approach that combines several knowledge sources in order to resolve definite descriptions.
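A minimal sketch of the hybrid idea: score antecedent candidates for a definite description by combining morphosyntactic agreement, ontological compatibility, and recency. The weights, features, and toy hypernym table are invented for illustration and are not the authors' actual model.

    # Sketch: ranking antecedent candidates with heterogeneous evidence.
    hypernyms = {"poodle": "dog", "dog": "animal"}  # toy ontology fragment

    def ontologically_compatible(anaphor_head, antecedent_head):
        # Accept identity or a hypernym chain from antecedent to anaphor head.
        node = antecedent_head
        while node is not None:
            if node == anaphor_head:
                return True
            node = hypernyms.get(node)
        return False

    def score(anaphor, candidate):
        s = 0.0
        if candidate["number"] == anaphor["number"]:
            s += 0.3                                   # morphosyntactic agreement
        if ontologically_compatible(anaphor["head"], candidate["head"]):
            s += 0.5                                   # domain/lexical knowledge
        s += 0.2 / (1 + anaphor["position"] - candidate["position"])  # recency
        return s

    anaphor = {"head": "dog", "number": "sg", "position": 7}
    candidates = [{"head": "poodle", "number": "sg", "position": 3},
                  {"head": "house", "number": "sg", "position": 5}]
    print(max(candidates, key=lambda c: score(anaphor, c))["head"])  # poodle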
{"title":"A hybrid approach to resolve nominal anaphora","authors":"Daniela Goecke, Maik Stührenberg, Tonio Wandmacher","doi":"10.21248/jlcl.23.2008.101","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.101","url":null,"abstract":"In order to resolve nominal anaphora, especially definite description anaphora, various sources of information have to be taken into account. These range from morphosyntactic information to domain knowledge encoded in ontologies. As the acquisition of ontological knowledge is a timeconsuming task, existing resources often model only a small set of information. This leads to a knowledge gap that has to be closed: We present a hybrid approach that combines several knowledge sources in order to resolve definite descriptions.1","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121964887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
An ontology of linguistic annotations
Pub Date: 2008-07-01 DOI: 10.21248/jlcl.23.2008.98
C. Chiarcos
This paper describes the development and design of an ontology of linguistic annotations, primarily word classes and morphosyntactic features, based on existing standardization approaches (e.g. EAGLES), a set of annotation schemes (e.g. STTS and morphological annotations for German), and existing terminological resources (e.g. GOLD). The ontology is intended to be a platform for terminological integration, integrated representation, and ontology-based search across existing linguistic resources with terminologically heterogeneous annotations. Further, it can be applied to augment the semantic analysis of a given text with an ontological interpretation of its morphosyntactic analysis.
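The kind of terminological integration described here can be sketched as mapping tags from heterogeneous schemes onto shared ontology concepts, so that one concept-level query covers differently annotated resources. The concept names and tag mappings below are illustrative, not the ontology's actual content.

    # Sketch: one ontology-level query across two tagsets (STTS and Penn Treebank).
    ONTOLOGY = {  # (scheme, tag) -> ontology concept
        ("STTS", "VVFIN"): "FiniteVerb",
        ("STTS", "VVINF"): "NonFiniteVerb",
        ("PTB", "VBD"): "FiniteVerb",
        ("PTB", "VB"): "NonFiniteVerb",
    }
    SUPER = {"FiniteVerb": "Verb", "NonFiniteVerb": "Verb"}  # subclass axioms

    def matches(scheme, tag, concept):
        # A tag matches a concept if its mapped class or any superclass equals it.
        c = ONTOLOGY.get((scheme, tag))
        while c is not None:
            if c == concept:
                return True
            c = SUPER.get(c)
        return False

    # "Find verbs" works on both corpora despite the heterogeneous tags:
    print(matches("STTS", "VVFIN", "Verb"), matches("PTB", "VB", "Verb"))  # True True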
{"title":"An ontology of linguistic annotations","authors":"C. Chiarcos","doi":"10.21248/jlcl.23.2008.98","DOIUrl":"https://doi.org/10.21248/jlcl.23.2008.98","url":null,"abstract":"This paper describes development and design of an ontology of linguistic annotations, primarily word classes and morphosyntactic features, based on existing standardization approaches (e.g. EAGLES), a set of annotation schemes (e.g. for German, STTS and morphological annotations), and existing terminological resources (e.g. GOLD). The ontology is intended to be a platform for terminological integration, integrated representation and ontology-based search across existing linguistic resources with terminologically heterogeneous annotations. Further, it can be applied to augment the semantic analysis of a given text with an ontological interpretation of its morphosyntactic analysis.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128341041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 58
Integration Languages for Data-Driven Approaches to Ontology Population and Maintenance
Pub Date: 2007-07-01 DOI: 10.21248/jlcl.22.2007.94
Eduardo Torres Schumann, Uwe Mönnich, K. Schulz
Populating an ontology with a vast amount of data and ensuring the quality of the integration process by means of human supervision seem to be mutually exclusive goals that nevertheless arise as requirements when building practical applications. In our case, we were confronted with the practical problem of populating the EFGT Net, a large-scale ontology that enables thematic reasoning in different NLP applications, from already existing and partly very large data sources, on condition of not putting the quality of the resource at risk. We present here our particular solution to this problem, which combines, in a single tool, an integration language capable of generating new ontology entries from structured data with a visualization of conflicting generated entries and online ontology editing facilities. This approach appears to enable efficient human supervision of the population process in an interactive way and to be useful for maintenance tasks as well.
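The interplay of automatic population and human supervision can be sketched as follows: a declarative rule maps structured records to candidate entries, and candidates that clash with existing knowledge are routed to a reviewer instead of being written silently. All names, fields, and the rule itself are invented; this is not the EFGT Net integration language.

    # Sketch: rule-driven population with conflicts set aside for human review.
    existing = {"Munich": "City"}                       # fragment of the ontology

    def integrate(records, rule):
        accepted, conflicts = {}, []
        for rec in records:
            name, concept = rule(rec)
            if existing.get(name, concept) != concept:  # clash with prior knowledge
                conflicts.append((name, existing[name], concept))
            else:
                accepted[name] = concept
        return accepted, conflicts

    rule = lambda rec: (rec["label"], "City" if rec["type"] == "town" else "Region")
    records = [{"label": "Passau", "type": "town"},
               {"label": "Munich", "type": "district"}]

    accepted, conflicts = integrate(records, rule)
    print(accepted)   # {'Passau': 'City'}
    print(conflicts)  # [('Munich', 'City', 'Region')] -> shown to the editor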
{"title":"Integration Languages for Data-Driven Approaches to Ontology Population and Maintenance","authors":"Eduardo Torres Schumann, Uwe Mönnich, K. Schulz","doi":"10.21248/jlcl.22.2007.94","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.94","url":null,"abstract":"Populating an ontology with a vast amount of data and ensuring the quality of the integration process by means of human supervision seem to be mutually exclusive goals that nevertheless arise as requirements when building practical applications. In our case, we were confronted with the practical problem of populating the EFGT Net, a large-scale ontology that enables thematic reasoning in dierent NLP applications, out of already existing and partly very large data sources, but on condition of not putting the quality of the resource at risk. We present here our particular solution to this problem, which combines, in a single tool, on one hand an integration language capable of generating new entries for the ontology out of structured data with, on the other hand, a visualization of conflicting generated entries with online ontology editing facilities. This approach appears to enable ecient human supervision of the population process in an interactive way and to be also useful for maintenance tasks.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125098131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Hybrid Model for Chinese Word Segmentation
Pub Date: 2007-07-01 DOI: 10.21248/jlcl.22.2007.90
Xiaofei Lu
This paper describes a hybrid model that combines machine learning with linguistic and statistical heuristics to integrate unknown word identification with Chinese word segmentation. The model consists of two major components: a tagging component that annotates each character in a Chinese sentence with a position-of-character (POC) tag indicating its position within a word, and a merging component that transforms a POC-tagged character sequence into a word-segmented sentence. The tagging component uses a tagger based on support vector machines (Vapnik, 1995) to produce an initial tagging of the text and a transformation-based tagger (Brill, 1995) to improve the initial tagging. In addition to the POC tags assigned to the characters, the merging component incorporates a number of linguistic and statistical heuristics to detect words with regular internal structures, recognize long words, and filter non-words. Experiments show that, without resorting to a separate unknown word identification mechanism, the model achieves an F-score of 95.0% for word segmentation and a competitive recall of 74.8% for unknown word identification.
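The merging component's basic operation can be sketched under the common four-tag POC scheme (B = begin, M = middle, E = end, S = single-character word); the paper's actual model uses richer heuristics, and the tagged example sentence is invented.

    # Sketch: turning a POC-tagged character sequence into segmented words.
    def merge(chars, poc_tags):
        words, current = [], ""
        for ch, tag in zip(chars, poc_tags):
            if tag == "S":           # single-character word
                if current:          # recover from an ill-formed tag sequence
                    words.append(current)
                    current = ""
                words.append(ch)
            elif tag == "B":         # start a new multi-character word
                if current:
                    words.append(current)
                current = ch
            else:                    # "M" or "E": extend the current word
                current += ch
                if tag == "E":
                    words.append(current)
                    current = ""
        if current:
            words.append(current)
        return words

    print(merge("我爱北京", ["S", "S", "B", "E"]))  # ['我', '爱', '北京']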
{"title":"A Hybrid Model for Chinese Word Segmentation","authors":"Xiaofei Lu","doi":"10.21248/jlcl.22.2007.90","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.90","url":null,"abstract":"This paper describes a hybrid model that combines machine learning with linguistic and statistical heuristics for integrating unknown word identification with Chinese word segmentation. The model consists of two major components: a tagging component that annotates each character in a Chinese sentence with a position-of-character (POC) tag that indicates its position in a word, and a merging component that transforms a POC-tagged character sequence into a word-segmented sentence. The tagging component uses a support vector machine (Vapnik, 1995) based tagger to produce an initial tagging of the text and a transformation-based tagger (Brill, 1995) to improve the initial tagging. In addition to the POC tags assigned to the characters, the merging component incorporates a number of linguistic and statistical heuristics to detect words with regular internal structures, recognize long words, and filter non-words. Experiments show that, without resorting to a separate unknown word identification mechanism, the model achieves an F-score of 95.0% for word segmentation and a competitive recall of 74.8% for unknown word identification.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114649226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Chatbots: Are they Really Useful?
Pub Date: 2007-07-01 DOI: 10.21248/jlcl.22.2007.88
Bayan Abu Shawar, E. Atwell
Chatbots are computer programs that interact with users using natural languages. This technology started in the 1960s; the aim was to see if chatbot systems could fool users into believing they were real humans. However, chatbot systems are not built only to mimic human conversation and entertain users. In this paper, we investigate other applications where chatbots could be useful, such as education, information retrieval, business, and e-commerce. A range of chatbots with useful applications, including several based on the ALICE/AIML architecture, are presented in this paper.
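The ALICE/AIML idea referenced here — categories pairing an input pattern with a response template, with "*" as a wildcard bound into the answer — can be sketched in a few lines. The two categories are invented examples, not part of ALICE's knowledge base.

    # Sketch: a tiny AIML-style pattern matcher.
    import re

    categories = [
        ("HELLO", "Hi there!"),
        ("WHAT IS *", "I am not sure what {0} is, but I can look it up."),
    ]

    def respond(user_input):
        # Normalize the input, then try each category's pattern in order.
        text = user_input.upper().strip(" ?!.")
        for pattern, template in categories:
            regex = "^" + re.escape(pattern).replace(r"\*", "(.+)") + "$"
            m = re.match(regex, text)
            if m:
                return template.format(*m.groups())
        return "Tell me more."

    print(respond("What is AIML?"))  # I am not sure what AIML is, but I can look it up.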
{"title":"Chatbots: Are they Really Useful?","authors":"Bayan Abu Shawar, E. Atwell","doi":"10.21248/jlcl.22.2007.88","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.88","url":null,"abstract":"Chatbots are computer programs that interact with users using natural lan- guages. This technology started in the 1960’s; the aim was to see if chatbot systems could fool users that they were real humans. However, chatbot sys- tems are not only built to mimic human conversation, and entertain users. In this paper, we investigate other applications where chatbots could be useful such as education, information retrival, business, and e-commerce. A range of chatbots with useful applications, including several based on the ALICE/AIML architecture, are presented in this paper.","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125757889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 418
Automatic Ontology Extension: Resolving Inconsistencies
Pub Date: 2007-07-01 DOI: 10.21248/jlcl.22.2007.93
Ekaterina Ovchinnikova, Kai-Uwe Kühnberger
Ontologies are widely used in text technology and artificial intelligence. The need to develop large ontologies for real-life applications prompts researchers to automate ontology extension procedures. Automatic updates without the control of a human expert can generate conflicts between original and new knowledge, resulting in inconsistencies in the ontology. We propose an algorithm that models the process of adapting an ontology to new information.

1 Automatic Ontology Extension

There is an increasing interest in applying ontological knowledge in text technologies and artificial intelligence. Since the manual development of large ontologies has proved to be a time-consuming task, many current investigations are devoted to automatic ontology learning methods (see [6] for an overview). Several formalisms have been proposed to represent ontological knowledge. Probably the most important of the existing markup languages for ontology design is the Web Ontology Language (OWL), based on the logical formalism called Description Logics (DL) [1]. In particular, description logics were designed for the representation of terminological knowledge and reasoning processes. Although most tools that extract or extend ontologies automatically output knowledge in the OWL format, they usually use only a small subset of DL. The core ontologies generated in practice usually contain the subsumption relation defined on concepts (a taxonomy) and general relations (such as part-of and others). At present, complex ontologies making use of the whole expressive power and advances of the various versions of DLs can be achieved only manually or semi-automatically. However, several approaches have appeared recently that tend not only to learn taxonomic and general relations but also to state which concepts in the knowledge base are equivalent or disjoint [5]. In the present paper, we concentrate on these approaches. We will consider only terminological knowledge (called the TBox in DL), leaving the information about assertions in the knowledge base (called the ABox in DL) for further investigation. (See the OWL documentation at http://www.w3.org/TR/owl-features/.)
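The kind of conflict the paper targets can be sketched as a check before adding a learned subsumption axiom: if the new axiom would make a concept inherit two classes declared disjoint, it is flagged instead of being added. The tiny TBox and the reject-on-conflict policy are illustrative; the paper proposes a more elaborate adaptation algorithm.

    # Sketch: detecting a disjointness violation when extending a TBox.
    subclass_of = {"Dog": {"Animal"}, "Plant": set()}   # concept -> direct supers
    disjoint = {frozenset({"Animal", "Plant"})}         # disjointness axioms

    def ancestors(concept):
        # Collect the concept and all of its (transitive) superclasses.
        seen, stack = set(), [concept]
        while stack:
            c = stack.pop()
            for sup in subclass_of.get(c, set()):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen | {concept}

    def add_subsumption(sub, sup):
        # Adding sub <= sup is inconsistent if sub would inherit two disjoint supers.
        supers = ancestors(sup) | ancestors(sub)
        for axiom in disjoint:
            if axiom <= supers:
                return f"rejected: {sub} <= {sup} would make {set(axiom)} overlap"
        subclass_of.setdefault(sub, set()).add(sup)
        return f"added: {sub} <= {sup}"

    print(add_subsumption("Dog", "Plant"))  # rejected: Dog would be Animal and Plant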
{"title":"Automatic Ontology Extension: Resolving Inconsistencies","authors":"Ekaterina Ovchinnikova, Kai-Uwe Kühnberger","doi":"10.21248/jlcl.22.2007.93","DOIUrl":"https://doi.org/10.21248/jlcl.22.2007.93","url":null,"abstract":"Ontologies are widely used in text technology and artificial intelligence. The need to develop large ontologies for real-life applications provokes researchers to automatize ontology extension procedures. Automatic updates without the control of a human expert can generate potential conflicts between original and new knowledge resulting in inconsistencies occurring in the ontology. We propose an algorithm that models the process of the adaptation of an ontology to new information. 1 Automatic Ontology Extension There is an increasing interest in applying ontological knowledge in text technologies and artificial intelligence. Since the manual development of large ontologies proved to be a time-consuming task many current investigations are devoted to automatic ontology learning methods (see [6] for an overview). Several formalisms have been proposed to represent ontological knowledge. Probably the most important one of the existing markup languages for ontology design is the Web Ontology Language (OWL) based on the logical formalism called Description Logics (DL) [1]. In particular, description logics were designed for the representation of terminological knowledge and reasoning processes. Although most of the tools extracting or extending ontologies automatically output knowledge in the OWL-format, they usually use only a small subset of DL. The core ontologies generated in practice usually contain the subsumption relation defined on concepts (taxonomy) and general relations (such as part-of and others). At present complex ontologies making use of the whole expressive power and advances of the various versions of DLs can be achieved only manually or semi-automatically. However, several approaches appeared recently tending not only to learn taxonomic and general relations but also state which concepts in the knowledge base are equivalent or disjoint [5]. In the present paper, we concentrate on these approaches. We will consider only terminological knowledge (called TBox in DL) leaving the information about assertions in the knowledge base (called ABox in DL) for further investigations. 3 See the documentation at http://www.w3.org/TR/owl-features/","PeriodicalId":346957,"journal":{"name":"LDV Forum","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126754417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11