
Terminology: Latest Articles

‘Arm’s length’ phraseology?
IF 0.8 Zone 4 Literature 0 LANGUAGE & LINGUISTICS Pub Date: 2022-06-30 DOI: 10.1075/term.21028.roj
José Luis Rojas Díaz
In recent decades, the study of phraseology within general and specialized lexicographic resources has been of interest to scholars. However, phraseology has not been studied in language for specific purposes (LSP) as much as in language for general purposes (LGP). Therefore, this study (i) offers an overview of the definitions regarding LSP phraseology, (ii) provides a series of linguistic analyses of specialized phraseological units (SPUs) extracted from a specialized bilingual dictionary, and (iii) draws a comparative line between LGP and LSP phraseology. To do so, 11,086 entries were extracted to build the analysis database. This study provides 1,054 morphosyntactic and 4,369 semantic patterns, a definition and a taxonomy of SPUs based on the data analysis and a revision of LGP phraseology notions, and a hybrid lexicographic indexation method for SPUs. The contributions of this paper answer the question ‘what is an SPU?’ while highlighting similarities and differences with LGP phraseology.
Citations: 0
Automatic medical term extraction from Vietnamese clinical texts
IF 0.8 Zone 4 Literature 0 LANGUAGE & LINGUISTICS Pub Date: 2022-06-09 DOI: 10.1075/term.20037.vo
C. Vo, T. Cao, Ngoc Truong, T. Ngo, Dai Bui
In this paper, we propose the first method for automatic Vietnamese medical term discovery and extraction from clinical texts. The method combines linguistic filtering based on our defined open patterns with nested term extraction and statistical ranking using C-value. It does not require annotated corpora, external data resources, parameter settings, or term length restriction. Besides its specialization in handling Vietnamese medical terms, another novelty is that it uses Pointwise Mutual Information to split nested terms and the disjunctive acceptance condition to extract them. Evaluated on real Vietnamese electronic medical records, it achieves a precision of about 74% and a recall of about 92%, and it proves stably effective with small datasets. It outperforms previous work in the same category, that is, methods that use neither annotated corpora nor external data resources. Our method and empirical evaluation analysis can lay a foundation for further research and development in Vietnamese medical term discovery and extraction.
Citations: 0
Interlingual terminological asymmetry as one of the aspects of studying foreign languages
IF 0.8 Zone 4 Literature 0 LANGUAGE & LINGUISTICS Pub Date: 2022-05-31 DOI: 10.1075/term.00065.kar
Tetyana Karlova
The purpose of the study is to explore interlingual terminological asymmetry from the cognitive-onomasiological standpoint. False synonymy of adjectives in the anatomical terminology of Latin, Ukrainian, Russian, and English has been analyzed and interpreted as a factor causing interlingual terminological asymmetry. In Latin anatomical terminology, there is a significant number of nominative units with similar meanings. They often have one equivalent in other (modern) languages or can simply be confused as a result of misunderstanding. This creates difficulties in the process of interlingual terminological communication. Despite the substrate nature of the Latin anatomical terminology, national terminological systems undergo different types of correlations in their functioning. The author assumes such correlations are related to the concepts of “terminological asymmetry” (lack of interlingual interchangeability of terms) and “quasi-synonymous effect” (the loss of cognitive-differential function of the term). Attention is also paid to the preparation of a theoretical basis for creating a special thesaurus to help speakers of Ukrainian study medical terminology in Latin and English.
Citations: 0
Repérage automatisé de l’hyponymie dans des corpus spécialisés en français à l’aide de Sketch Engine [Automated identification of hyponymy in specialized French corpora using Sketch Engine]
IF 0.8 Zone 4 Literature 0 LANGUAGE & LINGUISTICS Pub Date: 2022-05-12 DOI: 10.1075/term.20044.san
Antonio San Martín, Catherine Trekker, Pilar León-Araúz
Hyponymy is an essential semantic relation in terminology, as it represents the hierarchical organization of concepts. Much has been written about hyponymy extraction. However, terminologists working with French do not currently have user-friendly and freely available tools to automatically extract hyper-hyponymic pairs from their own corpora. This paper presents the most recent version of the ESSG (EcoLexicon Semantic Sketch Grammar) methodology, a knowledge-pattern-based approach that enables Sketch Engine to extract semantic relations. This methodology is applied to the development and evaluation of the ESSG-fr, a semantic sketch grammar for hyponymy extraction in French. The evaluation results show that the ESSG-fr is a reliable domain-independent tool for terminologists wishing to extract simple hyper-hyponymic pairs and the corresponding concordances from specialized corpora.
Citations: 1
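To make the idea of a knowledge-pattern-based approach concrete, here is a minimal sketch that matches two French lexical patterns with regular expressions. It is only an illustration of the general technique: it is not the ESSG-fr grammar, it does not use Sketch Engine or its query language, and the two patterns, the name `extract_pairs`, and the sample sentences are assumptions made for this example. It only handles single-word terms.

```python
import re

# Each hypothetical pattern captures one hyponym ("hypo") and one hypernym ("hyper").
KNOWLEDGE_PATTERNS = [
    # "X est un type de Y" / "X est une sorte de Y"
    re.compile(r"(?P<hypo>\w+) est une? (?:type|sorte) de (?P<hyper>\w+)"),
    # "Y tels que X" / "Y telles que X"
    re.compile(r"(?P<hyper>\w+) tel(?:le)?s que (?P<hypo>\w+)"),
]


def extract_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the patterns above."""
    text = sentence.lower()
    pairs = []
    for pattern in KNOWLEDGE_PATTERNS:
        for match in pattern.finditer(text):
            pairs.append((match.group("hypo"), match.group("hyper")))
    return pairs


pairs_a = extract_pairs("La doline est un type de dépression.")
pairs_b = extract_pairs("Des roches telles que calcaire.")
```

A real sketch grammar would work on lemmatised, part-of-speech-tagged text and would cover multiword terms and enumerations, which plain surface regexes like these cannot do reliably.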
Corpus-based bilingual terminology extraction in the power engineering domain
IF 0.8 Zone 4 Literature 0 LANGUAGE & LINGUISTICS Pub Date: 2022-04-07 DOI: 10.1075/term.20038.iva
Tanja Ivanović, R. Stanković, B. Todorovic, Cvetana Krstev
This paper presents the resources and tools used to extract and evaluate bilingual, English-Serbian terminology in the power engineering domain. The resources consist of existing general and domain lexica, and a domain parallel corpus; tools include term extractors for both languages and a tool for aligning the segments belonging to corpus sentences. The system was tested by varying a match function that establishes the presence of an extracted term in an aligned segment (a chunk), ranging from very loose to strict. The evaluation of results showed that the precision of English term extraction was 92%, Serbian term extraction 86%, while the precision of bilingual pair extraction was 72% based on the strictest match function. The result of extraction was 2,684 correct bilingual pairs that enhanced the terminology database and can further be used to support the search of the power engineering aligned collection stored in a digital library.
Citations: 3
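The abstract above describes a match function that ranges from very loose to strict but does not define its levels. The sketch below shows one plausible reading with three hypothetical modes (`strict`, `loose`, `very_loose`); the mode names, the whitespace tokenisation, and the helper `precision` are assumptions for illustration, not the authors' definitions.

```python
def match(term, chunk, mode="strict"):
    """Decide whether an extracted term is present in an aligned chunk.

    strict     : the term's tokens appear contiguously and in order
    loose      : every token of the term appears somewhere in the chunk
    very_loose : at least one token of the term appears in the chunk
    """
    term_tokens = term.lower().split()
    chunk_tokens = chunk.lower().split()
    if mode == "strict":
        n = len(term_tokens)
        return any(chunk_tokens[i:i + n] == term_tokens
                   for i in range(len(chunk_tokens) - n + 1))
    if mode == "loose":
        return all(token in chunk_tokens for token in term_tokens)
    if mode == "very_loose":
        return any(token in chunk_tokens for token in term_tokens)
    raise ValueError(f"unknown mode: {mode}")


def precision(correct, extracted):
    """Share of extracted items that were judged correct."""
    return correct / extracted if extracted else 0.0
```

Under this reading, a stricter mode accepts fewer candidate pairs, so it tends to trade coverage for precision; the 72% figure in the abstract is reported for the strictest setting.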
La représentation de la polysémie et des termes complexes de type locution faible dans une base de données terminologique [Representing polysemy and complex terms of the weak-idiom type in a terminological database]
IF 0.8 Zone 4 Literature 0 LANGUAGE & LINGUISTICS Pub Date: 2022-03-01 DOI: 10.1075/term.21004.fra
Paolo Frassi
We propose to identify, for the French language, the senses and subsenses of travail in the field of international commerce. We also intend to present the main weak idioms containing this form, drawn from a corpus that has been constituted ex novo in the framework of the DIACOM-fr project (Department of Foreign Languages, University of Verona), part of the Excellence Project “Le Digital Humanities applicate alle lingue e letterature straniere” (“Digital Humanities applied to foreign modern languages and literatures”). The senses and subsenses as well as the weak idioms, classified on the basis of a number of semantic labels, will be represented in a draft terminological network.
Citations: 0
Framing karstology
IF 0.8 Zone 4 Literature 0 LANGUAGE & LINGUISTICS Pub Date: 2022-01-27 DOI: 10.1075/term.21005.vin
Špela Vintar, Matej Martinc
We describe the creation of a knowledge base in the field of karstology using the frame-based approach. Apart from providing a new multilingual resource using manually annotated definitions as the source of structured information, the main focus is on exploring text mining methods to identify targeted knowledge structures in specialised corpora. The first stage of this process is the design of a domain model and its implementation in a definition annotation task. Once annotation is completed, an analysis of typical co-occurrence patterns between semantic categories and the relations describing them allows us to discern ideal definition templates. We demonstrate that such templates not only contribute to more comprehensive and structured representations of concepts, but also help us design targeted text mining experiments to retrieve new semantic relations from text. Two such experiments are presented, the first using intersections of word embeddings to identify words expressing a specific semantic relation, and the second using the embedding of the semantic relation to extract multiword units which contain the target relation. Results suggest that the proposed methods are promising for capturing the semantic properties of relations in frame-based knowledge modelling.
Citations: 1
Tagging terms in text
IF 0.8 Zone 4 Literature 0 LANGUAGE & LINGUISTICS Pub Date: 2022-01-10 DOI: 10.1075/term.21010.rig
Ayla Rigouts Terryn, Veronique Hoste, Els Lefever
As with many tasks in natural language processing, automatic term extraction (ATE) is increasingly approached as a machine learning problem. So far, most machine learning approaches to ATE broadly follow the traditional hybrid methodology, by first extracting a list of unique candidate terms, and classifying these candidates based on the predicted probability that they are valid terms. However, with the rise of neural networks and word embeddings, the next development in ATE might be towards sequential approaches, i.e., classifying each occurrence of each token within its original context. To test the validity of such approaches for ATE, two sequential methodologies were developed, evaluated, and compared: one feature-based conditional random fields classifier and one embedding-based recurrent neural network. An additional comparison was added with a machine learning interpretation of the traditional approach. All systems were trained and evaluated on identical data in multiple languages and domains to identify their respective strengths and weaknesses. The sequential methodologies were proven to be valid approaches to ATE, and the neural network even outperformed the more traditional approach. Interestingly, a combination of multiple approaches can outperform all of them separately, showing new ways to push the state-of-the-art in ATE.
Citations: 0
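The sequential approach described in the abstract above labels every token occurrence in context, rather than classifying a list of unique candidates. A common encoding for such labels is the B/I/O scheme; the sketch below, which assumes that scheme and uses a hypothetical helper name `iob_to_terms`, shows how per-token labels can be turned back into term spans. It says nothing about how the labels are predicted, and the abstract does not state which label scheme the authors used.

```python
def iob_to_terms(tokens, labels):
    """Collect term spans from per-token labels.

    "B" starts a term, "I" continues the current term, anything else is outside.
    An "I" with no open term is treated as outside.
    """
    terms = []
    current = []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


# Hypothetical sentence and labels, for illustration only.
tokens = ["Automatic", "term", "extraction", "uses", "neural", "networks", "."]
labels = ["B", "I", "I", "O", "B", "I", "O"]
spans = iob_to_terms(tokens, labels)
```

Because each occurrence is labelled separately, the same string can be tagged as a term in one sentence and left untagged in another, which a candidate-list classifier cannot express.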
Variation in Spanish accounting terminology
IF 0.8 Zone 4 Literature 0 LANGUAGE & LINGUISTICS Pub Date: 2021-12-17 DOI: 10.1075/term.20039.gar
Marta García González
The paper discusses the main results of an analysis of Spanish accounting terminology, based on the exploitation of three different corpora. The analysis was aimed at measuring the level of terminology variation in Spanish accounting and at assessing the suitability of accounting standards and companies’ financial statements for terminology extraction in the translation of accounting texts. The results evidence a terminological variation of around 25% in international accounting standards and a considerable lack of consistency in the use of accounting terminology in the financial statements of Spanish companies, both in the Spanish originals and in their English translations.
Citations: 1
The phraseology of wine and olive oil tasting notes
IF 0.8 Zone 4 Literature 0 LANGUAGE & LINGUISTICS Pub Date: 2021-12-02 DOI: 10.1075/term.20035.lop
Belén López Arroyo, Lucía Sanz Valdivieso
Specialized genres are bound to the communicative context of their discourse community. However, certain genres extend beyond one specific domain, remaining unchanged at different linguistic levels across domains. That seems to be the case of wine and olive oil tasting notes since both analyze and evaluate sensory descriptions. The present study aims at describing and comparing lexical chunks of wine and olive oil tasting notes at a semantic level to show if there is variation in the same genre across domains; we will not only describe, classify and compare lexical chunks, but also identify the way this knowledge is structured and construed in the same genre in both domains. We will test our methodology in a corpus of English tasting notes from both genres written by three different writer profiles: professionals, amateurs and wineries/mills. Our results will be useful for scholars as well as technical writers when writing tasting notes.
Citations: 1