As with many tasks in natural language processing, automatic term extraction (ATE) is increasingly approached as a machine learning problem. So far, most machine learning approaches to ATE broadly follow the traditional hybrid methodology, by first extracting a list of unique candidate terms, and classifying these candidates based on the predicted probability that they are valid terms. However, with the rise of neural networks and word embeddings, the next development in ATE might be towards sequential approaches, i.e., classifying each occurrence of each token within its original context. To test the validity of such approaches for ATE, two sequential methodologies were developed, evaluated, and compared: one feature-based conditional random fields classifier and one embedding-based recurrent neural network. An additional comparison was added with a machine learning interpretation of the traditional approach. All systems were trained and evaluated on identical data in multiple languages and domains to identify their respective strengths and weaknesses. The sequential methodologies were proven to be valid approaches to ATE, and the neural network even outperformed the more traditional approach. Interestingly, a combination of multiple approaches can outperform all of them separately, showing new ways to push the state-of-the-art in ATE.
{"title":"Tagging terms in text","authors":"Ayla Rigouts Terryn, Veronique Hoste, Els Lefever","doi":"10.1075/term.21010.rig","DOIUrl":"https://doi.org/10.1075/term.21010.rig","journal":{"name":"Terminology"},"publicationDate":"2022-01-10","publicationTypes":"Journal Article"}
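The sequential view described in the abstract above, in which every token occurrence is classified in context, can be illustrated with a small labelling sketch. The IOB label scheme, the function name `iob_labels`, and the example sentence are assumptions made for illustration; the abstract does not state which label set or preprocessing the authors used.

```python
from typing import List, Sequence


def iob_labels(tokens: Sequence[str], terms: Sequence[Sequence[str]]) -> List[str]:
    """Turn a list of known terms into one B/I/O label per token.

    Hypothetical helper: it shows how term annotations could become
    token-level training labels for a sequence classifier.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    # Match longer terms first so a shorter nested term cannot overwrite them.
    for term in sorted(terms, key=len, reverse=True):
        term_lower = [t.lower() for t in term]
        n = len(term_lower)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            span_is_free = all(label == "O" for label in labels[i:i + n])
            if span_is_free and lowered[i:i + n] == term_lower:
                labels[i] = "B"  # first token of the term
                for j in range(i + 1, i + n):
                    labels[j] = "I"  # continuation tokens
    return labels


tokens = ["Heart", "failure", "is", "treated", "with", "beta", "blockers"]
terms = [["heart", "failure"], ["beta", "blockers"], ["failure"]]
print(list(zip(tokens, iob_labels(tokens, terms))))
```

A list-based approach would instead score each unique candidate once, independent of where it occurs; the token-level labels above are what a CRF or recurrent network would be trained to predict.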
The paper discusses the main results of an analysis of Spanish accounting terminology based on three different corpora. The analysis aimed to measure the level of terminological variation in Spanish accounting and to assess the suitability of accounting standards and companies’ financial statements as sources for terminology extraction when translating accounting texts. The results show terminological variation of around 25% in international accounting standards and a considerable lack of consistency in the use of accounting terminology in the financial statements of Spanish companies, both in the Spanish originals and in their English translations.
{"title":"Variation in Spanish accounting terminology","authors":"Marta García González","doi":"10.1075/term.20039.gar","DOIUrl":"https://doi.org/10.1075/term.20039.gar","journal":{"name":"Terminology"},"publicationDate":"2021-12-17","publicationTypes":"Journal Article"}
Specialized genres are bound to the communicative context of their discourse community. However, certain genres extend beyond one specific domain, remaining unchanged at different linguistic levels across domains. That seems to be the case with wine and olive oil tasting notes, since both analyze and evaluate sensory descriptions. The present study aims to describe and compare lexical chunks of wine and olive oil tasting notes at a semantic level to show whether there is variation in the same genre across domains; we will not only describe, classify and compare lexical chunks, but also identify the way this knowledge is structured and construed in the same genre in both domains. We will test our methodology on a corpus of English tasting notes from both domains written by three different writer profiles: professionals, amateurs and wineries/mills. Our results will be useful for scholars as well as for technical writers who write tasting notes.
{"title":"The phraseology of wine and olive oil tasting notes","authors":"Belén López Arroyo, Lucía Sanz Valdivieso","doi":"10.1075/term.20035.lop","DOIUrl":"https://doi.org/10.1075/term.20035.lop","journal":{"name":"Terminology"},"publicationDate":"2021-12-02","publicationTypes":"Journal Article"}
The development of terminologies for domains where these are lacking is a time-consuming and costly task. This article takes a methodological perspective and addresses a general question: how can we, with limited funding, make maximal use of existing language resources to create a terminology at a relatively low cost? Although Norway has been an important player in the maritime industries for many centuries, it has not prioritised the systematic development of an official maritime terminology. The article therefore focuses specifically on efforts to develop a national resource for maritime domains. It describes efforts to create a corpus of popular science and a parallel corpus of technical texts. Six different term extraction methods are applied. These include corpus-based statistical analyses of frequency, collocation and keyness, as well as bilingual term extraction. Finally, the pros and cons of each method are evaluated by means of a cost-benefit analysis.
{"title":"Utilising heterogeneous language resources for term extraction in maritime domains","authors":"Gisle Andersen","doi":"10.1075/term.20024.and","DOIUrl":"https://doi.org/10.1075/term.20024.and","journal":{"name":"Terminology"},"publicationDate":"2021-09-10","publicationTypes":"Journal Article"}
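The keyness analysis mentioned in the abstract above can be sketched as follows. The abstract does not say which keyness statistic was used; this example assumes the log-likelihood measure that is common in corpus linguistics, and the function names and toy token lists are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood keyness score.

    a, b: frequency of the word in the domain and reference corpus.
    c, d: total token counts of the domain and reference corpus.
    """
    expected_domain = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_domain)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keyness_ranking(domain: Sequence[str], reference: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank words that are relatively more frequent in the domain corpus."""
    dom, ref = Counter(domain), Counter(reference)
    c, d = len(domain), len(reference)
    ranked = []
    for word, a in dom.items():
        b = ref[word]  # Counter returns 0 for unseen words
        # Keep only words over-represented in the domain corpus.
        if a / c > (b / d if d else 0.0):
            ranked.append((word, log_likelihood(a, b, c, d)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


domain_tokens = ["hull", "hull", "hull", "the", "the"]
reference_tokens = ["the", "the", "the", "the", "boat"]
print(keyness_ranking(domain_tokens, reference_tokens))
```

Words with high scores are candidate terms because they are unexpectedly frequent in the specialised corpus compared with general language; a frequency threshold would normally be added before manual validation.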
This paper analyzes nested-abbreviated terms from a linguistic perspective by describing their morphological, syntactic, and semantic features for terminology purposes. Nested-abbreviated terms can be considered abbreviated forms, either initialisms or acronyms, whose expansion contains another abbreviated term. To carry out the analysis, 433 nested-abbreviated terms were extracted from two specialized dictionaries in English. Data analysis showed that, from the morphological and semantic perspective, nested-abbreviated terms behave like typical abbreviations. Important differences were found from a syntactic standpoint, where nested-abbreviated terms behave as premodifiers in the noun phrase (NP) in 98.93% of the cases. As this is the first time nested-abbreviated terms have been studied, they were not only described but also analyzed and defined. Although the percentage of nested-abbreviated terms obtained from the dictionaries is relatively low, less than 1% of total abbreviations, studying this growing phenomenon in specialized languages was found to be highly relevant for terminology extraction, as well as for other purposes.
{"title":"Identification and characterization of nested-abbreviated terms in scientific discourse","authors":"Natalia Rivas, Gabriel Quiroz, John Jairo Giraldo","doi":"10.1075/term.20022.riv","DOIUrl":"https://doi.org/10.1075/term.20022.riv","journal":{"name":"Terminology"},"publicationDate":"2021-08-27","publicationTypes":"Journal Article"}
In this paper, we address the issue of system evaluation for commercial term extraction tools from the users’ perspective. We first revisit the gold standard approach commonly practised among researchers, and discuss the challenges it may pose for end users, taking translators as a typical example. Considering the very different motivations and needs of users and researchers, a user-driven approach is proposed as a variation on and alternative to the gold standard approach, allowing users to assess and understand the performance of commercial tools more objectively. Its feasibility and usefulness are demonstrated by deploying a benchmarking dataset of English-Chinese financial terms, produced by multiple annotators, in a case study with SDL MultiTerm Extract. The results also provide insight for the future development of term extractors designed for translators, which will hopefully generate more accurate candidates, offer more customised features, enable a better user experience, and enjoy wider popularity as computer-aided translation tools.
{"title":"User-driven assessment of commercial term extractors","authors":"O. Kwong","doi":"10.1075/term.20032.kwo","DOIUrl":"https://doi.org/10.1075/term.20032.kwo","journal":{"name":"Terminology"},"publicationDate":"2021-08-03","publicationTypes":"Journal Article"}
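The gold standard approach that the abstract above revisits usually amounts to comparing a tool's candidate list with a reference term list. The sketch below shows that comparison with precision, recall and F1; it does not implement the paper's user-driven variant, whose details the abstract does not give. The function name `evaluate`, the exact-match criterion and the sample terms are assumptions.

```python
from typing import Dict, Iterable


def evaluate(candidates: Iterable[str], gold: Iterable[str]) -> Dict[str, float]:
    """Score extracted candidates against a gold standard term list.

    Matching is exact after lowercasing, which is a simplifying assumption;
    real evaluations may also credit partial or variant matches.
    """
    candidate_set = {c.lower() for c in candidates}
    gold_set = {g.lower() for g in gold}
    true_positives = len(candidate_set & gold_set)
    precision = true_positives / len(candidate_set) if candidate_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


extracted = ["interest rate", "bond", "the"]
reference = ["interest rate", "bond", "equity", "yield"]
scores = evaluate(extracted, reference)
print(scores)
```

A translator evaluating a commercial tool may care more about one of these figures than the other, for example high precision to avoid cleaning up noisy candidates, which is one reason a single gold standard score can be hard for end users to interpret.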
{"title":"Review of Li & Hope (2021): Terminology Translation in Chinese Contexts: Theory and Practice","authors":"Zhonghua Wu","doi":"10.1075/term.21025.wu","DOIUrl":"https://doi.org/10.1075/term.21025.wu","journal":{"name":"Terminology"},"publicationDate":"2021-07-20","publicationTypes":"Journal Article"}
In 1791, the term mariage civil first appeared in French law to designate a civil and secular union recognised only by the State. After the term was introduced into the French legal domain, the rules governing civil marriages underwent legislative changes over the following years. The present paper examines the semantic evolution of the term mariage civil in French law, relating this evolution to socio-cultural and historical aspects of France between 1791 (when civil marriage was instituted in the country) and 2013 (when the most recent legislative change in the area occurred). Based on this investigation, it is possible to affirm that transformations in French society and legislative changes have modified the concept designated by the term mariage civil, especially concerning the notion of family and the achievement of rights by women and homosexuals.
{"title":"Changes in the concept designated by the term mariage civil throughout the history of French law 1791–2013","authors":"Beatriz Curti-Contessoto, Isabella De Oliveira, Lidia Almeida Barros","doi":"10.1075/TERM.00061.CUR","DOIUrl":"https://doi.org/10.1075/TERM.00061.CUR","journal":{"name":"Terminology"},"publicationDate":"2021-07-05","publicationTypes":"Journal Article"}