In recent decades, phraseology in general and specialized lexicographic resources has attracted scholarly interest. However, phraseology has been studied less in language for specific purposes (LSP) than in language for general purposes (LGP). Therefore, this study (i) offers an overview of definitions of LSP phraseology, (ii) provides a series of linguistic analyses of specialized phraseological units (SPUs) extracted from a specialized bilingual dictionary, and (iii) draws a comparison between LGP and LSP phraseology. To this end, 11,086 entries were extracted to build the analysis database. The study provides 1,054 morphosyntactic and 4,369 semantic patterns, a definition and a taxonomy of SPUs based on the data analysis and a review of LGP phraseology notions, and a hybrid lexicographic indexation method for SPUs. These contributions answer the question ‘what is an SPU?’ while highlighting similarities and differences with LGP phraseology.
{"title":"‘Arm’s length’ phraseology?","authors":"José Luis Rojas Díaz","doi":"10.1075/term.21028.roj","DOIUrl":"https://doi.org/10.1075/term.21028.roj","PeriodicalId":44429,"journal":{"name":"Terminology"},"PeriodicalIF":0.8,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"48634736","RegionNum":4,"RegionCategory":"Literature","Total":0}
In this paper, we propose the first method for automatic Vietnamese medical term discovery and extraction from clinical texts. The method combines linguistic filtering based on our defined open patterns with nested term extraction and statistical ranking using C-value. It does not require annotated corpora, external data resources, parameter settings, or term length restrictions. Besides its specialization in handling Vietnamese medical terms, another novelty is that it uses Pointwise Mutual Information to split nested terms and the disjunctive acceptance condition to extract them. Evaluated on real Vietnamese electronic medical records, it achieves a precision of about 74% and a recall of about 92%, and it remains stably effective on small datasets. It outperforms previous work in the same category, that is, methods that use neither annotated corpora nor external data resources. Our method and empirical evaluation can lay a foundation for further research and development in Vietnamese medical term discovery and extraction.
{"title":"Automatic medical term extraction from Vietnamese clinical texts","authors":"C. Vo, T. Cao, Ngoc Truong, T. Ngo, Dai Bui","doi":"10.1075/term.20037.vo","DOIUrl":"https://doi.org/10.1075/term.20037.vo","PeriodicalId":44429,"journal":{"name":"Terminology"},"PeriodicalIF":0.8,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"49564308","RegionNum":4,"RegionCategory":"Literature","Total":0}
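The two statistical ingredients named in the abstract by Vo et al., C-value ranking and Pointwise Mutual Information, can be illustrated with a minimal sketch. It follows the commonly cited C-value formulation and the textbook PMI definition, not necessarily the exact variant used in the paper. The function names, the representation of candidates as token tuples, and the `+ 1` inside the logarithm (added so that single-word candidates do not score zero) are assumptions made for illustration.

```python
import math


def _contains(longer, shorter):
    """True if `shorter` occurs as a contiguous token sequence in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Score candidate terms (token tuples mapped to corpus frequencies).

    A candidate nested in longer candidates is penalised by the mean
    frequency of those longer candidates.
    """
    scores = {}
    for term, f in freq.items():
        containers = [f_b for other, f_b in freq.items()
                      if len(other) > len(term) and _contains(other, term)]
        # Assumption: log2(length + 1) instead of log2(length), so that
        # one-word candidates are not zeroed out.
        weight = math.log2(len(term) + 1)
        if containers:
            scores[term] = weight * (f - sum(containers) / len(containers))
        else:
            scores[term] = weight * f
    return scores


def pmi(pair_count, left_count, right_count, total):
    """Pointwise Mutual Information (base 2) of two adjacent units."""
    p_pair = pair_count / total
    return math.log2(p_pair / ((left_count / total) * (right_count / total)))
```

A low PMI between two adjacent parts of a long candidate suggests a weak association, which is one plausible signal for splitting it into nested terms; how the paper combines this signal with its acceptance condition is not specified in the abstract.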
The purpose of the study is to explore interlingual terminological asymmetry from a cognitive-onomasiological standpoint. False synonymy of adjectives in the anatomical terminology of Latin, Ukrainian, Russian, and English has been analyzed and interpreted as a factor causing interlingual terminological asymmetry. In Latin anatomical terminology, there is a significant number of nominative units with similar meanings. They often have a single equivalent in other (modern) languages or can simply be confused as a result of misunderstanding. This creates difficulties in interlingual terminological communication. Despite the substrate nature of Latin anatomical terminology, national terminological systems undergo different types of correlations in their functioning. The author assumes such correlations are related to the concepts of “terminological asymmetry” (lack of interlingual interchangeability of terms) and “quasi-synonymous effect” (the loss of the cognitive-differential function of the term). Attention is also paid to preparing a theoretical basis for a special thesaurus to help speakers of Ukrainian study medical terminology in Latin and English.
{"title":"Interlingual terminological asymmetry as one of the aspects of studying foreign languages","authors":"Tetyana Karlova","doi":"10.1075/term.00065.kar","DOIUrl":"https://doi.org/10.1075/term.00065.kar","PeriodicalId":44429,"journal":{"name":"Terminology"},"PeriodicalIF":0.8,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"47792480","RegionNum":4,"RegionCategory":"Literature","Total":0}
Antonio San Martín, Catherine Trekker, Pilar León-Araúz
Hyponymy is an essential semantic relation in terminology, as it represents the hierarchical organization of concepts. Much has been written about hyponymy extraction. However, terminologists working with French do not currently have user-friendly and freely available tools to automatically extract hyper-hyponymic pairs from their own corpora. This paper presents the most recent version of the ESSG (EcoLexicon Semantic Sketch Grammar) methodology, a knowledge-pattern-based approach that enables Sketch Engine to extract semantic relations. This methodology is applied to the development and evaluation of the ESSG-fr, a semantic sketch grammar for hyponymy extraction in French. The evaluation results show that the ESSG-fr is a reliable domain-independent tool for terminologists wishing to extract simple hyper-hyponymic pairs and the corresponding concordances from specialized corpora.
{"title":"Repérage automatisé de l’hyponymie dans des corpus spécialisés en français à l’aide de Sketch Engine","authors":"Antonio San Martín, Catherine Trekker, Pilar León-Araúz","doi":"10.1075/term.20044.san","DOIUrl":"https://doi.org/10.1075/term.20044.san","PeriodicalId":44429,"journal":{"name":"Terminology","volume":"1 1"},"PeriodicalIF":0.8,"publicationDate":"2022-05-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"42705882","RegionNum":4,"RegionCategory":"Literature","Total":0}
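The idea of a knowledge-pattern-based approach to hyponymy extraction, as described in the abstract by San Martín et al., can be illustrated with a toy sketch. It uses plain regular expressions rather than a Sketch Engine sketch grammar, and the two French patterns below are illustrative assumptions, not rules taken from the ESSG-fr. It handles single-word terms only.

```python
import re

# Hypothetical knowledge patterns (assumptions for illustration only).
PATTERNS = [
    # "X est un type de Y": X is the hyponym, Y the hypernym.
    re.compile(r"(?P<hypo>\w+) est une? (?:type|sorte|forme) de (?P<hyper>\w+)",
               re.IGNORECASE),
    # "Y tels que X" / "Y telles que X": Y is the hypernym, X the hyponym.
    re.compile(r"(?P<hyper>\w+) tel(?:le)?s? que (?:le |la |les |l')?(?P<hypo>\w+)",
               re.IGNORECASE),
]


def extract_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by any pattern."""
    pairs = []
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            pairs.append((match.group("hypo").lower(),
                          match.group("hyper").lower()))
    return pairs
```

For instance, `extract_pairs("Le granite est un type de roche.")` yields the pair `("granite", "roche")`. A real grammar would operate on lemmas and part-of-speech tags and would cover multiword terms, which plain surface regexes cannot do reliably.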
Tanja Ivanović, R. Stanković, B. Todorovic, Cvetana Krstev
This paper presents the resources and tools used to extract and evaluate bilingual English-Serbian terminology in the power engineering domain. The resources consist of existing general and domain lexica and a domain parallel corpus; the tools include term extractors for both languages and a tool for aligning the segments belonging to corpus sentences. The system was tested by varying a match function that establishes the presence of an extracted term in an aligned segment (a chunk), ranging from very loose to strict. The evaluation showed that the precision of English term extraction was 92% and that of Serbian term extraction 86%, while the precision of bilingual pair extraction was 72% with the strictest match function. The extraction yielded 2,684 correct bilingual pairs, which enhanced the terminology database and can further support searching the aligned power engineering collection stored in a digital library.
{"title":"Corpus-based bilingual terminology extraction in the power engineering domain","authors":"Tanja Ivanović, R. Stanković, B. Todorovic, Cvetana Krstev","doi":"10.1075/term.20038.iva","DOIUrl":"https://doi.org/10.1075/term.20038.iva","PeriodicalId":44429,"journal":{"name":"Terminology"},"PeriodicalIF":0.8,"publicationDate":"2022-04-07","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"48623555","RegionNum":4,"RegionCategory":"Literature","Total":0}
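The abstract by Ivanović et al. describes a match function that decides whether an extracted term is present in an aligned chunk and that ranges from very loose to strict. The sketch below shows what the two extremes might look like, together with a precision computation. The abstract does not define the functions, so `strict_match`, `loose_match`, and `precision` are hypothetical names and definitions chosen for illustration.

```python
def strict_match(term, chunk):
    """True if the term's tokens occur contiguously and in order in the chunk."""
    t, c = term.lower().split(), chunk.lower().split()
    return any(c[i:i + len(t)] == t for i in range(len(c) - len(t) + 1))


def loose_match(term, chunk):
    """True if at least one token of the term occurs anywhere in the chunk."""
    return bool(set(term.lower().split()) & set(chunk.lower().split()))


def precision(extracted_pairs, validated_pairs):
    """Share of extracted bilingual pairs found in a manually validated set."""
    if not extracted_pairs:
        return 0.0
    correct = sum(pair in validated_pairs for pair in extracted_pairs)
    return correct / len(extracted_pairs)
```

A stricter function admits fewer candidate pairs, which typically trades recall for precision; the 72% figure reported in the abstract refers to the strictest setting.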
We propose to identify the senses and subsenses of French travail in the field of international commerce. We also present the main weak idioms containing this form, drawn from a corpus compiled from scratch within the DIACOM-fr project (Department of Foreign Languages, University of Verona), which is part of the Excellence Project “Le Digital Humanities applicate alle lingue e letterature straniere” (“Digital Humanities applied to foreign modern languages and literatures”). The senses and subsenses, as well as the weak idioms, are classified by means of semantic labels and will be represented in a draft terminological network.
{"title":"La représentation de la polysémie et des termes complexes de type locution faible dans une base de données terminologique","authors":"Paolo Frassi","doi":"10.1075/term.21004.fra","DOIUrl":"https://doi.org/10.1075/term.21004.fra","PeriodicalId":44429,"journal":{"name":"Terminology","volume":"19 28","pages":"103-128"},"PeriodicalIF":0.8,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"138513739","RegionNum":4,"RegionCategory":"Literature","Total":0}
We describe the creation of a knowledge base in the field of karstology using the frame-based approach. Apart from providing a new multilingual resource that uses manually annotated definitions as the source of structured information, the main focus is on exploring text mining methods to identify targeted knowledge structures in specialised corpora. The first stage of this process is the design of a domain model and its implementation in a definition annotation task. Once annotation is completed, an analysis of typical co-occurrence patterns between semantic categories and the relations describing them allows us to discern ideal definition templates. We demonstrate that such templates contribute to a more comprehensive and structured representation of concepts and also help us design targeted text mining experiments to retrieve new semantic relations from text. Two such experiments are presented: the first uses intersections of word embeddings to identify words expressing a specific semantic relation, and the second uses the embedding of the semantic relation to extract multiword units that contain the target relation. Results suggest that the proposed methods are promising for capturing the semantic properties of relations in frame-based knowledge modelling.
{"title":"Framing karstology","authors":"Špela Vintar, Matej Martinc","doi":"10.1075/term.21005.vin","DOIUrl":"https://doi.org/10.1075/term.21005.vin","PeriodicalId":44429,"journal":{"name":"Terminology"},"PeriodicalIF":0.8,"publicationDate":"2022-01-27","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"45662117","RegionNum":4,"RegionCategory":"Literature","Total":0}
As with many tasks in natural language processing, automatic term extraction (ATE) is increasingly approached as a machine learning problem. So far, most machine learning approaches to ATE broadly follow the traditional hybrid methodology: they first extract a list of unique candidate terms and then classify these candidates based on the predicted probability that they are valid terms. However, with the rise of neural networks and word embeddings, the next development in ATE might be towards sequential approaches, i.e., classifying each occurrence of each token within its original context. To test the validity of such approaches for ATE, two sequential methodologies were developed, evaluated, and compared: one feature-based conditional random fields classifier and one embedding-based recurrent neural network. An additional comparison was made with a machine learning interpretation of the traditional approach. All systems were trained and evaluated on identical data in multiple languages and domains to identify their respective strengths and weaknesses. The sequential methodologies proved to be valid approaches to ATE, and the neural network even outperformed the more traditional approach. Interestingly, a combination of multiple approaches can outperform all of them separately, showing new ways to push the state of the art in ATE.
{"title":"Tagging terms in text","authors":"Ayla Rigouts Terryn, Veronique Hoste, Els Lefever","doi":"10.1075/term.21010.rig","DOIUrl":"https://doi.org/10.1075/term.21010.rig","PeriodicalId":44429,"journal":{"name":"Terminology"},"PeriodicalIF":0.8,"publicationDate":"2022-01-10","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"47276639","RegionNum":4,"RegionCategory":"Literature","Total":0}
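The contrast drawn in the abstract by Rigouts Terryn et al., between classifying a list of unique candidate terms and labelling each token occurrence in context, can be made concrete with a small sketch. It assumes a simple B/I/O labelling scheme, which the abstract does not specify, and the function names are hypothetical. The first function builds token-level labels of the kind a sequential model would be trained to predict; the second collapses such labels back into the list of unique terms used by the traditional methodology.

```python
def iob_tags(tokens, terms):
    """Label each token occurrence as B (term start), I (inside), or O (outside).

    `terms` is a collection of lower-cased token tuples.
    """
    tags = ["O"] * len(tokens)
    lowered = [tok.lower() for tok in tokens]
    # Longest terms first, so a nested shorter term never overwrites them.
    for term in sorted(terms, key=len, reverse=True):
        n = len(term)
        for i in range(len(tokens) - n + 1):
            span_free = all(tag == "O" for tag in tags[i:i + n])
            if tuple(lowered[i:i + n]) == term and span_free:
                tags[i] = "B"
                tags[i + 1:i + n] = ["I"] * (n - 1)
    return tags


def candidates_from_tags(tokens, tags):
    """Collapse token-level labels into the set of unique term candidates."""
    found, current = set(), []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                found.add(tuple(current))
            current = [tok.lower()]
        elif tag == "I" and current:
            current.append(tok.lower())
        else:
            if current:
                found.add(tuple(current))
            current = []
    if current:
        found.add(tuple(current))
    return found
```

The token-level view keeps every occurrence and its context, which is what allows context-sensitive models such as conditional random fields or recurrent networks to be applied.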
The paper discusses the main results of an analysis of Spanish accounting terminology based on three different corpora. The analysis aimed to measure the level of terminological variation in Spanish accounting and to assess the suitability of accounting standards and companies’ financial statements for terminology extraction in the translation of accounting texts. The results show terminological variation of around 25% in international accounting standards and a considerable lack of consistency in the use of accounting terminology in the financial statements of Spanish companies, both in the Spanish originals and in their English translations.
{"title":"Variation in Spanish accounting terminology","authors":"Marta García González","doi":"10.1075/term.20039.gar","DOIUrl":"https://doi.org/10.1075/term.20039.gar","PeriodicalId":44429,"journal":{"name":"Terminology"},"PeriodicalIF":0.8,"publicationDate":"2021-12-17","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"47313917","RegionNum":4,"RegionCategory":"Literature","Total":0}
Specialized genres are bound to the communicative context of their discourse community. However, certain genres extend beyond one specific domain, remaining unchanged at different linguistic levels across domains. This seems to be the case for wine and olive oil tasting notes, since both analyze and evaluate sensory descriptions. The present study aims to describe and compare lexical chunks in wine and olive oil tasting notes at a semantic level to show whether there is variation in the same genre across domains; we will not only describe, classify, and compare lexical chunks but also identify the way this knowledge is structured and construed in the same genre in both domains. We will test our methodology on a corpus of English tasting notes from both domains written by three different writer profiles: professionals, amateurs, and wineries/mills. Our results will be useful for scholars as well as for technical writers when writing tasting notes.
{"title":"The phraseology of wine and olive oil tasting notes","authors":"Belén López Arroyo, Lucía Sanz Valdivieso","doi":"10.1075/term.20035.lop","DOIUrl":"https://doi.org/10.1075/term.20035.lop","PeriodicalId":44429,"journal":{"name":"Terminology"},"PeriodicalIF":0.8,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"44380022","RegionNum":4,"RegionCategory":"Literature","Total":0}