Corpus-based bilingual terminology extraction in the power engineering domain
Tanja Ivanović, R. Stanković, B. Todorovic, Cvetana Krstev
Journal: Terminology
Publication date: 2022-04-07
DOI: 10.1075/term.20038.iva (https://doi.org/10.1075/term.20038.iva)
Citations: 3
Abstract
This paper presents the resources and tools used to extract and evaluate bilingual, English-Serbian terminology in
the power engineering domain. The resources consist of existing general and domain lexica, and a domain parallel corpus; tools
include term extractors for both languages and a tool for aligning the segments belonging to corpus sentences. The system was
tested by varying a match function that establishes the presence of an extracted term in an aligned segment (a chunk), ranging
from very loose to strict. The evaluation of results showed that the precision of English term extraction was 92%, Serbian term
extraction 86%, while the precision of bilingual pair extraction was 72% based on the strictest match function. The result of
extraction was 2,684 correct bilingual pairs that enhanced the terminology database and can further be used to support the search
of the power engineering aligned collection stored in a digital library.
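The idea of varying a match function from loose to strict, and then measuring the precision of the resulting bilingual pairs, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not define the match functions, so `loose_match` and `strict_match` are hypothetical stand-ins, the pairing rule (a pair is proposed when both terms match the two sides of the same aligned chunk) is a simplification, and the terms and chunks are toy data, not the paper's corpus.

```python
from typing import Callable, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def tokens(text: str) -> List[str]:
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def loose_match(term: str, chunk: str) -> bool:
    """Hypothetical loose variant: any token of the term occurs in the chunk."""
    chunk_tokens = set(tokens(chunk))
    return any(t in chunk_tokens for t in tokens(term))


def strict_match(term: str, chunk: str) -> bool:
    """Hypothetical strict variant: the term occurs as a contiguous token run."""
    t, c = tokens(term), tokens(chunk)
    n = len(t)
    return n > 0 and any(c[i:i + n] == t for i in range(len(c) - n + 1))


def candidate_pairs(
    en_terms: Iterable[str],
    sr_terms: Iterable[str],
    aligned_chunks: Iterable[Pair],
    match: Callable[[str, str], bool],
) -> Set[Pair]:
    """Propose (English, Serbian) pairs whose terms match both sides of a chunk."""
    en_terms, sr_terms = list(en_terms), list(sr_terms)
    pairs: Set[Pair] = set()
    for en_chunk, sr_chunk in aligned_chunks:
        for en in en_terms:
            if not match(en, en_chunk):
                continue
            for sr in sr_terms:
                if match(sr, sr_chunk):
                    pairs.add((en, sr))
    return pairs


def precision(extracted: Set[Pair], correct: Set[Pair]) -> float:
    """Share of extracted pairs that are judged correct."""
    return len(extracted & correct) / len(extracted) if extracted else 0.0


# Toy data (illustrative only).
EN_TERMS = ["power transformer", "voltage"]
SR_TERMS = ["energetski transformator", "napon"]
CHUNKS = [
    ("the power transformer", "energetski transformator"),
    ("high voltage", "visoki napon"),
    ("power line voltage", "napon dalekovoda"),
]
GOLD = {("power transformer", "energetski transformator"), ("voltage", "napon")}

loose_pairs = candidate_pairs(EN_TERMS, SR_TERMS, CHUNKS, loose_match)
strict_pairs = candidate_pairs(EN_TERMS, SR_TERMS, CHUNKS, strict_match)
print(len(loose_pairs), precision(loose_pairs, GOLD))
print(len(strict_pairs), precision(strict_pairs, GOLD))
```

On this toy data the loose function proposes one extra, incorrect pair ("power transformer", "napon"), because the single token "power" is enough to match the third chunk. The strict function rejects it, which is the kind of trade-off between the number of candidates and their precision that such a parameter sweep is meant to expose.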
About the journal:
Terminology is an independent journal with a cross-cultural and cross-disciplinary scope. It focusses on the discussion of (systematic) solutions not only of language problems encountered in translation, but also, for example, of (monolingual) problems of ambiguity, reference and developments in multidisciplinary communication. Particular attention will be given to new and developing subject areas such as knowledge representation and transfer, information technology tools, expert systems and terminological databases. Terminology encompasses terminology both in general (theory and practice) and in specialized fields (LSP), such as physics.