The number of Internet of Things (IoT) devices has increased dramatically in recent years, and Bluetooth technology is critical for communication between them. BlueZ’s Bluetooth File Transfer Filter (BTF) can protect electronic communications, the IoT, and big data from malware and data theft by using a configurable filter to block unauthorized Bluetooth file transfers. The BTF is available for various Linux distributions and can protect many Bluetooth-enabled devices, including smartphones, tablets, laptops, and other IoT devices. However, the growing number and density of Bluetooth devices have also created a serious problem: the Bluetooth worm, which poses a severe threat to the security of Bluetooth devices. In this paper, we propose a Bluetooth OBEX Proxy (BOP) to filter malicious files transferred to devices via the OBEX system service in BlueZ. The proposed method blocks illegal Bluetooth file transfers, defending big data, the IoT, and electronic communications from malware and data theft, and it protects numerous Bluetooth devices, including smartphones, tablets, laptops, and other IoT devices, across many Linux distributions. Overall, detection was highly accurate, with zero false positives and a miss rate of 2.29%.
"Defending IoT Devices against Bluetooth Worms with Bluetooth OBEX Proxy," by Fu-Hau Hsu, Min-Hao Wu, Yan-Ling Hwang, Jian-Xin Chen, Jian-Hong Huang, Hao-Jyun Wang, and Yi-Wen Lai. Information (Switzerland), published 2023-09-27. DOI: 10.3390/info14100525.
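As an illustration of the kind of policy such a proxy could enforce, here is a minimal, hypothetical filter hook in Python. The paper does not publish its rule set, so the blocklist tables and the function name are invented for illustration only:

```python
import hashlib

# Hypothetical policy tables -- illustrative, not the paper's actual rules.
BLOCKED_EXTENSIONS = {".sis", ".sisx", ".exe", ".vbs"}  # formats historically abused by Bluetooth worms
BLOCKED_SHA256 = set()  # digests of known worm payloads would be loaded here

def allow_obex_put(filename: str, payload: bytes) -> bool:
    """Decide whether an inbound OBEX PUT should be forwarded to the real service."""
    ext = ("." + filename.rsplit(".", 1)[-1].lower()) if "." in filename else ""
    if ext in BLOCKED_EXTENSIONS:
        return False
    if hashlib.sha256(payload).hexdigest() in BLOCKED_SHA256:
        return False
    return True
```

In a real deployment, a check like this would sit between the Bluetooth stack and obexd, rejecting the transfer before the file reaches the filesystem.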
Qurat Ul Ain, Mohamed Amine Chatti, Komlan Gluck Charles Bakar, Shoeb Joarder, Rawaa Alatrash
Knowledge graphs (KGs) are widely used in the education domain to offer learners a semantic representation of domain concepts from educational content and their relations, termed educational knowledge graphs (EduKGs). Previous studies on EduKGs have incorporated concept extraction and weighting modules; however, these studies face limitations in terms of accuracy and performance. To address these challenges, this work aims to improve the concept extraction and weighting mechanisms by leveraging state-of-the-art word and sentence embedding techniques. Concretely, we enhance the SIFRank keyphrase extraction method using SqueezeBERT, and we propose a concept-weighting strategy based on SBERT. Furthermore, we conduct extensive experiments on different datasets, demonstrating significant improvements over several state-of-the-art keyphrase extraction and concept-weighting techniques.
"Automatic Construction of Educational Knowledge Graphs: A Word Embedding-Based Approach." Information (Switzerland), published 2023-09-27. DOI: 10.3390/info14100526.
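The core idea of embedding-based concept weighting can be sketched with cosine similarity: each extracted concept is scored by how close its embedding lies to the embedding of the whole document. This is a toy sketch with hand-made vectors; the paper uses SBERT embeddings, and the function names here are invented:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def weight_concepts(doc_vec, concept_vecs):
    """Weight each extracted concept by its embedding similarity to the document."""
    return {concept: cosine(doc_vec, vec) for concept, vec in concept_vecs.items()}
```

A concept whose vector points in the same direction as the document vector gets weight close to 1, while an unrelated concept gets weight close to 0.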
Mohamed Hesham Ibrahim Abdalla, Simon Malberg, Daryna Dementieva, Edoardo Mosca, Georg Groh
As generative NLP can now produce content nearly indistinguishable from human writing, it is becoming difficult to identify genuine research contributions in academic writing and scientific publications. Moreover, information in machine-generated text can be factually wrong or even entirely fabricated. In this work, we introduce a novel benchmark dataset containing human-written and machine-generated scientific papers from SCIgen, GPT-2, GPT-3, ChatGPT, and Galactica, as well as papers co-created by humans and ChatGPT. We also experiment with several types of classifiers, both linguistic-based and transformer-based, for detecting the authorship of scientific text. A strong focus is put on generalization capabilities and explainability to highlight the strengths and weaknesses of these detectors. Our work takes an important step towards creating more robust methods for distinguishing between human-written and machine-generated scientific papers, ultimately ensuring the integrity of scientific literature.
"A Benchmark Dataset to Distinguish Human-Written and Machine-Generated Scientific Papers." Information (Switzerland), published 2023-09-26. DOI: 10.3390/info14100522.
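A linguistic-based detector of the kind mentioned in the abstract starts from surface statistics of the text. This sketch extracts two classic features, type-token ratio and average sentence length; the feature set and function name are illustrative assumptions, not the paper's actual pipeline:

```python
import re

def linguistic_features(text: str) -> dict:
    """Toy feature extractor of the kind a linguistic-based detector might use."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # lexical diversity: unique words / total words
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        # mean words per sentence
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }
```

Features like these would then feed a downstream classifier (e.g., logistic regression) trained on the labeled benchmark.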
Lucas Mayr, Lucas Palma, Gustavo Zambonin, Wellington Silvano, Ricardo Custódio
Private key management is a complex obstacle arising from the traditional public key infrastructure model. However, before any related security breach can be addressed, it must first be reliably detected. Certificate Transparency (CT) is an example of a certificate issuance monitoring strategy, developed to detect the possible malfeasance of certification authorities (CAs). To the best of our knowledge, CT and other detection mechanisms do not cover digitally signed documents made by an end user, which are also susceptible to CA misbehavior. We modify the CT framework to handle signed documents via logging certificates in the blockchain to enable the secure and user-friendly monitoring of one-time signatures, backdating protection, and effective CA misbehavior detection. Moreover, to demonstrate the feasibility of our proposal, we present distinct deployment scenarios and analyze the storage, performance, and monetary costs.
"Monitoring Key Pair Usage through Distributed Ledgers and One-Time Signatures." Information (Switzerland), published 2023-09-26. DOI: 10.3390/info14100523.
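Since the proposal centers on monitoring one-time signatures, it is worth recalling how a one-time signature scheme works. Below is a minimal Lamport one-time signature in Python, the textbook construction, shown only to illustrate the primitive; it is not the paper's specific scheme:

```python
import hashlib
import secrets

BITS = 256  # one secret pair per bit of the SHA-256 message digest

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keygen():
    """Generate a Lamport one-time key: two random secrets per digest bit."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(BITS)]
    pk = [(_h(s0), _h(s1)) for s0, s1 in sk]
    return sk, pk

def _msg_bits(message: bytes):
    digest = int.from_bytes(_h(message), "big")
    return [(digest >> i) & 1 for i in range(BITS)]

def sign(sk, message: bytes):
    """Reveal one secret per digest bit; the key must never be reused."""
    return [pair[bit] for pair, bit in zip(sk, _msg_bits(message))]

def verify(pk, message: bytes, signature) -> bool:
    """Each revealed secret must hash to the published commitment for that bit."""
    return all(_h(sig) == pair[bit]
               for sig, pair, bit in zip(signature, pk, _msg_bits(message)))
```

Because signing reveals half the secrets, each key pair is safe for exactly one signature, which is precisely why usage monitoring (as the paper proposes via a ledger) matters.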
The German labor market relies heavily on vocational training, retraining, and continuing education. In order to match training seekers with training offers and to make the available data interoperable, we present a novel approach to automatically detect access requirements to education and training in German training offers and advertisements, and we identify open research questions and areas for further research. In particular, we focus on (a) general education and school leaving certificates, (b) work experience, (c) previous apprenticeship, and (d) a list of skills provided by the German Federal Employment Agency. This approach combines several methods: first, we provide technical terms and classes of the education system that are used synonymously, combining different qualifications and adding obsolete terms; second, we provide rule-based matching to identify the need for work experience or education. However, not all qualification requirements can be matched, due to incompatible data schemas or non-standardized requirements such as initial tests or interviews. Although there are several shortcomings, the presented approach shows promising results for two data sets: training and retraining advertisements.
"Challenges of Automated Identification of Access to Education and Training in Germany," by Jens Dörpinghaus, David Samray, and Robert Helmrich. Information (Switzerland), published 2023-09-26. DOI: 10.3390/info14100524.
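The rule-based matching step described above can be sketched as a synonym lookup plus a regular expression for experience demands. The synonym table and function name below are small invented examples; the paper's actual term lists (including obsolete qualification names) are far larger:

```python
import re

# Hypothetical synonym table mapping German qualification terms to classes.
QUALIFICATION_SYNONYMS = {
    "abitur": "university_entrance",
    "allgemeine hochschulreife": "university_entrance",
    "hauptschulabschluss": "lower_secondary",
    "berufsausbildung": "apprenticeship",
}

def match_requirements(ad_text: str) -> dict:
    """Rule-based matching of qualification classes and work-experience demands."""
    text = ad_text.lower()
    classes = {cls for term, cls in QUALIFICATION_SYNONYMS.items() if term in text}
    years = re.search(r"(\d+)\s*jahre?", text)  # e.g. "3 Jahre Berufserfahrung"
    return {
        "qualifications": classes,
        "min_experience_years": int(years.group(1)) if years else None,
    }
```

Requirements that are phrased non-standardly (initial tests, interviews) would fall through rules like these, which is exactly the limitation the abstract notes.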
Bilal Naji Alhasnawi, Basil H. Jasim, Ali M. Jasim, Vladimír Bureš, Arshad Naji Alhasnawi, Raad Z. Homod, Majid Razaq Mohamed Alsemawai, Rabeh Abbassi, Bishoy E. Sedhom
The electrical demand and generation in power systems are currently the biggest source of uncertainty for an electricity provider. For a dependable and financially advantageous electricity system, demand response (DR) through household appliance energy management has attracted significant attention. Due to fluctuating electricity rates and usage trends, determining the best schedule for apartment appliances can be difficult. In this context, the Improved Cockroach Swarm Optimization Algorithm (ICSOA) is combined with the Innovative Apartments Appliance Scheduling (IAAS) framework. Using the proposed technique, electricity-cost reduction, user-comfort maximization, and peak-to-average ratio reduction are analyzed for apartment appliances. The proposed framework is evaluated by comparing it with the BFOA and without-scheduling (W/O) cases. In comparison to the W/O scheduling case, the BFOA method lowered energy costs by 17.75%, while the ICSOA approach reduced them by 46.085%. According to the results, the ICSOA algorithm performed better than the BFOA and W/O scheduling cases in terms of the stated objectives and was advantageous to both utilities and consumers.
"A Multi-Objective Improved Cockroach Swarm Algorithm Approach for Apartment Energy Management Systems." Information (Switzerland), published 2023-09-25. DOI: 10.3390/info14100521.
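The ICSOA is a swarm metaheuristic over the whole appliance set, but the cost side of its objective can be illustrated, under heavily simplified assumptions, by scheduling a single shiftable appliance against time-of-use prices with exhaustive search. This sketch is not the paper's algorithm, just the underlying cost objective:

```python
def cheapest_start(hourly_prices, duration):
    """Find the start hour minimizing electricity cost for one shiftable
    appliance that must run `duration` consecutive hours (exhaustive search)."""
    best_start, best_cost = None, float("inf")
    for start in range(len(hourly_prices) - duration + 1):
        cost = sum(hourly_prices[start:start + duration])
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost
```

A metaheuristic like ICSOA becomes necessary once many appliances, comfort constraints, and the peak-to-average ratio are optimized jointly, where exhaustive search is infeasible.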
Gergely Márk Csányi, Renátó Vági, Andrea Megyeri, Anna Fülöp, Dániel Nagy, János Pál Vadász, István Üveges
Few-shot learning is a deep learning subfield that is currently a focus of research. This paper addresses the research question of whether a triplet-trained Siamese network, initially designed for multi-class classification, can effectively handle multi-label classification. We conducted a case study to identify any limitations in its application. The experiments were conducted on a dataset containing Hungarian legal decisions of administrative agencies in tax matters, belonging to a major legal content provider. We also tested how different Siamese embeddings compare when classifying a previously non-existing label in both binary and multi-label settings. We found that triplet-trained Siamese networks can be applied to perform classification, but with a sampling restriction during training. We also found that overlap between labels affects the results negatively. The few-shot model, seeing only ten examples for each label, provided competitive results compared to models trained on tens of thousands of court decisions using tf-idf vectorization and logistic regression.
"Can Triplet Loss Be Used for Multi-Label Few-Shot Classification? A Case Study." Information (Switzerland), published 2023-09-23. DOI: 10.3390/info14100520.
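For readers unfamiliar with triplet training, the loss pushes an anchor's distance to a positive example below its distance to a negative example by at least a margin. Here is the standard formulation in plain Python (a generic sketch, not the paper's exact training setup):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

The multi-label complication the paper studies arises in sampling: when labels overlap, a "negative" document may share some labels with the anchor, which is why a sampling restriction during training is needed.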
Learnability in Automated Driving (LiAD) is a neglected research topic, especially considering the unpredictable and intricate ways humans learn to interact with and use automated driving systems (ADS) over time. Moreover, there is a scarcity of publications dedicated to LiAD (specifically, extended learnability methods) to guide the scientific paradigm. This generates scientific discord and leaves many facets of the long-term learning effects associated with automated driving in dire need of research attention, which we believe constrains knowledge discovery on quality interaction design. It is therefore imperative to abstract knowledge on how long-term effects and learning effects may affect, negatively and positively, users’ learning and mental models, and may induce changes in behavioural configurations and performance. In view of that, it may be valuable to examine operational concepts that help researchers envision future scenarios with automation by assessing users’ learning ability, how they learn, and what they learn over time, and to construct a theory of effects (from micro, meso, and macro perspectives) that may help profile ergonomic design qualities that stand the test of time. We therefore reviewed the literature on learnability, mining it for LiAD knowledge from the perspective of long-term learning effects. The paper offers the reader the resulting discussion points, formulated under the Learnability Engineering Life Cycle: firstly, a contextualisation of LiAD with emphasis on extended LiAD; secondly, a conceptualisation and operationalisation of the operational mechanics of LiAD as a concept in ergonomic quality engineering (introducing Concepts for Applying Learnability Engineering (CALE) research based on LiAD knowledge discovery); and thirdly, a systemisation of implementable long-term research strategies towards comprehending behaviour modification associated with extended LiAD. As the vehicle industry moves rapidly towards automation and artificially intelligent (AI) systems, this knowledge is useful for informing quality interaction strategies and Quality Automated Driving (QAD).
"Learnability in Automated Driving (LiAD): Concepts for Applying Learnability Engineering (CALE) Based on Long-Term Learning Effects," by Naomi Y. Mbelekani and Klaus Bengler. Information (Switzerland), published 2023-09-22. DOI: 10.3390/info14100519.
Takuya Nakata, Sinan Chen, Sachio Saiki, Masahide Nakamura
Software upcycling, a form of software reuse, is a concept that efficiently generates novel, innovative, and value-added development projects by utilizing knowledge extracted from past projects. However, how to integrate the materials derived from these projects for upcycling remains uncertain. This study defines a systematic model for upcycling cases and develops the Sharing Upcycling Cases with Context and Evaluation for Efficient Software Development (SUCCEED) system to support new upcycling initiatives by effectively sharing cases within an organization. To ascertain the efficacy of upcycling within our proposed model and system, we formulated three research questions and conducted two distinct experiments. Through surveys, we identified motivations and characteristics of shared upcycling-relevant development cases. Development tasks were divided into two groups, one that employed the SUCCEED system and one that did not, in order to discern the enhancements brought about by upcycling. As a result of this research, we accomplished a comprehensive structuring of both technical and experiential knowledge beneficial for development, a feat previously unrealizable through conventional software reuse, and successfully realized reuse in a proactive, closed environment through construction of a wisdom of crowds for upcycling cases. Consequently, it becomes possible to perform software upcycling systematically, leveraging knowledge from existing projects to streamline software development.
"SUCCEED: Sharing Upcycling Cases with Context and Evaluation for Efficient Software Development." Information (Switzerland), published 2023-09-21. DOI: 10.3390/info14090518.
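The phrase "cases with context and evaluation" suggests a simple data model: each shared case carries the reuse situation and a reviewer rating so later developers can find the best precedents. This is a hypothetical sketch of such a record; the SUCCEED system's actual schema is not described in the abstract and is surely richer:

```python
from dataclasses import dataclass, field

# Hypothetical record shape for a shared upcycling case.
@dataclass
class UpcyclingCase:
    title: str
    source_project: str
    context: str          # the situation in which the material was reused
    evaluation: int       # reviewer rating, e.g. 1 (poor) to 5 (excellent)
    tags: list = field(default_factory=list)

def top_cases(cases, n=3):
    """Surface the best-rated shared cases first, as a case-sharing UI might."""
    return sorted(cases, key=lambda c: c.evaluation, reverse=True)[:n]
```

Ranking by evaluation is one plausible way such a system could build a "wisdom of crowds" over shared cases.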
In practical applications, the accuracy of domain terminology translation is an important criterion for evaluating the performance of domain machine translation models. To address the phrase mismatches and improper translations caused by word-by-word translation of English terminology phrases, this paper constructs a dictionary of terminology phrases in the field of electrical engineering and proposes three schemes to integrate the dictionary knowledge into the translation model. Scheme 1 replaces the terminology phrases of the source language. Scheme 2 uses a residual connection at the encoder end after the terminology phrase is replaced. Scheme 3 uses a segmentation method combining character segmentation and terminology segmentation for the target language and uses an additional loss module during training. The results show that all three schemes are superior to the baseline model in two respects: BLEU score and the correct translation rate of terminology words. On the test set, the highest terminology-word accuracy was 48.3% higher than that of the baseline model, and the BLEU score was up to 3.6 points higher. This phenomenon is also analyzed and discussed in the paper.
"Machine Translation of Electrical Terminology Constraints," by Zepeng Wang, Yuan Chen, and Juwei Zhang. Information (Switzerland), published 2023-09-20. DOI: 10.3390/info14090517.
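Scheme 1, source-side terminology replacement, can be sketched as dictionary substitution applied before the sentence reaches the translation model. The dictionary entries and placeholder target tokens below are invented for illustration; the paper's actual electrical-engineering dictionary and target language are not reproduced here:

```python
# Hypothetical terminology dictionary; "<TGT:...>" tokens stand in for whatever
# target-language entries the real dictionary holds.
TERM_DICT = {
    "circuit breaker": "<TGT:circuit_breaker>",
    "transformer": "<TGT:transformer>",
}

def replace_terms(sentence: str, term_dict) -> str:
    """Substitute source terminology phrases before translation (Scheme 1 in spirit).
    Longer phrases are replaced first so multi-word terms win over their sub-words."""
    for term in sorted(term_dict, key=len, reverse=True):
        sentence = sentence.replace(term, term_dict[term])
    return sentence
```

Replacing whole phrases up front is exactly what prevents the word-by-word mistranslation of multi-word terms that motivates the paper.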