Random k conditional nearest neighbor for high-dimensional data.
Pub Date: 2025-01-24 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.2497
Jiaxuan Lu, Hyukjun Gweon
The k nearest neighbor (kNN) approach is a simple and effective classification algorithm, and a number of variants have been proposed based on it. One limitation of kNN is that it may be less effective when the data contain many noisy features, because such features contribute non-informative terms to the distance calculation. Additionally, information derived from nearest neighbors may be less meaningful in high-dimensional data. To address this limitation of nearest-neighbor-based approaches in high-dimensional data, we propose an extension of the k conditional nearest neighbor (kCNN) method, an effective variant of kNN. The proposed approach aggregates multiple kCNN classifiers, each constructed from a randomly sampled feature subset. We also develop a score metric that weights individual classifiers by the level of class separation achieved on their feature subsets. We investigate the properties of the proposed method through simulation. Moreover, experiments on gene expression datasets show that the proposed method is promising in terms of predictive classification performance.
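To make the aggregation scheme concrete, the following minimal sketch trains plain kNN members on randomly sampled feature subsets and combines their votes with a separation-based weight. It is an illustrative approximation only: standard kNN stands in for kCNN, scikit-learn is assumed to be available, and the silhouette score is used as a stand-in for the paper's separation metric.

```python
# Minimal sketch of a random-feature-subspace kNN ensemble (plain kNN stands in for kCNN).
# The silhouette score is an assumed stand-in for the paper's separation-based weight.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import silhouette_score

def random_subspace_knn(X_train, y_train, X_test, n_estimators=20, subset_size=10, k=5, seed=0):
    rng = np.random.default_rng(seed)
    n_classes = len(np.unique(y_train))
    votes = np.zeros((X_test.shape[0], n_classes))
    for _ in range(n_estimators):
        feats = rng.choice(X_train.shape[1], size=subset_size, replace=False)
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train[:, feats], y_train)
        # Weight each member by how well its feature subset separates the classes.
        weight = max(silhouette_score(X_train[:, feats], y_train), 1e-3)
        votes += weight * clf.predict_proba(X_test[:, feats])
    return votes.argmax(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 50))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # only two informative features, the rest are noise
    print(random_subspace_knn(X[:80], y[:80], X[80:]))
```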
{"title":"Random k conditional nearest neighbor for high-dimensional data.","authors":"Jiaxuan Lu, Hyukjun Gweon","doi":"10.7717/peerj-cs.2497","DOIUrl":"10.7717/peerj-cs.2497","url":null,"abstract":"<p><p>The k nearest neighbor (kNN) approach is a simple and effective algorithm for classification and a number of variants have been proposed based on the kNN algorithm. One of the limitations of kNN is that the method may be less effective when data contains many noisy features due to their non-informative influence in calculating distance. Additionally, information derived from nearest neighbors may be less meaningful in high-dimensional data. To address the limitation of nearest-neighbor based approaches in high-dimensional data, we propose to extend the k conditional nearest neighbor (kCNN) method which is an effective variant of kNN. The proposed approach aggregates multiple kCNN classifiers, each constructed from a randomly sampled feature subset. We also develop a score metric to weigh individual classifiers based on the level of separation of the feature subsets. We investigate the properties of the proposed method using simulation. Moreover, the experiments on gene expression datasets show that the proposed method is promising in terms of predictive classification performance.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2497"},"PeriodicalIF":3.5,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784752/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143082257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning with semantic ambiguity for unbiased scene graph generation.
Pub Date: 2025-01-23 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.2639
Shanjin Zhong, Yang Cao, Qiaosen Chen, Jie Gong
Scene graph generation (SGG) aims to identify and extract objects from images and elucidate their interrelations. This task faces two primary challenges. Firstly, the long-tail distribution of relation categories causes SGG models to favor high-frequency relations, such as "on" and "in". Secondly, some subject-object pairs may have multiple reasonable relations, which often possess a certain degree of semantic similarity. However, the use of one-hot ground-truth relation labels does not effectively represent the semantic similarities and distinctions among relations. In response to these challenges, we propose a model-agnostic method named Mixup and Balanced Relation Learning (MBRL). This method assigns soft labels to samples exhibiting semantic ambiguities and optimizes model training by adjusting the loss weights for fine-grained and low-frequency relation samples. Its model-agnostic design facilitates seamless integration with diverse SGG models, enhancing their performance across various relation categories. Our approach is evaluated on widely-used datasets, including Visual Genome and Generalized Question Answering, both with over 100,000 images, providing rich visual contexts for scene graph model evaluation. Experimental results show that our method outperforms state-of-the-art approaches on multiple scene graph generation tasks, demonstrating significant improvements in both relation prediction accuracy and the handling of imbalanced data distributions.
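As a rough illustration of the two ingredients described above, the sketch below (PyTorch assumed; the shapes, mixing coefficient, and frequency-based weighting rule are illustrative choices, not the paper's exact formulation) builds a soft label that mixes the ground-truth relation with a semantically similar one, and up-weights the loss of low-frequency relation classes.

```python
# Sketch: soft relation labels plus frequency-balanced loss weights (illustrative only).
import torch
import torch.nn.functional as F

def soft_label(num_classes, target, similar, lam=0.7):
    """Mix the ground-truth relation with a semantically similar one."""
    y = torch.zeros(num_classes)
    y[target] = lam
    y[similar] = 1.0 - lam
    return y

def balanced_soft_ce(logits, soft_targets, class_freq):
    """Cross-entropy against soft targets, up-weighting rare relation classes."""
    log_probs = F.log_softmax(logits, dim=-1)
    class_weights = (1.0 / class_freq.clamp(min=1)).sqrt()        # rarer class -> larger weight
    per_sample = -(soft_targets * log_probs).sum(dim=-1)
    sample_weights = (soft_targets * class_weights).sum(dim=-1)   # weight by the labelled classes
    return (sample_weights * per_sample).mean()

if __name__ == "__main__":
    num_rel = 5
    logits = torch.randn(2, num_rel)
    targets = torch.stack([soft_label(num_rel, 1, 3), soft_label(num_rel, 0, 4)])
    freq = torch.tensor([1000., 500., 50., 20., 5.])              # long-tailed relation counts
    print(balanced_soft_ce(logits, targets, freq))
```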
{"title":"Learning with semantic ambiguity for unbiased scene graph generation.","authors":"Shanjin Zhong, Yang Cao, Qiaosen Chen, Jie Gong","doi":"10.7717/peerj-cs.2639","DOIUrl":"https://doi.org/10.7717/peerj-cs.2639","url":null,"abstract":"<p><p>Scene graph generation (SGG) aims to identify and extract objects from images and elucidate their interrelations. This task faces two primary challenges. Firstly, the long-tail distribution of relation categories causes SGG models to favor high-frequency relations, such as \"<i>on\"</i> and \"<i>in\"</i>. Secondly, some subject-object pairs may have multiple reasonable relations, which often possess a certain degree of semantic similarity. However, the use of one-hot ground-truth relation labels does not effectively represent the semantic similarities and distinctions among relations. In response to these challenges, we propose a model-agnostic method named Mixup and Balanced Relation Learning (MBRL). This method assigns soft labels to samples exhibiting semantic ambiguities and optimizes model training by adjusting the loss weights for fine-grained and low-frequency relation samples. Its model-agnostic design facilitates seamless integration with diverse SGG models, enhancing their performance across various relation categories. Our approach is evaluated on widely-used datasets, including Visual Genome and Generalized Question Answering, both with over 100,000 images, providing rich visual contexts for scene graph model evaluation. Experimental results show that our method outperforms state-of-the-art approaches on multiple scene graph generation tasks, demonstrating significant improvements in both relation prediction accuracy and the handling of imbalanced data distributions.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2639"},"PeriodicalIF":3.5,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784887/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143082249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hybrid blockchain-based solution for secure sharing of electronic medical record data.
Pub Date: 2025-01-23 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.2653
Gang Han, Yan Ma, Zhongliang Zhang, Yuxin Wang
Patient privacy data security is a pivotal area of research within the burgeoning field of smart healthcare. This study proposes an innovative hybrid blockchain-based framework for the secure sharing of electronic medical record (EMR) data. Unlike traditional privacy protection schemes, our approach employs a novel tripartite blockchain architecture that segregates healthcare data across distinct blockchains for patients and healthcare providers while introducing a separate social blockchain to enable privacy-preserving data sharing with authorized external entities. This structure enhances both security and transparency while fostering collaborative efforts across different stakeholders. To address the inherent complexity of managing multiple blockchains, a unique cross-chain signature algorithm is introduced, based on the Boneh-Lynn-Shacham (BLS) signature aggregation technique. This algorithm not only streamlines the signature process across chains but also strengthens system security and optimizes storage efficiency, addressing a key challenge in multi-chain systems. Additionally, our external sharing algorithm resolves the prevalent issue of medical data silos by facilitating better data categorization and enabling selective, secure external sharing through the social blockchain. Security analyses and experimental results demonstrate that the proposed scheme offers superior security, storage optimization, and flexibility compared to existing solutions, making it a robust choice for safeguarding patient data in smart healthcare environments.
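To picture the tripartite layout, here is a toy, crypto-free sketch in plain Python: the patient, provider, and social chains are independent hash-linked lists, and a social-chain block records only a pointer to the shared EMR block. It is a conceptual illustration, not the paper's protocol (no consensus mechanism and no BLS signature aggregation).

```python
# Toy illustration of the tripartite layout: three independent hash-linked chains,
# with the social chain holding only references to EMR blocks that are shared externally.
import hashlib, json, time

class Chain:
    def __init__(self, name):
        self.name, self.blocks = name, []

    def append(self, payload):
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = {"payload": payload, "prev": prev, "ts": time.time()}
        body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.blocks.append(body)
        return body["hash"]

patient_chain = Chain("patient")
provider_chain = Chain("provider")
social_chain = Chain("social")          # used only for authorized external sharing

emr_hash = provider_chain.append({"record": "encrypted EMR blob", "patient": "P001"})
patient_chain.append({"consent": "share with insurer", "emr": emr_hash})
social_chain.append({"shared_emr": emr_hash, "grantee": "insurer-X"})
print(social_chain.blocks[-1])
```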
{"title":"A hybrid blockchain-based solution for secure sharing of electronic medical record data.","authors":"Gang Han, Yan Ma, Zhongliang Zhang, Yuxin Wang","doi":"10.7717/peerj-cs.2653","DOIUrl":"10.7717/peerj-cs.2653","url":null,"abstract":"<p><p>Patient privacy data security is a pivotal area of research within the burgeoning field of smart healthcare. This study proposes an innovative hybrid blockchain-based framework for the secure sharing of electronic medical record (EMR) data. Unlike traditional privacy protection schemes, our approach employs a novel tripartite blockchain architecture that segregates healthcare data across distinct blockchains for patients and healthcare providers while introducing a separate social blockchain to enable privacy-preserving data sharing with authorized external entities. This structure enhances both security and transparency while fostering collaborative efforts across different stakeholders. To address the inherent complexity of managing multiple blockchains, a unique cross-chain signature algorithm is introduced, based on the Boneh-Lynn-Shacham (BLS) signature aggregation technique. This algorithm not only streamlines the signature process across chains but also strengthens system security and optimizes storage efficiency, addressing a key challenge in multi-chain systems. Additionally, our external sharing algorithm resolves the prevalent issue of medical data silos by facilitating better data categorization and enabling selective, secure external sharing through the social blockchain. Security analyses and experimental results demonstrate that the proposed scheme offers superior security, storage optimization, and flexibility compared to existing solutions, making it a robust choice for safeguarding patient data in smart healthcare environments.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2653"},"PeriodicalIF":3.5,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784725/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143082175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A temporal knowledge graph reasoning model based on recurrent encoding and contrastive learning.
Weitong Liu, Khairunnisa Hasikin, Anis Salwa Mohd Khairuddin, Meizhen Liu, Xuechen Zhao
Pub Date: 2025-01-23 | DOI: 10.7717/peerj-cs.2595
Temporal knowledge graphs (TKGs) are critical tools for capturing the dynamic nature of facts that evolve over time, making them highly valuable in a broad spectrum of intelligent applications. In the domain of temporal knowledge graph extrapolation reasoning, predicting future occurrences is of great significance and presents considerable obstacles. While current models consider how facts change over time and recognize that historical facts may recur, they often overlook the influence of past events on future predictions. Motivated by these considerations, this work introduces a novel temporal knowledge graph reasoning model, named Temporal Reasoning with Recurrent Encoding and Contrastive Learning (TRCL), which integrates recurrent encoding and contrastive learning techniques. The proposed model captures the evolution of historical facts, generating representations of entities and relationships through recurrent encoding. Additionally, TRCL incorporates a global historical matrix to account for repeated historical occurrences and employs contrastive learning to alleviate the interference of historical facts in predicting future events. The TKG reasoning outcomes are subsequently derived through a time decoder. Extensive experiments on four benchmark datasets demonstrate the strong performance of the proposed TRCL model across a range of metrics, surpassing state-of-the-art TKG reasoning models. Compared to the strong baseline Time-Guided Recurrent Graph Network (TiRGN), TRCL achieves a 1.03% improvement on ICEWS14 under the mean reciprocal rank (MRR) evaluation metric. The proposed method not only enhances the accuracy of TKG extrapolation but also sets a new standard for robustness in dynamic knowledge graph applications, paving the way for future research and practical applications in predictive intelligence systems.
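For reference, the mean reciprocal rank (MRR) metric mentioned above is simply the average of 1/rank of the ground-truth entity over all test queries; a minimal sketch with made-up scores (numpy assumed):

```python
# Minimal MRR computation: average of 1 / rank of the ground-truth entity.
import numpy as np

def mean_reciprocal_rank(scores, true_idx):
    """scores: (n_queries, n_entities) prediction scores; true_idx: ground-truth entity ids."""
    order = np.argsort(-scores, axis=1)                       # higher score = better rank
    ranks = np.argmax(order == true_idx[:, None], axis=1) + 1
    return float(np.mean(1.0 / ranks))

if __name__ == "__main__":
    scores = np.array([[0.1, 0.7, 0.2],       # true entity 1 ranked 1st
                       [0.5, 0.3, 0.2]])      # true entity 2 ranked 3rd
    print(mean_reciprocal_rank(scores, np.array([1, 2])))     # (1/1 + 1/3) / 2 ≈ 0.667
```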
{"title":"A temporal knowledge graph reasoning model based on recurrent encoding and contrastive learning.","authors":"Weitong Liu, Khairunnisa Hasikin, Anis Salwa Mohd Khairuddin, Meizhen Liu, Xuechen Zhao","doi":"10.7717/peerj-cs.2595","DOIUrl":"10.7717/peerj-cs.2595","url":null,"abstract":"<p><p>Temporal knowledge graphs (TKGs) are critical tools for capturing the dynamic nature of facts that evolve over time, making them highly valuable in a broad spectrum of intelligent applications. In the domain of temporal knowledge graph extrapolation reasoning, the prediction of future occurrences is of great significance and presents considerable obstacles. While current models consider the fact changes over time and recognize that historical facts may recur, they often overlook the influence of past events on future predictions. Motivated by these considerations, this work introduces a novel temporal knowledge graph reasoning model, named Temporal Reasoning with Recurrent Encoding and Contrastive Learning (TRCL), which integrates recurrent encoding and contrastive learning techniques. The proposed model has the ability to capture the evolution of historical facts, generating representations of entities and relationships through recurrent encoding. Additionally, TRCL incorporates a global historical matrix to account for repeated historical occurrences and employs contrastive learning to alleviate the interference of historical facts in predicting future events. The TKG reasoning outcomes are subsequently derived through a time decoder. A quantity of experiments conducted on four benchmark datasets demonstrate the exceptional performance of the proposed TRCL model across a range of metrics, surpassing state-of-the-art TKG reasoning models. When compared to the strong baseline Time-Guided Recurrent Graph Network (TiRGN) model, the proposed TRCL achieves 1.03% improvements on ICEWS14 using mean reciprocal rank (MRR) evaluation metric. This innovative proposed method not only enhances the accuracy of TKG extrapolation, but also sets a new standard for robustness in dynamic knowledge graph applications, paving the way for future research and practical applications in predictive intelligence systems.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2595"},"PeriodicalIF":3.5,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784877/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143082190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Blockchain enabled policy-based access control mechanism to restrict unauthorized access to electronic health records.
Pub Date: 2025-01-23 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.2647
Nadeem Yaqub, Jianbiao Zhang, Muhammad Irfan Khalid, Weiru Wang, Markus Helfert, Mansoor Ahmed, Jungsuk Kim
Electronic health record transmission and storage involve sensitive information, requiring robust security measures to ensure that access is limited to authorized personnel. In the existing state of the art, there is a growing need for efficient access control approaches that make patient health data securely accessible through sustainable electronic health records. Locking medical data within a single healthcare center creates information silos; setting up healthcare data exchange platforms is therefore a driving force behind electronic healthcare centers. The access rights of healthcare entities such as the subject, controller, and requester are defined and regulated by access control policies, as specified by the General Data Protection Regulation (GDPR). In this work, we introduce a policy-based access control (PBAC) system backed by blockchain technology, in which smart contracts govern the intrinsic parts of security and privacy. As a result, any subject can know at any time who currently has the right to access their data. The PBAC system grants access to electronic health records based on predefined policies. Our proposed PBAC approach employs policies under which the subject, controller, and requester can grant access, revoke access, and check the logs and actions made in a particular healthcare system. Smart contracts dynamically enforce access control policies and manage access permissions, ensuring that sensitive data are available only to authorized users. Delineating the proposed access control system and comparing it to other systems demonstrates that our approach is more adaptable to various healthcare data protection scenarios in which sensitive data must be shared while the rights of the involved entities are robustly safeguarded.
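A minimal, blockchain-free sketch of the grant/revoke/check policy logic described above is given below; the role names, actions, and log format are illustrative assumptions rather than the paper's smart-contract code.

```python
# Illustrative PBAC core: grant/revoke/check with an append-only access log.
# In the paper this logic lives in smart contracts; here it is a plain in-memory sketch.
from datetime import datetime, timezone

class PBAC:
    def __init__(self):
        self.policies = {}   # (subject, requester) -> set of permitted actions
        self.log = []        # append-only audit trail

    def grant(self, subject, requester, action):
        self.policies.setdefault((subject, requester), set()).add(action)
        self._record("grant", subject, requester, action)

    def revoke(self, subject, requester, action):
        self.policies.get((subject, requester), set()).discard(action)
        self._record("revoke", subject, requester, action)

    def check(self, subject, requester, action):
        allowed = action in self.policies.get((subject, requester), set())
        self._record("check", subject, requester, action, allowed)
        return allowed

    def _record(self, op, subject, requester, action, result=None):
        self.log.append({"op": op, "subject": subject, "requester": requester,
                         "action": action, "result": result,
                         "ts": datetime.now(timezone.utc).isoformat()})

if __name__ == "__main__":
    pbac = PBAC()
    pbac.grant("patient-P001", "dr-smith", "read-EHR")
    print(pbac.check("patient-P001", "dr-smith", "read-EHR"))   # True
    pbac.revoke("patient-P001", "dr-smith", "read-EHR")
    print(pbac.check("patient-P001", "dr-smith", "read-EHR"))   # False
```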
{"title":"Blockchain enabled policy-based access control mechanism to restrict unauthorized access to electronic health records.","authors":"Nadeem Yaqub, Jianbiao Zhang, Muhammad Irfan Khalid, Weiru Wang, Markus Helfert, Mansoor Ahmed, Jungsuk Kim","doi":"10.7717/peerj-cs.2647","DOIUrl":"10.7717/peerj-cs.2647","url":null,"abstract":"<p><p>Electronic health record transmission and storage involve sensitive information, requiring robust security measures to ensure access is limited to authorized personnel. In the existing state of the art, there is a growing need for efficient access control approaches for the secure accessibility of patient health data by sustainable electronic health records. Locking medical data in a healthcare center forms information isolation; thus, setting up healthcare data exchange platforms is a driving force behind electronic healthcare centers. The healthcare entities access rights like subject, controller, and requester are defined and regulated by access control policies as defined by the General Data Protection Regulation (GDPR). In this work, we have introduced a blend of policy-based access control (PBAC) system backed by blockchain technology, where smart contracts govern the intrinsic part of security and privacy. As a result, any Subject can know at any time who currently has the right to access his data. The PBAC grants access to electronic health records based on predefined policies. Our proposed PBAC approach employs policies in which the subject, controller, and requester can grant access, revoke access, and check logs and actions made in a particular healthcare system. Smart contracts dynamically enforce access control policies and manage access permissions, ensuring that sensitive data is available only to authorized users. Delineating the proposed access control system and comparing it to other systems demonstrates that our approach is more adaptable to various healthcare data protection scenarios where there is a need to share sensitive data simultaneously and a robust need to safeguard the rights of the involved entities.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2647"},"PeriodicalIF":3.5,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784709/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143081456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Secure software development: leveraging application call graphs to detect security vulnerabilities.
Pub Date: 2025-01-22 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.2641
Lei Yan, Guanghuai Zhao, Xiaohui Li, Pengxuan Sun
The inconsistency in software development standards frequently leads to vulnerabilities that can jeopardize an application's cryptographic integrity. This situation can result in incomplete or flawed encryption processes. Vulnerabilities may manifest as missing, bypassed, or improperly executed encryption functions or the absence of critical cryptographic mechanisms, which eventually weaken security goals. This article introduces a thorough method for detecting vulnerabilities using dynamic and static analysis, focusing on a cryptographic function dominance tree. This strategy systematically minimizes the likelihood of integrity breaches in cryptographic applications. A layered and modular model is developed to maintain integrity by mapping the entire flow of cryptographic function calls across various components. The cryptographic function call graph and dominance tree are extracted and subsequently analyzed using an integrated dynamic and static technique. The extracted information undergoes strict evaluation against the anticipated function call sequence in the relevant cryptographic module to identify and localize potential security issues. Experimental findings demonstrate that the proposed method considerably enhances the accuracy and comprehensiveness of vulnerability detection in cryptographic applications, improving implementation security and resilience against misuse vulnerabilities.
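To illustrate the dominance-tree idea, the sketch below builds a small, invented call/flow graph with networkx and computes immediate dominators from the entry point; an encryption routine that fails to dominate the outbound call can be bypassed on some path, which is exactly the kind of integrity gap the method looks for. The graph and function names are made up for illustration.

```python
# Sketch: detect a bypassable encryption call via dominators of a call/flow graph.
# The graph below is invented for illustration; immediate_dominators is standard networkx.
import networkx as nx

G = nx.DiGraph([
    ("entry", "load_key"),
    ("load_key", "encrypt"),      # normal path: data is encrypted before sending
    ("load_key", "send"),         # flawed path: encryption is skipped
    ("encrypt", "send"),
    ("send", "exit"),
])

idom = nx.immediate_dominators(G, "entry")

def dominates(d, n):
    """True if node d dominates node n (walk up the immediate-dominator tree)."""
    while n != "entry":
        if n == d:
            return True
        n = idom[n]
    return n == d

# 'encrypt' should dominate 'send' if every path to the network call is encrypted.
if not dominates("encrypt", "send"):
    print("warning: 'send' reachable without passing through 'encrypt'")
```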
{"title":"Secure software development: leveraging application call graphs to detect security vulnerabilities.","authors":"Lei Yan, Guanghuai Zhao, Xiaohui Li, Pengxuan Sun","doi":"10.7717/peerj-cs.2641","DOIUrl":"10.7717/peerj-cs.2641","url":null,"abstract":"<p><p>The inconsistency in software development standards frequently leads to vulnerabilities that can jeopardize an application's cryptographic integrity. This situation can result in incomplete or flawed encryption processes. Vulnerabilities may manifest as missing, bypassed, or improperly executed encryption functions or the absence of critical cryptographic mechanisms, which eventually weaken security goals. This article introduces a thorough method for detecting vulnerabilities using dynamic and static analysis, focusing on a cryptographic function dominance tree. This strategy systematically minimizes the likelihood of integrity breaches in cryptographic applications. A layered and modular model is developed to maintain integrity by mapping the entire flow of cryptographic function calls across various components. The cryptographic function call graph and dominance tree are extracted and subsequently analyzed using an integrated dynamic and static technique. The extracted information undergoes strict evaluation against the anticipated function call sequence in the relevant cryptographic module to identify and localize potential security issues. Experimental findings demonstrate that the proposed method considerably enhances the accuracy and comprehensiveness of vulnerability detection in cryptographic applications, improving implementation security and resilience against misuse vulnerabilities.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2641"},"PeriodicalIF":3.5,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784778/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143082274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ensemble graph auto-encoders for clustering and link prediction.
Chengxin Xie, Jingui Huang, Yongjiang Shi, Hui Pang, Liting Gao, Xiumei Wen
Pub Date: 2025-01-22 | DOI: 10.7717/peerj-cs.2648
Graph auto-encoders are a crucial research area within graph neural networks, commonly employed to generate graph embeddings while minimizing reconstruction error in unsupervised learning. Traditional graph auto-encoders reconstruct the graph with minimal loss in order to encode neighborhood information for each node, yielding node embedding representations. However, existing graph auto-encoder models often overlook node representations and fail to capture contextual node information within the graph data, resulting in poor embedding quality. Accordingly, this study proposes the ensemble graph auto-encoders (E-GAE) model. It utilizes the ensemble random walk graph auto-encoder, the random walk graph auto-encoder of the ensemble network, and the graph attention auto-encoder to generate three node embedding matrices Z. These are then combined using adaptive weights to reconstruct a new node embedding matrix, addressing the problem of low-quality embeddings. The model's performance is evaluated on three publicly available datasets (Cora, Citeseer, and PubMed), and multiple experiments demonstrate its effectiveness: it achieves up to a 2.0% improvement in the link prediction task and a 9.4% improvement in the clustering task. Our code for this work can be found at https://github.com/xcgydfjjjderg/graphautoencoder.
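The adaptive-weight fusion step can be pictured with a small numpy sketch in which three embedding matrices (stand-ins for the outputs of the three auto-encoders) are combined with softmax-normalized weights; the matrix shapes and weight values are placeholders.

```python
# Sketch of the fusion step: combine three node-embedding matrices Z1, Z2, Z3
# with softmax-normalized adaptive weights (values here are placeholders).
import numpy as np

def fuse_embeddings(Z_list, raw_weights):
    w = np.exp(raw_weights - np.max(raw_weights))
    w = w / w.sum()                              # softmax: weights sum to 1
    return sum(wi * Zi for wi, Zi in zip(w, Z_list)), w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z1, Z2, Z3 = (rng.normal(size=(2708, 16)) for _ in range(3))   # e.g., Cora-sized matrices
    Z, w = fuse_embeddings([Z1, Z2, Z3], np.array([0.2, 1.5, -0.3]))
    print(Z.shape, w)
```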
{"title":"Ensemble graph auto-encoders for clustering and link prediction.","authors":"Chengxin Xie, Jingui Huang, Yongjiang Shi, Hui Pang, Liting Gao, Xiumei Wen","doi":"10.7717/peerj-cs.2648","DOIUrl":"10.7717/peerj-cs.2648","url":null,"abstract":"<p><p>Graph auto-encoders are a crucial research area within graph neural networks, commonly employed for generating graph embeddings while minimizing errors in unsupervised learning. Traditional graph auto-encoders focus on reconstructing minimal graph data loss to encode neighborhood information for each node, yielding node embedding representations. However, existing graph auto-encoder models often overlook node representations and fail to capture contextual node information within the graph data, resulting in poor embedding effects. Accordingly, this study proposes the ensemble graph auto-encoders (E-GAE) model. It utilizes the ensemble random walk graph auto-encoder, the random walk graph auto-encoder of the ensemble network, and the graph attention auto-encoder to generate three node embedding matrices Z. Then, these techniques are combined using adaptive weights to reconstruct a new node embedding matrix. This method addresses the problem of low-quality embeddings. The model's performance is evaluated using three publicly available datasets (Cora, Citeseer, and PubMed), indicating its effectiveness through multiple experiments. It achieves up to a 2.0% improvement in the link prediction task and a 9.4% enhancement in the clustering task. Our code for this work can be found at https://github.com/xcgydfjjjderg/graphautoencoder.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2648"},"PeriodicalIF":3.5,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784894/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143082196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-shot reranking with dense encoder models for news background linking.
Pub Date: 2025-01-22 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.2534
Marwa Essam, Tamer Elsayed
News background linking is the problem of finding useful links to resources that provide contextual background information for a given news article. Many systems have been proposed to address this problem. Yet the most effective and reproducible method to date uses the entire input article as a search query to retrieve the background links via sparse retrieval. While effective, that method is still far from optimal. Furthermore, it leverages only the lexical matching signal between the input article and the candidate background links. Intuitively, however, there may exist resources with useful background information that do not lexically overlap with the input article's vocabulary. While many studies have proposed systems that adopt semantic matching for news background linking, none has outperformed the simple lexical matching method. In this paper, we investigate multiple methods for integrating the lexical and semantic relevance signals to better rerank candidate background links. To represent news articles in the semantic space, we compare multiple Transformer-based encoder models in a zero-shot setting, without the need for any labeled data. Our results show that a hierarchical aggregation of sentence-level representations generates a good semantic representation of news articles, which, when integrated with lexical matching, achieves a new state-of-the-art solution for the problem. We further show that a significant performance improvement is potentially attainable if the degree to which a semantic relevance signal is needed is accurately predicted per input article.
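The combination studied above can be summarized in a short sketch: a document vector is built by mean-pooling sentence-level embeddings (the hierarchical aggregation), and the rerank score interpolates the sparse-retrieval score with cosine similarity. The embedding dimensionality, interpolation weight, and random vectors below are placeholders for a real encoder's output.

```python
# Sketch: hierarchical (sentence-level) aggregation plus lexical/semantic score interpolation.
# Sentence embeddings here are random placeholders for a real Transformer encoder's output.
import numpy as np

def doc_vector(sentence_embeddings):
    """Aggregate sentence-level vectors into one document representation (mean pooling)."""
    return sentence_embeddings.mean(axis=0)

def rerank_score(lexical_score, query_vec, cand_vec, alpha=0.6):
    """Interpolate the sparse-retrieval score with dense cosine similarity."""
    cos = np.dot(query_vec, cand_vec) / (np.linalg.norm(query_vec) * np.linalg.norm(cand_vec))
    return alpha * lexical_score + (1 - alpha) * cos

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query_doc = doc_vector(rng.normal(size=(12, 384)))    # 12 sentences, 384-dim embeddings
    candidate = doc_vector(rng.normal(size=(30, 384)))
    print(rerank_score(lexical_score=0.42, query_vec=query_doc, cand_vec=candidate))
```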
{"title":"Zero-shot reranking with dense encoder models for news background linking.","authors":"Marwa Essam, Tamer Elsayed","doi":"10.7717/peerj-cs.2534","DOIUrl":"10.7717/peerj-cs.2534","url":null,"abstract":"<p><p>News background linking is the problem of finding useful links to resources that provide contextual background information for a given news article. Many systems were proposed to address this problem. Yet, the most effective and reproducible method, to date, used the entire input article as a search query to retrieve the background links by sparse retrieval. While being effective, that method is still far from being optimal. Furthermore, it only leverages the lexical matching signal between the input article and the candidate background links. Nevertheless, intuitively, there may exist resources with useful background information that do not lexically overlap with the input article's vocabulary. While many studies proposed systems that adopt semantic matching for addressing news background linking, none were able to outperform the simple lexical-based matching method. In this paper, we investigate multiple methods to integrate both the lexical and semantic relevance signals for better reranking of candidate background links. To represent news articles in the semantic space, we compare multiple Transformer-based encoder models in a zero-shot setting without the need for any labeled data. Our results show that using a hierarchical aggregation of sentence-level representations generates a good semantic representation of news articles, which is then integrated with lexical matching to achieve a new state-of-the-art solution for the problem. We further show that a significant performance improvement is potentially attainable if the degree by which a semantic relevance signal is needed is accurately predicted per input article.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2534"},"PeriodicalIF":3.5,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784708/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143082280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Foreign object debris detection in lane images using deep learning methodology.
Pub Date: 2025-01-21 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.2570
Priyadharsini S, Bhuvaneshwara Raja K, Kousi Krishnan T, Senthil Kumar Jagatheesaperumal, Bader Fahad Alkhamees, Mohammad Mehedi Hassan
Background: Foreign object debris (FOD) is unwanted material that damages vehicular systems, most commonly the wheels of vehicles. On airport runways, these foreign objects can damage the wheels or internal systems of aircraft, potentially leading to crashes. Surveys indicate that FOD-related damage costs over $4 billion annually, affecting airlines, airport tenants, and passengers. Current FOD clearance relies on high-cost radars and significant manpower, and existing radar- and camera-based surveillance methods are expensive to install.
Methods: This work proposes a video-based deep learning methodology to address the high cost of radar-based FOD detection. The proposed system consists of two modules for FOD detection: object classification and object localization. The classification module categorizes FOD into specific types of foreign objects. In the object localization module, these classified objects are pinpointed in video frames.
Results: The proposed system was experimentally tested with a large video dataset and compared with existing methods. The results demonstrated improved accuracy and robustness, allowing the FOD clearance team to quickly detect and remove foreign objects, thereby enhancing the safety and efficiency of airport runway operations.
{"title":"Foreign object debris detection in lane images using deep learning methodology.","authors":"Priyadharsini S, Bhuvaneshwara Raja K, Kousi Krishnan T, Senthil Kumar Jagatheesaperumal, Bader Fahad Alkhamees, Mohammad Mehedi Hassan","doi":"10.7717/peerj-cs.2570","DOIUrl":"10.7717/peerj-cs.2570","url":null,"abstract":"<p><strong>Background: </strong>Foreign object debris (FOD) is an unwanted substance that damages vehicular systems, most commonly the wheels of vehicles. In airport runways, these foreign objects can damage the wheels or internal systems of planes, potentially leading to flight crashes. Surveys indicate that FOD-related damage costs over $4 billion annually, affecting airlines, airport tenants, and passengers. Current FOD clearance involves high-cost radars and significant manpower, and existing radar and camera-based surveillance methods are expensive to install.</p><p><strong>Methods: </strong>This work proposes a video-based deep learning methodology to address the high cost of radar-based FOD detection. The proposed system consists of two modules for FOD detection: object classification and object localization. The classification module categorizes FOD into specific types of foreign objects. In the object localization module, these classified objects are pinpointed in video frames.</p><p><strong>Results: </strong>The proposed system was experimentally tested with a large video dataset and compared with existing methods. The results demonstrated improved accuracy and robustness, allowing the FOD clearance team to quickly detect and remove foreign objects, thereby enhancing the safety and efficiency of airport runway operations.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2570"},"PeriodicalIF":3.5,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784716/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143082198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid deep learning approach for brain tumor classification using EfficientNetB0 and novel quantum genetic algorithm.
Pub Date: 2025-01-21 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.2556
Kerem Gencer, Gülcan Gencer
Brain tumors are among the most complex and life-threatening pathologies of the central nervous system. Correct diagnosis of these tumors plays an important role in determining patients' treatment plans. Traditional classification methods often rely on manual assessment, which is prone to error. Multi-class classification of brain tumors has therefore gained significant interest in recent years in both the medical and computer science fields, and the use of artificial intelligence and machine learning for automatic brain tumor classification is increasing substantially. Deep learning models can achieve high diagnostic and classification accuracy when trained on suitable datasets. This study examined deep learning-based approaches for automatic multi-class classification of brain tumors and proposed a new approach combining deep learning with a quantum genetic algorithm (QGA). The powerful feature extraction ability of a pre-trained EfficientNetB0 is combined with the quantum genetic algorithm, which is used for feature selection. With this hybrid method, high reliability and accuracy in brain tumor classification were achieved: the proposed model reached accuracies of 98.36% and 98.25% on two different datasets and significantly outperformed traditional methods. As a result, the proposed method offers a robust and scalable solution for early and accurate brain tumor diagnosis, contributing to the field of medical imaging and to improved patient outcomes.
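As a rough illustration of genetic-algorithm feature selection, the sketch below evolves binary feature masks by tournament selection, one-point crossover, and bit-flip mutation. It uses a classical GA as a simplified stand-in for the quantum variant, assumes the EfficientNetB0 features have already been extracted, and uses scikit-learn cross-validation as the fitness function; none of this is the authors' implementation.

```python
# Sketch: classical genetic-algorithm feature selection over precomputed CNN features.
# (A simplified stand-in for the paper's quantum genetic algorithm; not the authors' code.)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """Cross-validated accuracy of a simple classifier on the selected features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=200), X[:, mask.astype(bool)], y, cv=3).mean()

def ga_feature_selection(X, y, pop_size=20, generations=10, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_feat))         # binary feature masks
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        # Tournament selection: keep the better of two randomly drawn individuals.
        a = rng.integers(0, pop_size, size=pop_size)
        b = rng.integers(0, pop_size, size=pop_size)
        parents = pop[np.where(scores[a] >= scores[b], a, b)]
        # One-point crossover between consecutive parents.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_feat)
            children[i, cut:], children[i + 1, cut:] = parents[i + 1, cut:], parents[i, cut:]
        # Bit-flip mutation.
        flip = rng.random(children.shape) < p_mut
        pop = np.where(flip, 1 - children, children)
    best = pop[np.argmax([fitness(ind, X, y) for ind in pop])]
    return best.astype(bool)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(120, 30))     # stand-in for precomputed EfficientNetB0 features
    y = (X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=120) > 0).astype(int)
    print("selected features:", np.flatnonzero(ga_feature_selection(X, y)))
```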
{"title":"Hybrid deep learning approach for brain tumor classification using EfficientNetB0 and novel quantum genetic algorithm.","authors":"Kerem Gencer, Gülcan Gencer","doi":"10.7717/peerj-cs.2556","DOIUrl":"10.7717/peerj-cs.2556","url":null,"abstract":"<p><p>One of the most complex and life-threatening pathologies of the central nervous system is brain tumors. Correct diagnosis of these tumors plays an important role in determining the treatment plans of patients. Traditional classification methods often rely on manual assessments, which can be prone to error. Therefore, multiple classification of brain tumors has gained significant interest in recent years in both the medical and computer science fields. The use of artificial intelligence and machine learning, especially in the automatic classification of brain tumors, is increasing significantly. Deep learning models can achieve high accuracy when trained on datasets in diagnosis and classification. This study examined deep learning-based approaches for automatic multi-class classification of brain tumors, and a new approach combining deep learning and quantum genetic algorithms (QGA) was proposed. The powerful feature extraction ability of the pre-trained EfficientNetB0 was utilized and combined with this quantum genetic algorithms, a new approach was proposed. It is aimed to develop the feature selection method. With this hybrid method, high reliability and accuracy in brain tumor classification was achieved. The proposed model achieved high accuracy of 98.36% and 98.25%, respectively, with different data sets and significantly outperformed traditional methods. As a result, the proposed method offers a robust and scalable solution that will help classify brain tumors in early and accurate diagnosis and contribute to the field of medical imaging with patient outcomes.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2556"},"PeriodicalIF":3.5,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784816/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143082218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}