Evaluating the effectiveness of prompt engineering for knowledge graph question answering.
Pub Date: 2025-01-13 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1454258
Catherine Kosten, Farhad Nooralahzadeh, Kurt Stockinger
Many different methods for prompting large language models have been developed since the emergence of OpenAI's ChatGPT in November 2022. In this work, we evaluate six different few-shot prompting methods. The first set of experiments evaluates three frameworks that focus on the quantity or type of shots in a prompt: a baseline method with a simple prompt and a small number of shots, random few-shot prompting with 10, 20, and 30 shots, and similarity-based few-shot prompting. The second set of experiments targets optimizing the prompt or enhancing shots through Large Language Model (LLM)-generated explanations, using three prompting frameworks: Explain then Translate, Question Decomposition Meaning Representation, and Optimization by Prompting. We evaluate these six prompting methods on the newly created Spider4SPARQL benchmark, as it is the most complex SPARQL-based Knowledge Graph Question Answering (KGQA) benchmark to date. Across the various prompting frameworks used, the commercial model is unable to achieve a score over 51%, indicating that KGQA, especially for complex queries involving multiple hops, set operations, and filters, remains a challenging task for LLMs. Our experiments find that the most successful prompting framework for KGQA is a simple prompt combined with an ontology and five random shots.
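The similarity-based few-shot strategy evaluated here can be illustrated with a short sketch. The snippet below retrieves the training questions most similar to the input question and assembles them, together with an ontology, into a prompt; the TF-IDF retriever, the toy example pool, and the prompt template are illustrative assumptions rather than the paper's exact setup (a neural embedder could be swapped in for the retriever).

```python
# Minimal sketch of similarity-based few-shot prompt construction for KGQA
# (illustrative only; TF-IDF stands in for whatever embedder is actually used).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical pool of (natural-language question, gold SPARQL) training pairs.
example_pool = [
    ("How many singers are there?",
     "SELECT (COUNT(?s) AS ?n) WHERE { ?s a :Singer . }"),
    ("List the names of all concerts held in 2014.",
     "SELECT ?name WHERE { ?c a :Concert ; :year 2014 ; :name ?name . }"),
    ("Which stadium has the highest capacity?",
     "SELECT ?s WHERE { ?s a :Stadium ; :capacity ?c . } ORDER BY DESC(?c) LIMIT 1"),
]

def build_prompt(question: str, ontology: str, k: int = 2) -> str:
    """Pick the k most similar shots and prepend them, plus the ontology, to the prompt."""
    questions = [q for q, _ in example_pool]
    vec = TfidfVectorizer().fit(questions + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(questions))[0]
    top = sims.argsort()[::-1][:k]
    shots = "\n\n".join(
        f"Question: {example_pool[i][0]}\nSPARQL: {example_pool[i][1]}" for i in top
    )
    return f"Ontology:\n{ontology}\n\n{shots}\n\nQuestion: {question}\nSPARQL:"

print(build_prompt("How many concerts were held in 2014?", ontology=":Concert :year ..."))
```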
{"title":"Evaluating the effectiveness of prompt engineering for knowledge graph question answering.","authors":"Catherine Kosten, Farhad Nooralahzadeh, Kurt Stockinger","doi":"10.3389/frai.2024.1454258","DOIUrl":"https://doi.org/10.3389/frai.2024.1454258","url":null,"abstract":"<p><p>Many different methods for prompting large language models have been developed since the emergence of OpenAI's ChatGPT in November 2022. In this work, we evaluate six different few-shot prompting methods. The first set of experiments evaluates three frameworks that focus on the quantity or type of shots in a prompt: a baseline method with a simple prompt and a small number of shots, random few-shot prompting with 10, 20, and 30 shots, and similarity-based few-shot prompting. The second set of experiments target optimizing the prompt or enhancing shots through Large Language Model (LLM)-generated explanations, using three prompting frameworks: Explain then Translate, Question Decomposition Meaning Representation, and Optimization by Prompting. We evaluate these six prompting methods on the newly created Spider4SPARQL benchmark, as it is the most complex SPARQL-based Knowledge Graph Question Answering (KGQA) benchmark to date. Across the various prompting frameworks used, the commercial model is unable to achieve a score over 51%, indicating that KGQA, especially for complex queries, with multiple hops, set operations and filters remains a challenging task for LLMs. Our experiments find that the most successful prompting framework for KGQA is a simple prompt combined with an ontology and five random shots.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1454258"},"PeriodicalIF":3.0,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11770024/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143052586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

The sociolinguistic foundations of language modeling.
Pub Date: 2025-01-13 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1472411
Jack Grieve, Sara Bartl, Matteo Fuoli, Jason Grafmiller, Weihang Huang, Alejandro Jawerbaum, Akira Murakami, Marcus Perlman, Dana Roemling, Bodo Winter
In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling varieties of language, and we consider how this insight can inform the development and deployment of language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss how this perspective could help us better understand five basic challenges in language modeling: social bias, domain adaptation, alignment, language change, and scale. We argue that to maximize the performance and societal value of language models it is important to carefully compile training corpora that accurately represent the specific varieties of language being modeled, drawing on theories, methods, and descriptions from the field of sociolinguistics.
{"title":"The sociolinguistic foundations of language modeling.","authors":"Jack Grieve, Sara Bartl, Matteo Fuoli, Jason Grafmiller, Weihang Huang, Alejandro Jawerbaum, Akira Murakami, Marcus Perlman, Dana Roemling, Bodo Winter","doi":"10.3389/frai.2024.1472411","DOIUrl":"https://doi.org/10.3389/frai.2024.1472411","url":null,"abstract":"<p><p>In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling <i>varieties of language</i>, and we consider how this insight can inform the development and deployment of language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss how this perspective could help us better understand five basic challenges in language modeling: <i>social bias, domain adaptation, alignment, language change</i>, and <i>scale</i>. We argue that to maximize the performance and societal value of language models it is important to carefully compile training corpora that accurately represent the specific varieties of language being modeled, drawing on theories, methods, and descriptions from the field of sociolinguistics.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1472411"},"PeriodicalIF":3.0,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11770026/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143052629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Artificial intelligence-based framework for early detection of heart disease using enhanced multilayer perceptron.
Pub Date: 2025-01-10 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1539588
Monir Abdullah
Cardiac disease refers to conditions that affect the heart, such as coronary artery disease, arrhythmia, and heart defects, and is among the most challenging health conditions known to humanity. According to the WHO, heart disease is the leading cause of mortality worldwide, causing an estimated 17.8 million deaths every year, and determining its cause demands considerable time and effort from medical specialists and doctors. Manual methods for detecting cardiac disease are biased and subject to variance between medical specialists. In this context, machine learning algorithms have proved to be effective and dependable alternatives for detecting and classifying patients affected by heart disease. Precise and prompt detection of heart disease can help avert heart failure in its initial stages and improve patient survival. This study proposed a novel Enhanced Multilayer Perceptron (EMLP) framework complemented by data refinement techniques to improve predictive accuracy. The classification model was assessed on the CDC cardiac disease dataset and achieved 92% accuracy, surpassing all the traditional methods considered. The proposed framework demonstrates significant potential for the early detection and prediction of cardiac-related diseases. Experimental results indicate that the EMLP model outperformed the other algorithms in terms of accuracy, precision, F1-score, and recall, underscoring its efficacy in cardiac disease detection.
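The abstract does not detail the EMLP architecture, but the general recipe (tabular features, standardization, a multilayer perceptron, and standard classification metrics) can be sketched as follows; the synthetic data, feature count, and network size are assumptions for illustration only, not the paper's model.

```python
# Minimal sketch of an MLP classifier for tabular heart-disease-style data.
# Synthetic features stand in for the CDC dataset columns (BMI, age, smoking, etc.).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))                 # placeholder tabular features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic binary "heart disease" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy", accuracy_score(y_te, pred),
      "precision", precision_score(y_te, pred),
      "recall", recall_score(y_te, pred),
      "f1", f1_score(y_te, pred))
```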
{"title":"Artificial intelligence-based framework for early detection of heart disease using enhanced multilayer perceptron.","authors":"Monir Abdullah","doi":"10.3389/frai.2024.1539588","DOIUrl":"https://doi.org/10.3389/frai.2024.1539588","url":null,"abstract":"<p><p>Cardiac disease refers to diseases that affect the heart such as coronary artery diseases, arrhythmia and heart defects and is amongst the most difficult health conditions known to humanity. According to the WHO, heart disease is the foremost cause of mortality worldwide, causing an estimated 17.8 million deaths every year it consumes a significant amount of time as well as effort to figure out what is causing this, especially for medical specialists and doctors. Manual methods for detecting cardiac disease are biased and subject to medical specialist variance. In this aspect, machine learning algorithms have proved to be effective and dependable alternatives for detecting and classifying patients who are affected by heart disease. Precise and prompt detection of human heart disease can assist in avoiding heart failure within the initial stages and enhance patient survival. This study proposed a novel Enhanced Multilayer Perceptron (EMLP) framework complemented by data refinement techniques to enhance predictive accuracy. The classification model asses using the CDC cardiac disease dataset and achieved 92% accuracy by surpassing all the traditional methods. The proposed framework demonstrates significant potential for the early detection and prediction of cardiac-related diseases. Experimental results indicate that the Enhanced Multilayer Perceptron (EMLP) model outperformed the other algorithms in terms of accuracy, precision, F1-score, and recall, underscoring its efficacy in cardiac disease detection.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1539588"},"PeriodicalIF":3.0,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11760590/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Evaluating accuracy and reproducibility of large language model performance on critical care assessments in pharmacy education.
Pub Date: 2025-01-09 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1514896
Huibo Yang, Mengxuan Hu, Amoreena Most, W Anthony Hawkins, Brian Murray, Susan E Smith, Sheng Li, Andrea Sikora
Background: Large language models (LLMs) have demonstrated impressive performance on medical licensing and diagnosis-related exams. However, comparative evaluations to optimize LLM performance and ability in the domain of comprehensive medication management (CMM) are lacking. The purpose of this evaluation was to test various LLM performance-optimization strategies and to assess LLM performance on critical care pharmacotherapy questions used in the assessment of Doctor of Pharmacy students.
Methods: In a comparative analysis using 219 multiple-choice pharmacotherapy questions, five LLMs (GPT-3.5, GPT-4, Claude 2, Llama2-7b, and Llama2-13b) were evaluated. Each LLM was queried five times to evaluate the primary outcome of accuracy (i.e., correctness). Secondary outcomes included variance, the impact on performance of prompt engineering techniques (e.g., chain-of-thought, CoT) and of training a customized GPT, and comparison to third-year Doctor of Pharmacy students on knowledge recall vs. knowledge application questions. Accuracy and variance were compared using Student's t-test to assess performance under different model settings.
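A minimal sketch of this evaluation protocol is shown below, with a hypothetical `ask_llm` stub standing in for the real model APIs: each model answers every question five times, accuracy is averaged across runs, run-to-run variance is recorded, and self-consistency is implemented as a majority vote over sampled answers. The stub and toy questions are assumptions for illustration.

```python
# Sketch of repeated-query evaluation: accuracy averaged over 5 runs, plus
# run-to-run variance, plus a self-consistency majority vote.
import statistics
from collections import Counter

def ask_llm(model: str, prompt: str) -> str:
    # Stand-in for the real API call to GPT-4, Claude 2, Llama 2, etc.
    return "A"

def evaluate(model: str, questions: list, runs: int = 5):
    """Return (mean accuracy, variance) across repeated runs."""
    run_acc = []
    for _ in range(runs):
        correct = sum(ask_llm(model, q["prompt"]) == q["answer"] for q in questions)
        run_acc.append(correct / len(questions))
    return statistics.mean(run_acc), statistics.variance(run_acc)

def self_consistency(model: str, question: dict, samples: int = 5) -> str:
    """Majority vote over several sampled answers (self-consistency)."""
    votes = Counter(ask_llm(model, question["prompt"]) for _ in range(samples))
    return votes.most_common(1)[0][0]

questions = [{"prompt": "Q1 ...", "answer": "A"}, {"prompt": "Q2 ...", "answer": "B"}]
acc, var = evaluate("gpt-4", questions)
print(f"accuracy={acc:.2f} variance={var:.4f}")
```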
Results: ChatGPT-4 exhibited the highest accuracy (71.6%), while Llama2-13b had the lowest variance (0.070). All LLMs performed more accurately on knowledge recall vs. knowledge application questions (e.g., ChatGPT-4: 87% vs. 67%). When applied to ChatGPT-4, few-shot CoT across five runs improved accuracy (77.4% vs. 71.5%) with no effect on variance. Self-consistency and the custom-trained GPT demonstrated similar accuracy to ChatGPT-4 with few-shot CoT. Overall pharmacy student accuracy was 81%, compared to an optimal overall LLM accuracy of 73%. Comparing question types, six of the LLMs demonstrated equivalent or higher accuracy than pharmacy students on knowledge recall questions (e.g., self-consistency vs. students: 93% vs. 84%), but pharmacy students achieved higher accuracy than all LLMs on knowledge application questions (e.g., self-consistency vs. students: 68% vs. 80%).
Conclusion: ChatGPT-4 was the most accurate LLM on critical care pharmacy questions, and few-shot CoT improved accuracy the most. Average student accuracy was similar to that of the LLMs overall and higher on knowledge application questions. These findings support the need for future assessment of customized training tailored to the type of output needed. Reliance on LLMs is supported only for recall-based questions.
{"title":"Evaluating accuracy and reproducibility of large language model performance on critical care assessments in pharmacy education.","authors":"Huibo Yang, Mengxuan Hu, Amoreena Most, W Anthony Hawkins, Brian Murray, Susan E Smith, Sheng Li, Andrea Sikora","doi":"10.3389/frai.2024.1514896","DOIUrl":"10.3389/frai.2024.1514896","url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) have demonstrated impressive performance on medical licensing and diagnosis-related exams. However, comparative evaluations to optimize LLM performance and ability in the domain of comprehensive medication management (CMM) are lacking. The purpose of this evaluation was to test various LLMs performance optimization strategies and performance on critical care pharmacotherapy questions used in the assessment of Doctor of Pharmacy students.</p><p><strong>Methods: </strong>In a comparative analysis using 219 multiple-choice pharmacotherapy questions, five LLMs (GPT-3.5, GPT-4, Claude 2, Llama2-7b and 2-13b) were evaluated. Each LLM was queried five times to evaluate the primary outcome of accuracy (i.e., correctness). Secondary outcomes included variance, the impact of prompt engineering techniques (e.g., chain-of-thought, CoT) and training of a customized GPT on performance, and comparison to third year doctor of pharmacy students on knowledge recall vs. knowledge application questions. Accuracy and variance were compared with student's t-test to compare performance under different model settings.</p><p><strong>Results: </strong>ChatGPT-4 exhibited the highest accuracy (71.6%), while Llama2-13b had the lowest variance (0.070). All LLMs performed more accurately on knowledge recall vs. knowledge application questions (e.g., ChatGPT-4: 87% vs. 67%). When applied to ChatGPT-4, few-shot CoT across five runs improved accuracy (77.4% vs. 71.5%) with no effect on variance. Self-consistency and the custom-trained GPT demonstrated similar accuracy to ChatGPT-4 with few-shot CoT. Overall pharmacy student accuracy was 81%, compared to an optimal overall LLM accuracy of 73%. Comparing question types, six of the LLMs demonstrated equivalent or higher accuracy than pharmacy students on knowledge recall questions (e.g., self-consistency vs. students: 93% vs. 84%), but pharmacy students achieved higher accuracy than all LLMs on knowledge application questions (e.g., self-consistency vs. students: 68% vs. 80%).</p><p><strong>Conclusion: </strong>ChatGPT-4 was the most accurate LLM on critical care pharmacy questions and few-shot CoT improved accuracy the most. Average student accuracy was similar to LLMs overall, and higher on knowledge application questions. These findings support the need for future assessment of customized training for the type of output needed. Reliance on LLMs is only supported with recall-based questions.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1514896"},"PeriodicalIF":3.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11754395/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143029495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Protecting digital assets using an ontology based cyber situational awareness system.
Pub Date: 2025-01-09 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1394363
Tariq Ammar Almoabady, Yasser Mohammad Alblawi, Ahmad Emad Albalawi, Majed M Aborokbah, S Manimurugan, Ahmed Aljuhani, Hussain Aldawood, P Karthikeyan
Introduction: Cyber situational awareness is critical for detecting and mitigating cybersecurity threats in real-time. This study introduces a comprehensive methodology that integrates the Isolation Forest and autoencoder algorithms, Structured Threat Information Expression (STIX) implementation, and ontology development to enhance cybersecurity threat detection and intelligence. The Isolation Forest algorithm excels in anomaly detection in high-dimensional datasets, while autoencoders provide nonlinear detection capabilities and adaptive feature learning. Together, they form a robust framework for proactive anomaly detection.
Methods: The proposed methodology leverages the Isolation Forest for efficient anomaly identification and autoencoders for feature learning and nonlinear anomaly detection. Threat information was standardized using the STIX framework, facilitating structured and dynamic assessment of threat intelligence. Ontology development was employed to represent knowledge systematically and enable semantic correlation of threats. Feature mapping enriched datasets with contextual threat information.
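A minimal sketch of such a dual detector is shown below; the feature dimensionality, the 95th-percentile flagging thresholds, and the use of an MLPRegressor trained to reconstruct its input as a stand-in autoencoder are illustrative assumptions, not the authors' implementation.

```python
# Dual anomaly detector sketch: Isolation Forest scores plus autoencoder-style
# reconstruction error, each thresholded independently.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 20))                     # "normal" feature vectors
X_test = np.vstack([rng.normal(size=(95, 20)),            # mostly normal traffic
                    rng.normal(loc=6.0, size=(5, 20))])   # a few injected anomalies

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

iso = IsolationForest(n_estimators=200, contamination=0.01, random_state=0).fit(X_train_s)
iso_score = -iso.score_samples(X_test_s)                  # higher = more anomalous

ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=300, random_state=0)
ae.fit(X_train_s, X_train_s)                              # learn to reconstruct normal data
recon_err = ((ae.predict(X_test_s) - X_test_s) ** 2).mean(axis=1)

# Flag a point if either detector considers it anomalous.
flags = (iso_score > np.quantile(iso_score, 0.95)) | (recon_err > np.quantile(recon_err, 0.95))
print("flagged indices:", np.where(flags)[0])
```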
Results: The proposed dual-algorithm framework demonstrated superior performance, achieving 95% accuracy, a 99% F1 score, and a 94.60% recall rate. These results outperformed the benchmarks, highlighting the model's effectiveness in proactive anomaly detection and cyber situational awareness enhancement.
Discussion: The integration of STIX and ontology development within the proposed methodology significantly enhanced threat information standardization and semantic analysis. The dual-algorithm approach provided improved detection capabilities compared to traditional methods, underscoring its potential for scalable and effective cybersecurity applications. Future research could explore further optimization and real-world deployments to refine and validate the approach.
{"title":"Protecting digital assets using an ontology based cyber situational awareness system.","authors":"Tariq Ammar Almoabady, Yasser Mohammad Alblawi, Ahmad Emad Albalawi, Majed M Aborokbah, S Manimurugan, Ahmed Aljuhani, Hussain Aldawood, P Karthikeyan","doi":"10.3389/frai.2024.1394363","DOIUrl":"10.3389/frai.2024.1394363","url":null,"abstract":"<p><strong>Introduction: </strong>Cyber situational awareness is critical for detecting and mitigating cybersecurity threats in real-time. This study introduces a comprehensive methodology that integrates the Isolation Forest and autoencoder algorithms, Structured Threat Information Expression (STIX) implementation, and ontology development to enhance cybersecurity threat detection and intelligence. The Isolation Forest algorithm excels in anomaly detection in high-dimensional datasets, while autoencoders provide nonlinear detection capabilities and adaptive feature learning. Together, they form a robust framework for proactive anomaly detection.</p><p><strong>Methods: </strong>The proposed methodology leverages the Isolation Forest for efficient anomaly identification and autoencoders for feature learning and nonlinear anomaly detection. Threat information was standardized using the STIX framework, facilitating structured and dynamic assessment of threat intelligence. Ontology development was employed to represent knowledge systematically and enable semantic correlation of threats. Feature mapping enriched datasets with contextual threat information.</p><p><strong>Results: </strong>The proposed dual-algorithm framework demonstrated superior performance, achieving 95% accuracy, a 99% F1 score, and a 94.60% recall rate. These results outperformed the benchmarks, highlighting the model's effectiveness in proactive anomaly detection and cyber situational awareness enhancement.</p><p><strong>Discussion: </strong>The integration of STIX and ontology development within the proposed methodology significantly enhanced threat information standardization and semantic analysis. The dual-algorithm approach provided improved detection capabilities compared to traditional methods, underscoring its potential for scalable and effective cybersecurity applications. Future research could explore further optimization and real-world deployments to refine and validate the approach.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1394363"},"PeriodicalIF":3.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11755673/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143029464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Dynamic-budget superpixel active learning for semantic segmentation.
Pub Date: 2025-01-09 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1498956
Yuemin Wang, Ian Stavness
Introduction: Active learning can significantly decrease the labeling cost of deep learning workflows by directing the limited labeling budget to the data points with the highest positive impact on model accuracy. Active learning is especially useful for semantic segmentation tasks, where we can selectively label only a few high-impact regions within these high-impact images. Most established regional active learning algorithms deploy a static-budget querying strategy in which a fixed percentage of regions is queried in each image. A static budget can result in over- or under-labeling images, as the number of high-impact regions in each image can vary.
Methods: In this paper, we present a novel dynamic-budget superpixel querying strategy that can query the optimal numbers of high-uncertainty superpixels in an image to improve the querying efficiency of regional active learning algorithms designed for semantic segmentation.
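The core idea can be sketched as follows: rather than querying a fixed fraction of superpixels per image, candidate superpixels are ranked by uncertainty across the whole unlabeled pool and a single global budget is spent on the most uncertain ones, so the per-image allocation emerges dynamically. The uncertainty measure (mean pixel entropy) and the data structures below are assumptions for illustration, not the paper's implementation.

```python
# Dynamic-budget superpixel querying sketch: one global budget, ranked by
# per-superpixel uncertainty, so each image gets as many (or as few) queries
# as its uncertain regions warrant.
import numpy as np

def mean_entropy(prob_map: np.ndarray, mask: np.ndarray) -> float:
    """Average pixel-wise predictive entropy inside one superpixel."""
    p = prob_map[:, mask]                        # shape: (num_classes, num_pixels)
    ent = -(p * np.log(p + 1e-12)).sum(axis=0)
    return float(ent.mean())

def select_superpixels(candidates, total_budget: int):
    """candidates: list of (image_id, superpixel_id, uncertainty) tuples."""
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    return ranked[:total_budget]                 # dynamic per-image allocation

# Toy example: image "a" holds the three most uncertain superpixels, so it
# receives all three queries while image "b" receives none.
candidates = [("a", 0, 1.2), ("a", 1, 1.1), ("a", 2, 1.0), ("b", 0, 0.2), ("b", 1, 0.1)]
print(select_superpixels(candidates, total_budget=3))
```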
Results: For two distinct datasets, we show that by allowing a dynamic budget for each image, the active learning algorithm is more effective compared to static-budget querying at the same low total labeling budget. We investigate both low- and high-budget scenarios and the impact of superpixel size on our dynamic active learning scheme. In a low-budget scenario, our dynamic-budget querying outperforms static-budget querying by 5.6% mIoU on a specialized agriculture field image dataset and 2.4% mIoU on Cityscapes.
Discussion: The presented dynamic-budget querying strategy is simple, effective, and can be easily adapted to other regional active learning algorithms to further improve the data efficiency of semantic segmentation tasks.
{"title":"Dynamic-budget superpixel active learning for semantic segmentation.","authors":"Yuemin Wang, Ian Stavness","doi":"10.3389/frai.2024.1498956","DOIUrl":"10.3389/frai.2024.1498956","url":null,"abstract":"<p><strong>Introduction: </strong>Active learning can significantly decrease the labeling cost of deep learning workflows by prioritizing the limited labeling budget to high-impact data points that have the highest positive impact on model accuracy. Active learning is especially useful for semantic segmentation tasks where we can selectively label only a few high-impact regions within these high-impact images. Most established regional active learning algorithms deploy a static-budget querying strategy where a fixed percentage of regions are queried in each image. A static budget could result in over- or under-labeling images as the number of high-impact regions in each image can vary.</p><p><strong>Methods: </strong>In this paper, we present a novel dynamic-budget superpixel querying strategy that can query the optimal numbers of high-uncertainty superpixels in an image to improve the querying efficiency of regional active learning algorithms designed for semantic segmentation.</p><p><strong>Results: </strong>For two distinct datasets, we show that by allowing a dynamic budget for each image, the active learning algorithm is more effective compared to static-budget querying at the same low total labeling budget. We investigate both low- and high-budget scenarios and the impact of superpixel size on our dynamic active learning scheme. In a low-budget scenario, our dynamic-budget querying outperforms static-budget querying by 5.6% mIoU on a specialized agriculture field image dataset and 2.4% mIoU on Cityscapes.</p><p><strong>Discussion: </strong>The presented dynamic-budget querying strategy is simple, effective, and can be easily adapted to other regional active learning algorithms to further improve the data efficiency of semantic segmentation tasks.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1498956"},"PeriodicalIF":3.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11754207/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143029398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Fostering effective hybrid human-LLM reasoning and decision making.
Pub Date: 2025-01-08 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1464690
Andrea Passerini, Aryo Gema, Pasquale Minervini, Burcu Sayin, Katya Tentori
The impressive performance of modern Large Language Models (LLMs) across a wide range of tasks, along with their often non-trivial errors, has garnered unprecedented attention regarding the potential of AI and its impact on everyday life. While considerable effort has been and continues to be dedicated to overcoming the limitations of current models, the potentials and risks of human-LLM collaboration remain largely underexplored. In this perspective, we argue that enhancing the focus on human-LLM interaction should be a primary target for future LLM research. Specifically, we will briefly examine some of the biases that may hinder effective collaboration between humans and machines, explore potential solutions, and discuss two broader goals (mutual understanding and complementary team performance) that, in our view, future research should address to enhance effective human-LLM reasoning and decision-making.
{"title":"Fostering effective hybrid human-LLM reasoning and decision making.","authors":"Andrea Passerini, Aryo Gema, Pasquale Minervini, Burcu Sayin, Katya Tentori","doi":"10.3389/frai.2024.1464690","DOIUrl":"10.3389/frai.2024.1464690","url":null,"abstract":"<p><p>The impressive performance of modern Large Language Models (LLMs) across a wide range of tasks, along with their often non-trivial errors, has garnered unprecedented attention regarding the potential of AI and its impact on everyday life. While considerable effort has been and continues to be dedicated to overcoming the limitations of current models, the potentials and risks of human-LLM collaboration remain largely underexplored. In this perspective, we argue that enhancing the focus on human-LLM interaction should be a primary target for future LLM research. Specifically, we will briefly examine some of the biases that may hinder effective collaboration between humans and machines, explore potential solutions, and discuss two broader goals-mutual understanding and complementary team performance-that, in our view, future research should address to enhance effective human-LLM reasoning and decision-making.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1464690"},"PeriodicalIF":3.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11751230/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

SPEMix: a lightweight method via superclass pseudo-label and efficient mixup for echocardiogram view classification.
Pub Date: 2025-01-08 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1467218
Shizhou Ma, Yifeng Zhang, Delong Li, Yixin Sun, Zhaowen Qiu, Lei Wei, Suyu Dong
Introduction: In clinical practice, the echocardiogram is the most widely used modality for diagnosing heart disease. Different heart diseases are diagnosed from different views of the echocardiogram images, so efficient echocardiogram view classification can help cardiologists diagnose heart disease rapidly. Echocardiogram view classification methods are mainly divided into supervised and semi-supervised approaches. Supervised methods have worse generalization performance because labeling echocardiographic images is difficult, while semi-supervised methods can achieve acceptable results with only a small amount of labeled data. However, current semi-supervised echocardiogram view classification faces declining accuracy due to out-of-distribution data and is constrained in clinical application by complex model structures.
Methods: To deal with the above challenges, we proposed SPEMix, a novel open-set semi-supervised method for echocardiogram view classification that improves performance and generalization by leveraging out-of-distribution unlabeled data. SPEMix consists of two core blocks, the DAMix Block and the SP Block. The DAMix Block generates a mixed mask that focuses on the valuable regions of echocardiograms at the pixel level, producing high-quality augmented echocardiograms for unlabeled data and improving classification accuracy. The SP Block generates a superclass pseudo-label for unlabeled data from the superclass probability distribution, improving classification generalization by leveraging the superclass pseudo-label.
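As a rough illustration of the superclass pseudo-labeling idea (the grouping of echocardiogram views and the confidence threshold below are invented for the example and are not the paper's), fine-grained view probabilities can be pooled into superclasses and a pseudo-label kept only when the pooled confidence is high enough:

```python
# Superclass pseudo-label sketch: pool fine-grained view probabilities into
# superclasses, keep the pseudo-label only when the pooled confidence clears
# a threshold. Mapping and threshold are illustrative assumptions.
import numpy as np

SUPERCLASSES = {            # hypothetical mapping: fine view indices -> superclass
    "parasternal": [0, 1],
    "apical": [2, 3, 4],
    "subcostal": [5],
}

def superclass_pseudo_label(fine_probs: np.ndarray, threshold: float = 0.7):
    """fine_probs: softmax output over fine-grained view classes."""
    pooled = {name: fine_probs[idx].sum() for name, idx in SUPERCLASSES.items()}
    best = max(pooled, key=pooled.get)
    return best if pooled[best] >= threshold else None  # None = leave unlabeled

print(superclass_pseudo_label(np.array([0.05, 0.05, 0.4, 0.35, 0.1, 0.05])))  # -> "apical"
```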
Results: The lightweight model trained with SPEMix achieves the best classification performance on the publicly available TMED2 dataset. We also evaluate the generalization of our method on the Unity and CAMUS datasets.
Discussion: For the first time, we applied a lightweight model to echocardiogram view classification, which removes the limits that complex model architectures place on clinical application and helps cardiologists diagnose heart diseases more efficiently.
{"title":"SPEMix: a lightweight method via superclass pseudo-label and efficient mixup for echocardiogram view classification.","authors":"Shizhou Ma, Yifeng Zhang, Delong Li, Yixin Sun, Zhaowen Qiu, Lei Wei, Suyu Dong","doi":"10.3389/frai.2024.1467218","DOIUrl":"10.3389/frai.2024.1467218","url":null,"abstract":"<p><strong>Introduction: </strong>In clinical, the echocardiogram is the most widely used for diagnosing heart diseases. Different heart diseases are diagnosed based on different views of the echocardiogram images, so efficient echocardiogram view classification can help cardiologists diagnose heart disease rapidly. Echocardiogram view classification is mainly divided into supervised and semi-supervised methods. The supervised echocardiogram view classification methods have worse generalization performance due to the difficulty of labeling echocardiographic images, while the semi-supervised echocardiogram view classification can achieve acceptable results via a little labeled data. However, the current semi-supervised echocardiogram view classification faces challenges of declining accuracy due to out-of-distribution data and is constrained by complex model structures in clinical application.</p><p><strong>Methods: </strong>To deal with the above challenges, we proposed a novel open-set semi-supervised method for echocardiogram view classification, SPEMix, which can improve performance and generalization by leveraging out-of-distribution unlabeled data. Our SPEMix consists of two core blocks, DAMix Block and SP Block. DAMix Block can generate a mixed mask that focuses on the valuable regions of echocardiograms at the pixel level to generate high-quality augmented echocardiograms for unlabeled data, improving classification accuracy. SP Block can generate a superclass pseudo-label of unlabeled data from the perspective of the superclass probability distribution, improving the classification generalization by leveraging the superclass pseudolabel.</p><p><strong>Results: </strong>We also evaluate the generalization of our method on the Unity dataset and the CAMUS dataset. The lightweight model trained with SPEMix can achieve the best classification performance on the publicly available TMED2 dataset.</p><p><strong>Discussion: </strong>For the first time, we applied the lightweight model to the echocardiogram view classification, which can solve the limits of the clinical application due to the complex model architecture and help cardiologists diagnose heart diseases more efficiently.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1467218"},"PeriodicalIF":3.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11751229/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Developing and validating a drug recommendation system based on tumor microenvironment and drug fingerprint.
Pub Date: 2025-01-08 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1444127
Yan Wang, Xiaoye Jin, Rui Qiu, Bo Ma, Sheng Zhang, Xuyang Song, Jinxi He
Introduction: Tumor heterogeneity significantly complicates the selection of effective cancer treatments, as patient responses to drugs can vary widely. Personalized cancer therapy has emerged as a promising strategy to enhance treatment effectiveness and precision. This study aimed to develop a personalized drug recommendation model leveraging genomic profiles to optimize therapeutic outcomes.
Methods: A content-based filtering algorithm was implemented to predict drug sensitivity. Patient features were characterized by the tumor microenvironment (TME), and drug features were represented by drug fingerprints. The model was trained and validated using the Genomics of Drug Sensitivity in Cancer (GDSC) database, followed by independent validation with the Cancer Cell Line Encyclopedia (CCLE) dataset. Clinical application was assessed using The Cancer Genome Atlas (TCGA) dataset, with Best Overall Response (BOR) serving as the clinical efficacy measure. Two multilayer perceptron (MLP) models were built to predict IC50 values for 542 tumor cell lines across 18 drugs.
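A minimal sketch of this content-based setup, using synthetic data in place of the GDSC features, is shown below: each sample concatenates a tumor-microenvironment vector with a drug fingerprint, an MLP regresses the IC50 value, and the Pearson correlation is reported as the evaluation metric. Dimensions, hyperparameters, and the synthetic target are assumptions for illustration.

```python
# Content-based drug-sensitivity regression sketch: [TME features | drug fingerprint]
# -> MLP -> IC50, evaluated by Pearson correlation on held-out pairs.
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_pairs, tme_dim, fp_dim = 2000, 50, 166          # e.g. a MACCS-like fingerprint length
tme = rng.normal(size=(n_pairs, tme_dim))         # cell-line (TME) feature vectors
fp = rng.integers(0, 2, size=(n_pairs, fp_dim))   # binary drug fingerprints
X = np.hstack([tme, fp])
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=n_pairs)  # synthetic "IC50"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=400, random_state=0)
model.fit(X_tr, y_tr)
r, _ = pearsonr(y_te, model.predict(X_te))
print(f"Pearson R on held-out pairs: {r:.3f}")
```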
Results: The model exhibited high predictive accuracy, with correlation coefficients (R) of 0.914 in the training set and 0.902 in the test set. Predictions for cytotoxic drugs, including Docetaxel (R = 0.72) and Cisplatin (R = 0.71), were particularly robust, whereas predictions for targeted therapies were less accurate (R < 0.3). Validation with CCLE (MFI as the endpoint) showed strong correlations (R = 0.67). Application to TCGA data successfully predicted clinical outcomes, including a significant association with 6-month progression-free survival (PFS, P = 0.007, AUC = 0.793).
Discussion: The model demonstrates strong performance across preclinical datasets, showing its potential for real-world application in personalized cancer therapy. By bridging preclinical IC50 and clinical BOR endpoints, this approach provides a promising tool for optimizing patient-specific treatments.
{"title":"Developing and validating a drug recommendation system based on tumor microenvironment and drug fingerprint.","authors":"Yan Wang, Xiaoye Jin, Rui Qiu, Bo Ma, Sheng Zhang, Xuyang Song, Jinxi He","doi":"10.3389/frai.2024.1444127","DOIUrl":"10.3389/frai.2024.1444127","url":null,"abstract":"<p><strong>Introduction: </strong>Tumor heterogeneity significantly complicates the selection of effective cancer treatments, as patient responses to drugs can vary widely. Personalized cancer therapy has emerged as a promising strategy to enhance treatment effectiveness and precision. This study aimed to develop a personalized drug recommendation model leveraging genomic profiles to optimize therapeutic outcomes.</p><p><strong>Methods: </strong>A content-based filtering algorithm was implemented to predict drug sensitivity. Patient features were characterized by the tumor microenvironment (TME), and drug features were represented by drug fingerprints. The model was trained and validated using the Genomics of Drug Sensitivity in Cancer (GDSC) database, followed by independent validation with the Cancer Cell Line Encyclopedia (CCLE) dataset. Clinical application was assessed using The Cancer Genome Atlas (TCGA) dataset, with Best Overall Response (BOR) serving as the clinical efficacy measure. Two multilayer perceptron (MLP) models were built to predict IC<sub>50</sub> values for 542 tumor cell lines across 18 drugs.</p><p><strong>Results: </strong>The model exhibited high predictive accuracy, with correlation coefficients (<i>R</i>) of 0.914 in the training set and 0.902 in the test set. Predictions for cytotoxic drugs, including Docetaxel (<i>R</i> = 0.72) and Cisplatin (<i>R</i> = 0.71), were particularly robust, whereas predictions for targeted therapies were less accurate (<i>R</i> < 0.3). Validation with CCLE (MFI as the endpoint) showed strong correlations (<i>R</i> = 0.67). Application to TCGA data successfully predicted clinical outcomes, including a significant association with 6-month progression-free survival (PFS, <i>P</i> = 0.007, AUC = 0.793).</p><p><strong>Discussion: </strong>The model demonstrates strong performance across preclinical datasets, showing its potential for real-world application in personalized cancer therapy. By bridging preclinical IC<sub>50</sub> and clinical BOR endpoints, this approach provides a promising tool for optimizing patient-specific treatments.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1444127"},"PeriodicalIF":3.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11755346/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143029381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A systematic review of Machine Learning and Deep Learning approaches in Mexico: challenges and opportunities.
Pub Date: 2025-01-07 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1479855
José Luis Uc Castillo, Ana Elizabeth Marín Celestino, Diego Armando Martínez Cruz, José Tuxpan Vargas, José Alfredo Ramos Leal, Janete Morán Ramírez
This systematic review provides a state-of-the-art overview of the development of Artificial Intelligence (AI) models, namely Machine Learning (ML) and Deep Learning (DL), and their applications across diverse fields in Mexico. These models are recognized as powerful tools in many fields due to their ability to carry out tasks such as forecasting, image classification, recognition, natural language processing, and machine translation. This review aimed to provide comprehensive information on the Machine Learning and Deep Learning algorithms applied in Mexico. A total of 120 original research papers were included, and details such as publication trends, spatial location, institutions, publishing issues, subject areas, algorithms applied, and performance metrics were discussed. Furthermore, future directions and opportunities are presented. A total of 15 subject areas were identified, with Social Sciences and Medicine being the main application areas. Artificial Neural Network (ANN) models were preferred, probably due to their capability to learn and model non-linear and complex relationships, alongside other popular models such as Random Forest (RF) and Support Vector Machines (SVM). The selection and application of the algorithms were found to depend on the study objective and the data patterns. Regarding performance metrics, accuracy and recall were the most employed. This paper can assist readers in understanding the Machine Learning and Deep Learning techniques used in the country and their subject areas of application, and it can provide significant knowledge for the development and implementation of a national AI strategy according to country needs.
{"title":"A systematic review of Machine Learning and Deep Learning approaches in Mexico: challenges and opportunities.","authors":"José Luis Uc Castillo, Ana Elizabeth Marín Celestino, Diego Armando Martínez Cruz, José Tuxpan Vargas, José Alfredo Ramos Leal, Janete Morán Ramírez","doi":"10.3389/frai.2024.1479855","DOIUrl":"10.3389/frai.2024.1479855","url":null,"abstract":"<p><p>This systematic review provides a state-of-art of Artificial Intelligence (AI) models such as Machine Learning (ML) and Deep Learning (DL) development and its applications in Mexico in diverse fields. These models are recognized as powerful tools in many fields due to their capability to carry out several tasks such as forecasting, image classification, recognition, natural language processing, machine translation, etc. This review article aimed to provide comprehensive information on the Machine Learning and Deep Learning algorithms applied in Mexico. A total of 120 original research papers were included and details such as trends in publication, spatial location, institutions, publishing issues, subject areas, algorithms applied, and performance metrics were discussed. Furthermore, future directions and opportunities are presented. A total of 15 subject areas were identified, where Social Sciences and Medicine were the main application areas. It observed that Artificial Neural Networks (ANN) models were preferred, probably due to their capability to learn and model non-linear and complex relationships in addition to other popular models such as Random Forest (RF) and Support Vector Machines (SVM). It identified that the selection and application of the algorithms rely on the study objective and the data patterns. Regarding the performance metrics applied, accuracy and recall were the most employed. This paper could assist the readers in understanding the several Machine Learning and Deep Learning techniques used and their subject area of application in the Artificial Intelligence field in the country. Moreover, the study could provide significant knowledge in the development and implementation of a national AI strategy, according to country needs.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1479855"},"PeriodicalIF":3.0,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11753225/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}