Title: Visceral condition assessment through digital tongue image analysis
Pub Date: 2025-01-06 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1501184
Siu Cheong Ho, Yiliang Chen, Yao Jie Xie, Wing-Fai Yeung, Shu-Cheng Chen, Jing Qin
Traditional Chinese medicine (TCM) has long utilized tongue diagnosis as a crucial method for assessing internal visceral conditions. This study aims to modernize this ancient practice by developing an automated system for analyzing tongue images in relation to the heart, liver, spleen, lung, and kidney, collectively known as the "five viscera" in TCM. We propose a novel tongue image partitioning algorithm that divides the tongue into four regions associated with these organs according to TCM principles. These partitioned regions are then processed by our newly developed OrganNet, a specialized neural network designed to focus on organ-specific features. Our method simulates the TCM diagnostic process while leveraging modern machine learning techniques. To support this research, we have created a comprehensive tongue image dataset specifically tailored for five-viscera pattern assessment. Results demonstrate the effectiveness of our approach in accurately identifying correlations between tongue regions and visceral conditions. This study bridges TCM practice with contemporary technology, potentially enhancing diagnostic accuracy and efficiency in both TCM and modern medical contexts.
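The abstract does not give the partitioning rule, but the classical TCM tongue map assigns the tip to the heart and lung, the sides to the liver (and gallbladder), the center to the spleen (and stomach), and the root to the kidney. Below is a minimal Python sketch of such a four-region split over a binary tongue mask; the region names and split ratios are illustrative assumptions, not the authors' algorithm.

```python
# A minimal sketch (not the authors' algorithm) of partitioning a tongue
# bounding box into four TCM regions; split ratios are illustrative.
import numpy as np

def partition_tongue(mask: np.ndarray) -> dict:
    """Split a binary tongue mask into four organ-related regions.

    Rows are assumed to run from tongue root (top) to tip (bottom).
    """
    ys, xs = np.nonzero(mask)
    top, bottom = ys.min(), ys.max()
    left, right = xs.min(), xs.max()
    h, w = bottom - top, right - left

    return {
        # root third: kidney
        "kidney": mask[top : top + h // 3, left:right],
        # middle third, central band: spleen (and stomach)
        "spleen": mask[top + h // 3 : top + 2 * h // 3,
                       left + w // 4 : right - w // 4],
        # middle third, lateral bands: liver (and gallbladder)
        "liver": np.hstack([
            mask[top + h // 3 : top + 2 * h // 3, left : left + w // 4],
            mask[top + h // 3 : top + 2 * h // 3, right - w // 4 : right],
        ]),
        # tip third: heart and lung share one region,
        # giving four regions for five organs
        "heart_lung": mask[top + 2 * h // 3 : bottom, left:right],
    }
```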
{"title":"Visceral condition assessment through digital tongue image analysis.","authors":"Siu Cheong Ho, Yiliang Chen, Yao Jie Xie, Wing-Fai Yeung, Shu-Cheng Chen, Jing Qin","doi":"10.3389/frai.2024.1501184","DOIUrl":"10.3389/frai.2024.1501184","url":null,"abstract":"<p><p>Traditional Chinese medicine (TCM) has long utilized tongue diagnosis as a crucial method for assessing internal visceral condition. This study aims to modernize this ancient practice by developing an automated system for analyzing tongue images in relation to the five organs, corresponding to the heart, liver, spleen, lung, and kidney-collectively known as the \"five viscera\" in TCM. We propose a novel tongue image partitioning algorithm that divides the tongue into four regions associated with these specific organs, according to TCM principles. These partitioned regions are then processed by our newly developed OrganNet, a specialized neural network designed to focus on organ-specific features. Our method simulates the TCM diagnostic process while leveraging modern machine learning techniques. To support this research, we have created a comprehensive tongue image dataset specifically tailored for these five visceral pattern assessment. Results demonstrate the effectiveness of our approach in accurately identifying correlations between tongue regions and visceral conditions. This study bridges TCM practices with contemporary technology, potentially enhancing diagnostic accuracy and efficiency in both TCM and modern medical contexts.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1501184"},"PeriodicalIF":3.0,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11743429/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143012991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Evaluating the role of generative AI and color patterns in the dissemination of war imagery and disinformation on social media
Pub Date: 2025-01-06 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1457247
Estibaliz García-Huete, Sara Ignacio-Cerrato, David Pacios, José Luis Vázquez-Poletti, María José Pérez-Serrano, Andrea Donofrio, Clemente Cesarano, Nikolaos Schetakis, Alessio Di Iorio
This study explores the evolving role of social media in the spread of misinformation during the Ukraine-Russia conflict, with a focus on how artificial intelligence (AI) contributes to the creation of deceptive war imagery. Specifically, the research examines the relationship between color patterns (lookup tables, or LUTs) in war-related visuals and their perceived authenticity, highlighting the economic, political, and social ramifications of such manipulative practices. AI technologies have significantly advanced the production of highly convincing, yet artificial, war imagery, blurring the line between fact and fiction. An experimental project is proposed to train a generative AI model capable of creating war imagery that mimics real-life footage. By analyzing the success of this experiment, the study aims to establish a link between specific color patterns and the likelihood of images being perceived as authentic, which could shed light on the mechanics of visual misinformation and manipulation. Additionally, the research investigates the potential of a serverless AI framework to advance both the generation and detection of fake news, marking a pivotal step in the fight against digital misinformation. Ultimately, the study seeks to contribute to ongoing debates on the ethical implications of AI in information manipulation and to propose strategies to combat these challenges in the digital era.
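For readers unfamiliar with LUT-based grading: a lookup table remaps each input intensity to a new value per channel, which is how a "film look" is imposed on footage. A minimal sketch of applying such a table to an 8-bit RGB image follows; the table values are invented for illustration and are not taken from the study.

```python
# Minimal LUT application sketch; the "warm film" curve below is an
# illustrative assumption, not a LUT from the study.
import numpy as np

def apply_lut(image: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Apply a per-channel 256-entry lookup table to an 8-bit RGB image."""
    out = np.empty_like(image)
    for c in range(3):
        out[..., c] = lut[:, c][image[..., c]]
    return out

# Example "warm" LUT: lift reds, suppress blues (illustrative values).
x = np.arange(256, dtype=np.float64)
lut = np.stack([
    np.clip(x * 1.10 + 10, 0, 255),  # R channel curve
    x,                               # G channel unchanged
    np.clip(x * 0.90, 0, 255),       # B channel curve
], axis=1).astype(np.uint8)
```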
{"title":"Evaluating the role of generative AI and color patterns in the dissemination of war imagery and disinformation on social media.","authors":"Estibaliz García-Huete, Sara Ignacio-Cerrato, David Pacios, José Luis Vázquez-Poletti, María José Pérez-Serrano, Andrea Donofrio, Clemente Cesarano, Nikolaos Schetakis, Alessio Di Iorio","doi":"10.3389/frai.2024.1457247","DOIUrl":"10.3389/frai.2024.1457247","url":null,"abstract":"<p><p>This study explores the evolving role of social media in the spread of misinformation during the Ukraine-Russia conflict, with a focus on how artificial intelligence (AI) contributes to the creation of deceptive war imagery. Specifically, the research examines the relationship between color patterns (LUTs) in war-related visuals and their perceived authenticity, highlighting the economic, political, and social ramifications of such manipulative practices. AI technologies have significantly advanced the production of highly convincing, yet artificial, war imagery, blurring the line between fact and fiction. An experimental project is proposed to train a generative AI model capable of creating war imagery that mimics real-life footage. By analyzing the success of this experiment, the study aims to establish a link between specific color patterns and the likelihood of images being perceived as authentic. This could shed light on the mechanics of visual misinformation and manipulation. Additionally, the research investigates the potential of a serverless AI framework to advance both the generation and detection of fake news, marking a pivotal step in the fight against digital misinformation. Ultimately, the study seeks to contribute to ongoing debates on the ethical implications of AI in information manipulation and to propose strategies to combat these challenges in the digital era.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1457247"},"PeriodicalIF":3.0,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11743509/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143012834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Ocular Biometry OCR: a machine learning algorithm leveraging optical character recognition to extract intra ocular lens biometry measurements
Pub Date: 2025-01-06 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1428716
Anish Salvi, Leo Arnal, Kevin Ly, Gabriel Ferreira, Sophia Y Wang, Curtis Langlotz, Vinit Mahajan, Chase A Ludwig
Given the close relationship between ocular structure and ophthalmic disease, ocular biometry measurements (including axial length, lens thickness, anterior chamber depth, and keratometry values) may be leveraged as features in the prediction of eye diseases. However, ocular biometry measurements are often stored as PDFs rather than as structured data in electronic health records, so time-consuming and laborious manual data entry is required before biometry data can be used as a disease predictor. Herein, we used two separate models, PaddleOCR and Gemini, to extract eye-specific biometric measurements from 2,965 Lenstar, 104 IOL Master 500, and 3,616 IOL Master 700 optical biometry reports. For each patient eye, our text extraction pipeline, referred to as Ocular Biometry OCR, involves (1) cropping the report to the biometric data, (2) extracting the text via the optical character recognition model, (3) post-processing the metrics and values into key-value pairs, (4) correcting erroneous angles within the pairs, (5) computing the number of errors or missing values, and (6) selecting the window-specific results with the fewest errors or missing values. To ensure the models' predictions could be put into a machine learning-ready format, artifacts were removed from categorical text data through manual modification where necessary. Performance was evaluated by scoring the PaddleOCR and Gemini results; all scores range from 0 to 1. In the absence of ground truth, higher scores indicate greater inter-model reliability, under the assumption that an equal value between models indicates an accurate result. The detection scores, measuring the number of valid values (i.e., not missing or erroneous), were Lenstar: 0.990, IOLM 500: 1.000, and IOLM 700: 0.998. The similarity scores, measuring the number of equal values, were Lenstar: 0.995, IOLM 500: 0.999, and IOLM 700: 0.999. The agreement scores, combining detection and similarity, were Lenstar: 0.985, IOLM 500: 0.999, and IOLM 700: 0.998. The IOLM 500 reports were also annotated for ground truth; in this case, higher scores indicate greater model-to-annotator accuracy. PaddleOCR-to-Annotator achieved scores of detection: 1.000, similarity: 0.999, and agreement: 0.999. Gemini-to-Annotator achieved scores of detection: 1.000, similarity: 1.000, and agreement: 1.000. While PaddleOCR and Gemini demonstrated high agreement, PaddleOCR offered slightly better performance upon review of the quantitative and qualitative results.
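Because the abstract does not give exact formulas for the three scores, the sketch below shows one plausible set of definitions consistent with the descriptions above, with missing or erroneous values represented as None; the helper names and the way agreement combines detection and similarity are assumptions.

```python
# Plausible score definitions (assumptions, not the paper's formulas).

def detection_score(values):
    """Fraction of extracted values that are present and valid."""
    return sum(v is not None for v in values) / len(values)

def similarity_score(a, b):
    """Fraction of positions where two extractions agree, among
    positions where both models produced a value."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(x == y for x, y in pairs) / len(pairs)

def agreement_score(a, b):
    """Fraction of all positions where both models produced the same
    non-missing value (one plausible way to combine the two scores)."""
    return sum(x is not None and x == y for x, y in zip(a, b)) / len(a)

# Usage with two models' extractions of the same four fields:
paddle = [23.45, 3.12, None, 44.0]
gemini = [23.45, 3.10, None, 44.0]
print(detection_score(paddle), similarity_score(paddle, gemini),
      agreement_score(paddle, gemini))
```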
{"title":"Ocular Biometry OCR: a machine learning algorithm leveraging optical character recognition to extract intra ocular lens biometry measurements.","authors":"Anish Salvi, Leo Arnal, Kevin Ly, Gabriel Ferreira, Sophia Y Wang, Curtis Langlotz, Vinit Mahajan, Chase A Ludwig","doi":"10.3389/frai.2024.1428716","DOIUrl":"https://doi.org/10.3389/frai.2024.1428716","url":null,"abstract":"<p><p>Given close relationships between ocular structure and ophthalmic disease, ocular biometry measurements (including axial length, lens thickness, anterior chamber depth, and keratometry values) may be leveraged as features in the prediction of eye diseases. However, ocular biometry measurements are often stored as PDFs rather than as structured data in electronic health records. Thus, time-consuming and laborious manual data entry is required for using biometry data as a disease predictor. Herein, we used two separate models, PaddleOCR and Gemini, to extract eye specific biometric measurements from 2,965 Lenstar, 104 IOL Master 500, and 3,616 IOL Master 700 optical biometry reports. For each patient eye, our text extraction pipeline, referred to as Ocular Biometry OCR, involves 1) cropping the report to the biometric data, 2) extracting the text via the optical character recognition model, 3) post-processing the metrics and values into key value pairs, 4) correcting erroneous angles within the pairs, 5) computing the number of errors or missing values, and 6) selecting the window specific results with fewest errors or missing values. To ensure the models' predictions could be put into a machine learning-ready format, artifacts were removed from categorical text data through manual modification where necessary. Performance was evaluated by scoring PaddleOCR and Gemini results. In the absence of ground truth, higher scoring indicated greater inter-model reliability, assuming an equal value between models indicated an accurate result. The detection scores, measuring the number of valid values (i.e., not missing or erroneous), were Lenstar: 0.990, IOLM 500: 1.000, and IOLM 700: 0.998. The similarity scores, measuring the number of equal values, were Lenstar: 0.995, IOLM 500: 0.999, and IOLM 700: 0.999. The agreement scores, combining detection and similarity scores, were Lenstar: 0.985, IOLM 500: 0.999, and IOLM 700: 0.998. IOLM 500 was annotated for ground truths; in this case, higher scoring indicated greater model-to-annotator accuracy. PaddleOCR-to-Annotator achieved scores of detection: 1.000, similarity: 0.999, and agreement: 0.999. Gemini-to-Annotator achieved scores of detection: 1.000, similarity: 1.000, and agreement: 1.000. Scores range from 0 to 1. While PaddleOCR and Gemini demonstrated high agreement, PaddleOCR offered slightly better performance upon reviewing quantitative and qualitative results.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1428716"},"PeriodicalIF":3.0,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11743993/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143012755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Reader's digest version of scientific writing: comparative evaluation of summarization capacity between large language models and medical students in analyzing scientific writing in sleep medicine
Pub Date: 2024-12-24 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1477535
Jacob Matalon, August Spurzem, Sana Ahsan, Elizabeth White, Ronik Kothari, Madhu Varma
Introduction: As artificial intelligence systems like large language models (LLMs) and natural language processing advance, the need to evaluate their utility within medicine and medical education grows. As medical research publications continue to grow exponentially, AI systems offer valuable opportunities to condense and synthesize information, especially in underrepresented areas such as sleep medicine. The present study aims to compare LLM-generated summaries of sleep medicine research article abstracts with summaries written by medical students, and to evaluate whether research content and readability are retained comparably.
Methods: A collection of three AI-generated and human-generated summaries of sleep medicine research article abstracts was shared with 19 study participants (medical students) attending a sleep medicine conference. Participants were blinded to which summary was human- or LLM-generated. After reading both the human- and AI-generated research summaries, participants completed a 1-5 Likert-scale survey on the readability of the writings. Participants also answered article-specific multiple-choice questions evaluating their comprehension of the summaries, as a proxy for the quality of content retained by the AI-generated summaries.
Results: An independent-samples t-test revealed no significant difference in participants' Likert readability ratings between the AI-generated and human-generated summaries (p = 0.702). A chi-squared test of proportions revealed no significant association (χ2 = 1.485, p = 0.223), and a McNemar test likewise revealed no significant association between summary type and the proportion of correct responses to the comprehension multiple-choice questions (p = 0.289).
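The three tests reported here are standard; the sketch below shows how they might be run in Python with scipy and statsmodels, using invented example data in place of the study's survey responses.

```python
# Minimal sketch of the reported statistical comparisons; all data
# arrays below are invented placeholders, not the study's data.
import numpy as np
from scipy import stats
from statsmodels.stats.contingency_tables import mcnemar

# Likert readability ratings for AI- vs human-generated summaries.
ai_ratings = np.array([4, 3, 5, 4, 4, 3, 5, 4])
human_ratings = np.array([4, 4, 5, 3, 4, 4, 5, 4])
t, p_t = stats.ttest_ind(ai_ratings, human_ratings)

# Chi-squared test of proportions on comprehension counts.
#                 correct  incorrect
table = np.array([[30, 10],   # AI summaries
                  [26, 14]])  # human summaries
chi2, p_chi, dof, _ = stats.chi2_contingency(table)

# McNemar test on paired (per-participant) correctness.
#                  human correct  human incorrect
paired = np.array([[20, 6],       # AI correct
                   [4, 8]])       # AI incorrect
result = mcnemar(paired, exact=True)

print(p_t, p_chi, result.pvalue)
```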
Discussion: Limitations of this study include the small number of participants and potential user bias: participants were attending a sleep conference, and the study summaries were all drawn from sleep medicine journals. Lastly, the summaries did not include graphs, numbers, or pictures, and were thus limited in the material that could be extracted. While the present analysis did not demonstrate a significant difference in readability or content quality between the AI- and human-generated summaries, its limitations indicate that more research is needed to objectively measure, and further define, the strengths and weaknesses of AI models in condensing medical literature into efficient and accurate summaries.
{"title":"Reader's digest version of scientific writing: comparative evaluation of summarization capacity between large language models and medical students in analyzing scientific writing in sleep medicine.","authors":"Jacob Matalon, August Spurzem, Sana Ahsan, Elizabeth White, Ronik Kothari, Madhu Varma","doi":"10.3389/frai.2024.1477535","DOIUrl":"https://doi.org/10.3389/frai.2024.1477535","url":null,"abstract":"<p><strong>Introduction: </strong>As artificial intelligence systems like large language models (LLM) and natural language processing advance, the need to evaluate their utility within medicine and medical education grows. As medical research publications continue to grow exponentially, AI systems offer valuable opportunities to condense and synthesize information, especially in underrepresented areas such as Sleep Medicine. The present study aims to compare summarization capacity between LLM generated summaries of sleep medicine research article abstracts, to summaries generated by Medical Student (humans) and to evaluate if the research content, and literary readability summarized is retained comparably.</p><p><strong>Methods: </strong>A collection of three AI-generated and human-generated summaries of sleep medicine research article abstracts were shared with 19 study participants (medical students) attending a sleep medicine conference. Participants were blind as to which summary was human or LLM generated. After reading both human and AI-generated research summaries participants completed a 1-5 Likert scale survey on the readability of the extracted writings. Participants also answered article-specific multiple-choice questions evaluating their comprehension of the summaries, as a representation of the quality of content retained by the AI-generated summaries.</p><p><strong>Results: </strong>An independent sample t-test between the AI-generated and human-generated summaries comprehension by study participants revealed no significant difference between the Likert readability ratings (<i>p</i> = 0.702). A chi-squared test of proportions revealed no significant association (<i>χ</i> <sup>2</sup> = 1.485, <i>p</i> = 0.223), and a McNemar test revealed no significant association between summary type and the proportion of correct responses to the comprehension multiple choice questions (<i>p</i> = 0.289).</p><p><strong>Discussion: </strong>Some limitations in this study were a small number of participants and user bias. Participants attended at a sleep conference and study summaries were all from sleep medicine journals. Lastly the summaries did not include graphs, numbers, and pictures, and thus were limited in material extraction. 
While the present analysis did not demonstrate a significant difference among the readability and content quality between the AI and human-generated summaries, limitations in the present study indicate that more research is needed to objectively measure, and further define strengths and weaknesses of AI models in condensing medical literature into efficient and accurate summaries.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1477535"},"PeriodicalIF":3.0,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11704966/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142956060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Prediction of PD-L1 tumor positive score in lung squamous cell carcinoma with H&E staining images and deep learning
Pub Date: 2024-12-20 | DOI: 10.3389/frai.2024.1452563
Qiushi Wang, Xixiang Deng, Pan Huang, Qiang Ma, Lianhua Zhao, Yangyang Feng, Yiying Wang, Yuan Zhao, Yan Chen, Peng Zhong, Peng He, Mingrui Ma, Peng Feng, Hualiang Xiao
Background: Detecting programmed death ligand 1 (PD-L1) expression by immunohistochemical (IHC) staining is an important guide for treating lung cancer with immune checkpoint inhibitors. However, this method suffers from high staining costs, tumor heterogeneity, and subjective differences among pathologists. Applying deep learning models to segment and quantitatively predict PD-L1 expression in digital sections of hematoxylin and eosin (H&E)-stained lung squamous cell carcinoma is therefore of great significance.
Methods: We constructed a dataset of H&E-stained digital sections of lung squamous cell carcinoma and used a Transformer U-Net (TransUNet) deep learning network with an encoder-decoder design to segment PD-L1-negative and -positive regions and to quantitatively predict the tumor cell positive score (TPS).
Results: The dice similarity coefficient (DSC) and intersection over union (IoU) of the deep learning model for PD-L1 expression segmentation on H&E-stained digital slides of lung squamous cell carcinoma were 80% and 72%, respectively, better than seven other cutting-edge segmentation models. The root mean square error (RMSE) of the quantitatively predicted TPS was 26.8, and the intra-group correlation coefficient with the gold standard was 0.92 (95% CI: 0.90-0.93), better than the consistency between the results of five pathologists and the gold standard.
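DSC and IoU follow their standard definitions for binary masks; a small sketch of computing both from a predicted and a reference segmentation (the eps smoothing term is a common convention, not necessarily the paper's):

```python
# Standard Dice and IoU for binary segmentation masks.
import numpy as np

def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Return (DSC, IoU) for two binary masks of the same shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dsc = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return dsc, iou

# Usage with random placeholder masks:
rng = np.random.default_rng(0)
pred = rng.integers(0, 2, size=(512, 512))
target = rng.integers(0, 2, size=(512, 512))
print(dice_and_iou(pred, target))
```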
Conclusion: The deep learning model is capable of segmenting and quantitatively predicting PD-L1 expression in H&E-stained digital sections of lung squamous cell carcinoma, with significant implications for the application and guidance of immune checkpoint inhibitor treatment. The code is available at https://github.com/Baron-Huang/PD-L1-prediction-via-HE-image.
{"title":"Prediction of PD-L1 tumor positive score in lung squamous cell carcinoma with H&E staining images and deep learning.","authors":"Qiushi Wang, Xixiang Deng, Pan Huang, Qiang Ma, Lianhua Zhao, Yangyang Feng, Yiying Wang, Yuan Zhao, Yan Chen, Peng Zhong, Peng He, Mingrui Ma, Peng Feng, Hualiang Xiao","doi":"10.3389/frai.2024.1452563","DOIUrl":"https://doi.org/10.3389/frai.2024.1452563","url":null,"abstract":"<p><strong>Background: </strong>Detecting programmed death ligand 1 (PD-L1) expression based on immunohistochemical (IHC) staining is an important guide for the treatment of lung cancer with immune checkpoint inhibitors. However, this method has problems such as high staining costs, tumor heterogeneity, and subjective differences among pathologists. Therefore, the application of deep learning models to segment and quantitatively predict PD-L1 expression in digital sections of Hematoxylin and eosin (H&E) stained lung squamous cell carcinoma is of great significance.</p><p><strong>Methods: </strong>We constructed a dataset comprising H&E-stained digital sections of lung squamous cell carcinoma and used a Transformer Unet (TransUnet) deep learning network with an encoder-decoder design to segment PD-L1 negative and positive regions and quantitatively predict the tumor cell positive score (TPS).</p><p><strong>Results: </strong>The results showed that the dice similarity coefficient (DSC) and intersection overunion (IoU) of deep learning for PD-L1 expression segmentation of H&E-stained digital slides of lung squamous cell carcinoma were 80 and 72%, respectively, which were better than the other seven cutting-edge segmentation models. The root mean square error (RMSE) of quantitative prediction TPS was 26.8, and the intra-group correlation coefficients with the gold standard was 0.92 (95% CI: 0.90-0.93), which was better than the consistency between the results of five pathologists and the gold standard.</p><p><strong>Conclusion: </strong>The deep learning model is capable of segmenting and quantitatively predicting PD-L1 expression in H&E-stained digital sections of lung squamous cell carcinoma, which has significant implications for the application and guidance of immune checkpoint inhibitor treatments. And the link to the code is https://github.com/Baron-Huang/PD-L1-prediction-via-HE-image.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1452563"},"PeriodicalIF":3.0,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695341/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142932821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A graph neural architecture search approach for identifying bots in social media
Pub Date: 2024-12-20 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1509179
Georgios Tzoumanekas, Michail Chatzianastasis, Loukas Ilias, George Kiokes, John Psarras, Dimitris Askounis
Social media platforms, including X, Facebook, and Instagram, host millions of daily users, giving rise to bots: automated programs that disseminate misinformation and ideologies with tangible real-world consequences. While bot detection on platform X has been the subject of many deep learning models with adequate results, most approaches neglect the graph structure of social media relationships and often rely on hand-engineered architectures. Our work applies a Neural Architecture Search (NAS) technique, namely Deep and Flexible Graph Neural Architecture Search (DFG-NAS), tailored to Relational Graph Convolutional Networks (RGCNs), to the task of bot detection on platform X. Our model constructs a graph that incorporates both user relationships and user metadata. DFG-NAS is then adapted to automatically search for the optimal configuration of Propagation and Transformation functions in the RGCNs. Our experiments are conducted on the TwiBot-20 dataset, yielding a graph with 229,580 nodes and 227,979 edges. We study the five architectures with the highest performance during the search and achieve an accuracy of 85.7%, surpassing state-of-the-art models. Our approach not only addresses the bot detection challenge but also advocates for the broader use of NAS models in neural network design automation.
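As a rough illustration of the relational backbone that DFG-NAS searches over, here is a minimal two-layer RGCN in PyTorch Geometric; the layer sizes, the number of relation types, and the relation semantics are assumptions, not the paper's searched configuration.

```python
# Minimal RGCN sketch (assumed configuration, not the paper's).
import torch
import torch.nn.functional as F
from torch_geometric.nn import RGCNConv

class BotRGCN(torch.nn.Module):
    def __init__(self, in_dim=128, hidden=64, num_relations=2, num_classes=2):
        super().__init__()
        # Each RGCNConv couples propagation over typed edges with a
        # per-relation linear transformation.
        self.conv1 = RGCNConv(in_dim, hidden, num_relations)
        self.conv2 = RGCNConv(hidden, num_classes, num_relations)

    def forward(self, x, edge_index, edge_type):
        x = F.relu(self.conv1(x, edge_index, edge_type))
        return self.conv2(x, edge_index, edge_type)

# x: [num_users, in_dim] metadata features; edge_index: [2, num_edges];
# edge_type: [num_edges] relation id per edge (e.g., 0 = follows,
# 1 = friends -- assumed relation types).
model = BotRGCN()
```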
{"title":"A graph neural architecture search approach for identifying bots in social media.","authors":"Georgios Tzoumanekas, Michail Chatzianastasis, Loukas Ilias, George Kiokes, John Psarras, Dimitris Askounis","doi":"10.3389/frai.2024.1509179","DOIUrl":"https://doi.org/10.3389/frai.2024.1509179","url":null,"abstract":"<p><p>Social media platforms, including X, Facebook, and Instagram, host millions of daily users, giving rise to bots automated programs disseminating misinformation and ideologies with tangible real-world consequences. While bot detection in platform X has been the area of many deep learning models with adequate results, most approaches neglect the graph structure of social media relationships and often rely on hand-engineered architectures. Our work introduces the implementation of a Neural Architecture Search (NAS) technique, namely Deep and Flexible Graph Neural Architecture Search (DFG-NAS), tailored to Relational Graph Convolutional Neural Networks (RGCNs) in the task of bot detection in platform X. Our model constructs a graph that incorporates both the user relationships and their metadata. Then, DFG-NAS is adapted to automatically search for the optimal configuration of Propagation and Transformation functions in the RGCNs. Our experiments are conducted on the TwiBot-20 dataset, constructing a graph with 229,580 nodes and 227,979 edges. We study the five architectures with the highest performance during the search and achieve an accuracy of 85.7%, surpassing state-of-the-art models. Our approach not only addresses the bot detection challenge but also advocates for the broader implementation of NAS models in neural network design automation.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1509179"},"PeriodicalIF":3.0,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695282/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142932805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Artificial intelligence: clinical applications and future advancement in gastrointestinal cancers
Pub Date: 2024-12-20 | DOI: 10.3389/frai.2024.1446693
Abolfazl Akbari, Maryam Adabi, Mohsen Masoodi, Abolfazl Namazi, Fatemeh Mansouri, Seidamir Pasha Tabaeian, Zahra Shokati Eshkiki
One of the foremost causes of global healthcare burden is cancer of the gastrointestinal tract. The medical records, lab results, radiographs, endoscopic images, tissue samples, and medical histories of patients with gastrointestinal malignancies provide an enormous amount of medical data. There are encouraging signs that the advent of artificial intelligence could enhance the treatment of gastrointestinal disease using these data. Deep learning algorithms can swiftly and effectively analyze unstructured, high-dimensional data, including text, images, and waveforms, while advanced machine learning approaches could reveal new insights into disease risk factors and phenotypes. In summary, artificial intelligence has the potential to revolutionize various aspects of gastrointestinal cancer care, such as early detection, diagnosis, therapy, and prognosis. This paper highlights some of the many potential applications of artificial intelligence in this domain. Additionally, we discuss the present state of the discipline and its potential future developments.
{"title":"Artificial intelligence: clinical applications and future advancement in gastrointestinal cancers.","authors":"Abolfazl Akbari, Maryam Adabi, Mohsen Masoodi, Abolfazl Namazi, Fatemeh Mansouri, Seidamir Pasha Tabaeian, Zahra Shokati Eshkiki","doi":"10.3389/frai.2024.1446693","DOIUrl":"https://doi.org/10.3389/frai.2024.1446693","url":null,"abstract":"<p><p>One of the foremost causes of global healthcare burden is cancer of the gastrointestinal tract. The medical records, lab results, radiographs, endoscopic images, tissue samples, and medical histories of patients with gastrointestinal malignancies provide an enormous amount of medical data. There are encouraging signs that the advent of artificial intelligence could enhance the treatment of gastrointestinal issues with this data. Deep learning algorithms can swiftly and effectively analyze unstructured, high-dimensional data, including texts, images, and waveforms, while advanced machine learning approaches could reveal new insights into disease risk factors and phenotypes. In summary, artificial intelligence has the potential to revolutionize various features of gastrointestinal cancer care, such as early detection, diagnosis, therapy, and prognosis. This paper highlights some of the many potential applications of artificial intelligence in this domain. Additionally, we discuss the present state of the discipline and its potential future developments.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1446693"},"PeriodicalIF":3.0,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11701808/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143568514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Dense Paraphrasing for multimodal dialogue interpretation
Pub Date: 2024-12-19 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1479905
Jingxuan Tu, Kyeongmin Rim, Bingyang Ye, Kenneth Lai, James Pustejovsky
Multimodal dialogue involving multiple participants presents complex computational challenges, primarily due to the rich interplay of diverse communicative modalities, including speech, gesture, action, and gaze. These modalities interact in ways that traditional dialogue systems often struggle to accurately track and interpret. To address these challenges, we extend the textual enrichment strategy of Dense Paraphrasing (DP) by translating each nonverbal modality into linguistic expressions. By normalizing multimodal information into a language-based form, we hope both to simplify the representation of situated dialogues and to enhance their computational understanding. We show the effectiveness of the densely paraphrased language form by evaluating instruction-tuned Large Language Models (LLMs) on the Common Ground Tracking (CGT) problem using a publicly available collaborative problem-solving dialogue dataset. Instead of using multimodal LLMs, the dense paraphrasing technique represents the dialogue information from multiple modalities in a compact, structured, machine-readable text format that can be directly processed by language-only models. We leverage the capability of LLMs to transform machine-readable paraphrases into human-readable paraphrases, and show that this process can further improve results on the CGT task. Overall, the results show that augmenting the context with dense paraphrasing effectively facilitates the LLMs' alignment of information from multiple modalities. Our proposed pipeline with original utterances as input context already achieves results comparable to a baseline that utilized decontextualized utterances containing rich coreference information; when also using the decontextualized input, our pipeline largely improves the performance of common ground reasoning over the baselines. We discuss the potential of DP to create a robust model that can effectively interpret and integrate the subtleties of multimodal communication, thereby improving dialogue system performance in real-world settings.
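As a toy illustration of the DP idea (not the authors' implementation), the sketch below renders nonverbal event records as linguistic expressions and interleaves them with speech to form a text-only dialogue context; the event schema and templates are invented for illustration.

```python
# Toy Dense-Paraphrasing sketch: event schema and templates are
# illustrative assumptions, not the paper's representation.
TEMPLATES = {
    "gesture": "{actor} points at the {target}.",
    "gaze":    "{actor} looks at the {target}.",
    "action":  "{actor} places the {object} on the {target}.",
}

def dense_paraphrase(event: dict) -> str:
    """Translate one nonverbal event record into a linguistic expression."""
    return TEMPLATES[event["modality"]].format(**event)

dialogue = [
    {"modality": "speech", "actor": "P1", "text": "Put that one there."},
    {"modality": "gesture", "actor": "P1", "target": "red block"},
    {"modality": "action", "actor": "P2", "object": "red block",
     "target": "scale"},
]

# Build a language-only context a text LLM can consume directly.
context = "\n".join(
    f'{e["actor"]}: "{e["text"]}"' if e["modality"] == "speech"
    else dense_paraphrase(e)
    for e in dialogue
)
print(context)
```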
{"title":"Dense Paraphrasing for multimodal dialogue interpretation.","authors":"Jingxuan Tu, Kyeongmin Rim, Bingyang Ye, Kenneth Lai, James Pustejovsky","doi":"10.3389/frai.2024.1479905","DOIUrl":"10.3389/frai.2024.1479905","url":null,"abstract":"<p><p>Multimodal dialogue involving multiple participants presents complex computational challenges, primarily due to the rich interplay of diverse communicative modalities including speech, gesture, action, and gaze. These modalities interact in complex ways that traditional dialogue systems often struggle to accurately track and interpret. To address these challenges, we extend the textual enrichment strategy of Dense Paraphrasing (DP), by translating each nonverbal modality into linguistic expressions. By normalizing multimodal information into a language-based form, we hope to both simplify the representation for and enhance the computational understanding of situated dialogues. We show the effectiveness of the dense paraphrased language form by evaluating instruction-tuned Large Language Models (LLMs) against the Common Ground Tracking (CGT) problem using a publicly available collaborative problem-solving dialogue dataset. Instead of using multimodal LLMs, the dense paraphrasing technique represents the dialogue information from multiple modalities in a compact and structured machine-readable text format that can be directly processed by the language-only models. We leverage the capability of LLMs to transform machine-readable paraphrases into human-readable paraphrases, and show that this process can further improve the result on the CGT task. Overall, the results show that augmenting the context with dense paraphrasing effectively facilitates the LLMs' alignment of information from multiple modalities, and in turn largely improves the performance of common ground reasoning over the baselines. Our proposed pipeline with original utterances as input context already achieves comparable results to the baseline that utilized decontextualized utterances which contain rich coreference information. When also using the decontextualized input, our pipeline largely improves the performance of common ground reasoning over the baselines. We discuss the potential of DP to create a robust model that can effectively interpret and integrate the subtleties of multimodal communication, thereby improving dialogue system performance in real-world settings.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1479905"},"PeriodicalIF":3.0,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11693678/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142923533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: How AI competencies can make B2B marketing smarter: strategies to boost customer lifetime value
Pub Date: 2024-12-18 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1451228
Tayyeba Bashir, Tan Zhongfu, Burhan Sadiq, Ammara Naseem
Utilization of artificial intelligence (AI) has risen rapidly across many sectors in the last several years, and business-to-business (B2B) marketing is one of the more notable examples. Initial assessments emphasize the significant advantages of AI in B2B marketing, including its ability to yield unique insights into consumer behavior, recognize crucial market trends, and improve operational efficiency. However, there is a limited grasp of how best to develop artificial intelligence competencies (AIC) for B2B marketing and how these competencies ultimately affect customer lifetime value (CLV). Drawing on the AIC and B2B marketing literature, this research presents a theoretical framework for evaluating the effects of AIC on B2B marketing capabilities and, subsequently, on CLV. We analyze the proposed research model using partial least squares structural equation modeling (PLS-SEM), leveraging 367 survey replies from Pakistani companies. The outcomes show a significant relationship between AIC and CLV, and signify the mediating role of B2B marketing capabilities in enhancing CLV when AIC is integrated into internet marketing. The findings provide practical implications for marketers seeking to monetize their marketing skills to enhance CLV, and offer researchers theoretical underpinnings for integrating AIC into marketing.
{"title":"How AI competencies can make B2B marketing smarter: strategies to boost customer lifetime value.","authors":"Tayyeba Bashir, Tan Zhongfu, Burhan Sadiq, Ammara Naseem","doi":"10.3389/frai.2024.1451228","DOIUrl":"10.3389/frai.2024.1451228","url":null,"abstract":"<p><p>There has been a rapid rise in utilization of artificial intelligence (AI) in many different sectors in the last several years. However, business-to-business (B2B) marketing is one of the more notable examples. The initial assessments emphasize the significant advantages of AI in B2B marketing, including its knack for yielding unique understandings into consumer behaviors, recognizing crucial market trends, and improving operational efficiency. However, there seems to be a limited grasp of the optimal way to develop artificial intelligence competencies (AIC) for B2B marketing and how these attributes inevitably affect customer lifetime value (CLV). Equipped with AIC and B2B marketing literary fiction, this research unveils a theoretical research framework for evaluating the repercussions of AIC on B2B marketing capabilities and, subsequently, on CLV. We analyze the suggested research model using partial least squares structural equation modeling (PLS-SEM), leveraging 367 survey replies from Pakistani companies. The outcomes show a significant relationship that describe the ability to leverage AIC to enhance CLV, and also signifies the mediating role of B2B marketing capabilities to enhance CLV by integrating AIC in internet marketing. The findings of this study provide practical implications for marketers to monetize their marketing skills to enhance CLV and researchers with theoretical underpinnings of integration of AIC into marketing.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1451228"},"PeriodicalIF":3.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11688463/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142915785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Interpreting CNN models for musical instrument recognition using multi-spectrogram heatmap analysis: a preliminary study
Pub Date: 2024-12-18 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1499913
Rujia Chen, Akbar Ghobakhlou, Ajit Narayanan
Introduction: Musical instrument recognition is a critical component of music information retrieval (MIR), aimed at identifying and classifying instruments from audio recordings. This task poses significant challenges due to the complexity and variability of musical signals.
Methods: In this study, we employed convolutional neural networks (CNNs) to analyze the contributions of various spectrogram representations (STFT, Log-Mel, MFCC, Chroma, Spectral Contrast, and Tonnetz) to the classification of ten different musical instruments. The NSynth database was used for training and evaluation. Visual heatmap analysis and statistical metrics, including Difference Mean, KL Divergence, JS Divergence, and Earth Mover's Distance, were utilized to assess feature importance and model interpretability.
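The six representations named above are all available in librosa; below is a minimal extraction sketch, with default parameters standing in for the study's settings (which are not stated here).

```python
# Minimal extraction of the six spectrogram representations with librosa;
# all parameters are library defaults, not the study's configuration.
import numpy as np
import librosa

# librosa's bundled example clip stands in for an NSynth note.
y, sr = librosa.load(librosa.ex("trumpet"))

features = {
    "stft": np.abs(librosa.stft(y)),
    "log_mel": librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr)),
    "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20),
    "chroma": librosa.feature.chroma_stft(y=y, sr=sr),
    "spectral_contrast": librosa.feature.spectral_contrast(y=y, sr=sr),
    # Tonnetz is conventionally computed on the harmonic component.
    "tonnetz": librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr),
}
for name, feat in features.items():
    print(name, feat.shape)
```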
Results: Our findings highlight the strengths and limitations of each spectrogram type in capturing distinctive features of different instruments. MFCC and Log-Mel spectrograms demonstrated superior performance across most instruments, while others provided insights into specific characteristics.
Discussion: This analysis provides some insights into optimizing spectrogram-based approaches for musical instrument recognition, offering guidance for future model development and improving interpretability through statistical and visual analyses.
{"title":"Interpreting CNN models for musical instrument recognition using multi-spectrogram heatmap analysis: a preliminary study.","authors":"Rujia Chen, Akbar Ghobakhlou, Ajit Narayanan","doi":"10.3389/frai.2024.1499913","DOIUrl":"10.3389/frai.2024.1499913","url":null,"abstract":"<p><strong>Introduction: </strong>Musical instrument recognition is a critical component of music information retrieval (MIR), aimed at identifying and classifying instruments from audio recordings. This task poses significant challenges due to the complexity and variability of musical signals.</p><p><strong>Methods: </strong>In this study, we employed convolutional neural networks (CNNs) to analyze the contributions of various spectrogram representations-STFT, Log-Mel, MFCC, Chroma, Spectral Contrast, and Tonnetz-to the classification of ten different musical instruments. The NSynth database was used for training and evaluation. Visual heatmap analysis and statistical metrics, including Difference Mean, KL Divergence, JS Divergence, and Earth Mover's Distance, were utilized to assess feature importance and model interpretability.</p><p><strong>Results: </strong>Our findings highlight the strengths and limitations of each spectrogram type in capturing distinctive features of different instruments. MFCC and Log-Mel spectrograms demonstrated superior performance across most instruments, while others provided insights into specific characteristics.</p><p><strong>Discussion: </strong>This analysis provides some insights into optimizing spectrogram-based approaches for musical instrument recognition, offering guidance for future model development and improving interpretability through statistical and visual analyses.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1499913"},"PeriodicalIF":3.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11688478/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142915786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}