Frontiers in Artificial Intelligence: latest articles, page 9

Visceral condition assessment through digital tongue image analysis.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Artificial Intelligence

Pub Date : 2025-01-06 eCollection Date: 2024-01-01 DOI: 10.3389/frai.2024.1501184

Siu Cheong Ho, Yiliang Chen, Yao Jie Xie, Wing-Fai Yeung, Shu-Cheng Chen, Jing Qin

Traditional Chinese medicine (TCM) has long used tongue diagnosis as a crucial method for assessing internal visceral condition. This study aims to modernize this ancient practice by developing an automated system for analyzing tongue images in relation to five organs: the heart, liver, spleen, lung, and kidney, collectively known as the "five viscera" in TCM. We propose a novel tongue image partitioning algorithm that divides the tongue into four regions associated with these specific organs, according to TCM principles. These partitioned regions are then processed by our newly developed OrganNet, a specialized neural network designed to focus on organ-specific features. Our method simulates the TCM diagnostic process while leveraging modern machine learning techniques. To support this research, we have created a comprehensive tongue image dataset specifically tailored to the assessment of these five visceral patterns. Results demonstrate the effectiveness of our approach in accurately identifying correlations between tongue regions and visceral conditions. This study bridges TCM practices with contemporary technology, potentially enhancing diagnostic accuracy and efficiency in both TCM and modern medical contexts.


Volume 7, Article 1501184. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11743429/pdf/

Citations: 0

Evaluating the role of generative AI and color patterns in the dissemination of war imagery and disinformation on social media.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Artificial Intelligence

Pub Date : 2025-01-06 eCollection Date: 2024-01-01 DOI: 10.3389/frai.2024.1457247

Estibaliz García-Huete, Sara Ignacio-Cerrato, David Pacios, José Luis Vázquez-Poletti, María José Pérez-Serrano, Andrea Donofrio, Clemente Cesarano, Nikolaos Schetakis, Alessio Di Iorio

This study explores the evolving role of social media in the spread of misinformation during the Ukraine-Russia conflict, with a focus on how artificial intelligence (AI) contributes to the creation of deceptive war imagery. Specifically, the research examines the relationship between color patterns (LUTs) in war-related visuals and their perceived authenticity, highlighting the economic, political, and social ramifications of such manipulative practices. AI technologies have significantly advanced the production of highly convincing, yet artificial, war imagery, blurring the line between fact and fiction. An experimental project is proposed to train a generative AI model capable of creating war imagery that mimics real-life footage. By analyzing the success of this experiment, the study aims to establish a link between specific color patterns and the likelihood of images being perceived as authentic. This could shed light on the mechanics of visual misinformation and manipulation. Additionally, the research investigates the potential of a serverless AI framework to advance both the generation and detection of fake news, marking a pivotal step in the fight against digital misinformation. Ultimately, the study seeks to contribute to ongoing debates on the ethical implications of AI in information manipulation and to propose strategies to combat these challenges in the digital era.


Volume 7, Article 1457247. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11743509/pdf/

Citations: 0

Ocular Biometry OCR: a machine learning algorithm leveraging optical character recognition to extract intra ocular lens biometry measurements.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Artificial Intelligence

Pub Date : 2025-01-06 eCollection Date: 2024-01-01 DOI: 10.3389/frai.2024.1428716

Anish Salvi, Leo Arnal, Kevin Ly, Gabriel Ferreira, Sophia Y Wang, Curtis Langlotz, Vinit Mahajan, Chase A Ludwig

Given close relationships between ocular structure and ophthalmic disease, ocular biometry measurements (including axial length, lens thickness, anterior chamber depth, and keratometry values) may be leveraged as features in the prediction of eye diseases. However, ocular biometry measurements are often stored as PDFs rather than as structured data in electronic health records. Thus, time-consuming and laborious manual data entry is required for using biometry data as a disease predictor. Herein, we used two separate models, PaddleOCR and Gemini, to extract eye specific biometric measurements from 2,965 Lenstar, 104 IOL Master 500, and 3,616 IOL Master 700 optical biometry reports. For each patient eye, our text extraction pipeline, referred to as Ocular Biometry OCR, involves 1) cropping the report to the biometric data, 2) extracting the text via the optical character recognition model, 3) post-processing the metrics and values into key value pairs, 4) correcting erroneous angles within the pairs, 5) computing the number of errors or missing values, and 6) selecting the window specific results with fewest errors or missing values. To ensure the models' predictions could be put into a machine learning-ready format, artifacts were removed from categorical text data through manual modification where necessary. Performance was evaluated by scoring PaddleOCR and Gemini results. In the absence of ground truth, higher scoring indicated greater inter-model reliability, assuming an equal value between models indicated an accurate result. The detection scores, measuring the number of valid values (i.e., not missing or erroneous), were Lenstar: 0.990, IOLM 500: 1.000, and IOLM 700: 0.998. The similarity scores, measuring the number of equal values, were Lenstar: 0.995, IOLM 500: 0.999, and IOLM 700: 0.999. The agreement scores, combining detection and similarity scores, were Lenstar: 0.985, IOLM 500: 0.999, and IOLM 700: 0.998. 
IOLM 500 was annotated for ground truths; in this case, higher scoring indicated greater model-to-annotator accuracy. PaddleOCR-to-Annotator achieved scores of detection: 1.000, similarity: 0.999, and agreement: 0.999. Gemini-to-Annotator achieved scores of detection: 1.000, similarity: 1.000, and agreement: 1.000. Scores range from 0 to 1. While PaddleOCR and Gemini demonstrated high agreement, PaddleOCR offered slightly better performance upon reviewing quantitative and qualitative results.
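
The three scores described above can be illustrated with a small sketch. The abstract does not give the exact formulas, so the definitions here are assumptions: detection is taken as the fraction of fields for which both models returned a usable value, similarity as the fraction of those detected fields whose values match exactly, and agreement as the fraction of all fields that are both detected and equal (which is the product of the other two). The function name `score_pair`, the field names, and the sample values are hypothetical.

```python
from typing import Dict, Optional

# Field name -> extracted value; None marks a missing or erroneous value.
Report = Dict[str, Optional[str]]


def score_pair(model_a: Report, model_b: Report) -> Dict[str, float]:
    """Score two extractions of the same report, field by field (assumed definitions)."""
    fields = sorted(set(model_a) | set(model_b))
    # A field counts as detected when both models returned a usable value.
    detected = [f for f in fields
                if model_a.get(f) is not None and model_b.get(f) is not None]
    # Among detected fields, count those where the two values match exactly.
    equal = [f for f in detected if model_a[f] == model_b[f]]
    detection = len(detected) / len(fields) if fields else 0.0
    similarity = len(equal) / len(detected) if detected else 0.0
    # Assumed combination: share of all fields that are detected AND equal,
    # which is the same as detection * similarity.
    agreement = len(equal) / len(fields) if fields else 0.0
    return {"detection": detection, "similarity": similarity, "agreement": agreement}


# Hypothetical extractions for one eye from the two models.
paddle = {"AL": "23.45", "ACD": "3.12", "LT": "4.50", "K1": "43.25"}
gemini = {"AL": "23.45", "ACD": "3.12", "LT": None, "K1": "43.52"}
scores = score_pair(paddle, gemini)
print(scores)
```

Under these assumed definitions, a score of 1 means every field was extracted by both models and every value matched, which is consistent with the stated 0 to 1 range.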


Volume 7, Article 1428716. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11743993/pdf/

Citations: 0

Reader's digest version of scientific writing: comparative evaluation of summarization capacity between large language models and medical students in analyzing scientific writing in sleep medicine.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Artificial Intelligence

Pub Date : 2024-12-24 eCollection Date: 2024-01-01 DOI: 10.3389/frai.2024.1477535

Jacob Matalon, August Spurzem, Sana Ahsan, Elizabeth White, Ronik Kothari, Madhu Varma

Introduction: As artificial intelligence systems such as large language models (LLMs) and natural language processing advance, the need to evaluate their utility within medicine and medical education grows. As medical research publications continue to grow exponentially, AI systems offer valuable opportunities to condense and synthesize information, especially in underrepresented areas such as sleep medicine. The present study aims to compare LLM-generated summaries of sleep medicine research article abstracts with summaries generated by medical students (humans), and to evaluate whether research content and readability are retained comparably.

Methods: A collection of three AI-generated and human-generated summaries of sleep medicine research article abstracts was shared with 19 study participants (medical students) attending a sleep medicine conference. Participants were blinded as to which summary was human- or LLM-generated. After reading both human- and AI-generated research summaries, participants completed a 1-5 Likert scale survey on the readability of the extracted writings. Participants also answered article-specific multiple-choice questions evaluating their comprehension of the summaries, as a representation of the quality of content retained by the AI-generated summaries.

Results: An independent-samples t-test comparing the AI-generated and human-generated summaries revealed no significant difference in the Likert readability ratings given by study participants (p = 0.702). A chi-squared test of proportions (χ² = 1.485, p = 0.223) and a McNemar test (p = 0.289) each revealed no significant association between summary type and the proportion of correct responses to the comprehension multiple-choice questions.
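
As a rough illustration of the paired comparison reported above, the following sketch computes a two-sided exact McNemar p-value from the two discordant counts. The abstract does not say whether the exact or the chi-squared variant was used, so the exact binomial form is an assumption, and the counts passed in are hypothetical rather than the study's data. Here `b` stands for participants who answered correctly only on the AI-generated summary and `c` for those who answered correctly only on the human-generated one.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair falls either way with
    # probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, not taken from the study.
p_value = mcnemar_exact(3, 7)
print(round(p_value, 5))
```

Concordant pairs (correct on both, or incorrect on both) do not enter the statistic, which is why only the two discordant counts are needed.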

Discussion: Limitations of this study include a small number of participants and user bias. Participants were attending a sleep conference, and the summarized studies were all from sleep medicine journals. Lastly, the summaries did not include graphs, numbers, or pictures, and thus were limited in material extraction. While the present analysis did not demonstrate a significant difference in readability or content quality between the AI- and human-generated summaries, these limitations indicate that more research is needed to objectively measure and further define the strengths and weaknesses of AI models in condensing medical literature into efficient and accurate summaries.


Volume 7, Article 1477535. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11704966/pdf/

Citations: 0

Prediction of PD-L1 tumor positive score in lung squamous cell carcinoma with H&E staining images and deep learning.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Artificial Intelligence

Pub Date : 2024-12-20 eCollection Date: 2024-01-01 DOI: 10.3389/frai.2024.1452563

Qiushi Wang, Xixiang Deng, Pan Huang, Qiang Ma, Lianhua Zhao, Yangyang Feng, Yiying Wang, Yuan Zhao, Yan Chen, Peng Zhong, Peng He, Mingrui Ma, Peng Feng, Hualiang Xiao

Background: Detecting programmed death ligand 1 (PD-L1) expression based on immunohistochemical (IHC) staining is an important guide for the treatment of lung cancer with immune checkpoint inhibitors. However, this method has problems such as high staining costs, tumor heterogeneity, and subjective differences among pathologists. Therefore, the application of deep learning models to segment and quantitatively predict PD-L1 expression in digital sections of hematoxylin and eosin (H&E)-stained lung squamous cell carcinoma is of great significance.

Methods: We constructed a dataset comprising H&E-stained digital sections of lung squamous cell carcinoma and used a Transformer Unet (TransUnet) deep learning network with an encoder-decoder design to segment PD-L1 negative and positive regions and quantitatively predict the tumor cell positive score (TPS).

Results: The dice similarity coefficient (DSC) and intersection over union (IoU) of the deep learning model for PD-L1 expression segmentation of H&E-stained digital slides of lung squamous cell carcinoma were 80% and 72%, respectively, better than seven other cutting-edge segmentation models. The root mean square error (RMSE) of the quantitative TPS prediction was 26.8, and the intra-group correlation coefficient with the gold standard was 0.92 (95% CI: 0.90-0.93), which was better than the consistency between the results of five pathologists and the gold standard.
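
For readers unfamiliar with the two segmentation metrics named above, here is a minimal sketch of how DSC and IoU are computed for a single pair of binary masks. It uses the standard textbook definitions and is not the authors' evaluation code; the flattened six-pixel masks are made up for illustration, and the convention for two empty masks is an assumption.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Dice similarity coefficient and intersection over union for flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks are empty; treating this as perfect overlap is a convention.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Hypothetical flattened masks: 1 marks a pixel labelled PD-L1 positive.
pred_mask = [1, 1, 1, 0, 0, 0]
true_mask = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred_mask, true_mask)
print(round(dice, 3), round(iou, 3))
```

For any single mask pair the Dice value is at least as large as the IoU, which matches the ordering of the two reported figures.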

Conclusion: The deep learning model is capable of segmenting and quantitatively predicting PD-L1 expression in H&E-stained digital sections of lung squamous cell carcinoma, which has significant implications for the application and guidance of immune checkpoint inhibitor treatments. The code is available at https://github.com/Baron-Huang/PD-L1-prediction-via-HE-image.


Volume 7, Article 1452563. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695341/pdf/

Citations: 0

A graph neural architecture search approach for identifying bots in social media.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Artificial Intelligence

Pub Date : 2024-12-20 eCollection Date: 2024-01-01 DOI: 10.3389/frai.2024.1509179

Georgios Tzoumanekas, Michail Chatzianastasis, Loukas Ilias, George Kiokes, John Psarras, Dimitris Askounis

Social media platforms, including X, Facebook, and Instagram, host millions of daily users, giving rise to bots: automated programs that disseminate misinformation and ideologies with tangible real-world consequences. While bot detection on platform X has been addressed by many deep learning models with adequate results, most approaches neglect the graph structure of social media relationships and often rely on hand-engineered architectures. Our work introduces the implementation of a Neural Architecture Search (NAS) technique, namely Deep and Flexible Graph Neural Architecture Search (DFG-NAS), tailored to Relational Graph Convolutional Neural Networks (RGCNs) in the task of bot detection on platform X. Our model constructs a graph that incorporates both the user relationships and their metadata. Then, DFG-NAS is adapted to automatically search for the optimal configuration of Propagation and Transformation functions in the RGCNs. Our experiments are conducted on the TwiBot-20 dataset, constructing a graph with 229,580 nodes and 227,979 edges. We study the five architectures with the highest performance during the search and achieve an accuracy of 85.7%, surpassing state-of-the-art models. Our approach not only addresses the bot detection challenge but also advocates for the broader implementation of NAS models in neural network design automation.
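
To make the idea of a relational graph convolution more concrete, the sketch below implements one propagation-plus-transformation step in plain Python. It follows the commonly cited R-GCN rule (a self-connection term plus, for each relation type, the mean of the transformed neighbour features, followed by ReLU) and is not the architecture found by DFG-NAS. The relation labels `follower` and `following`, the tiny three-user graph, and the weight matrices are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rgcn_layer(features: Dict[int, Vector],
               edges: List[Tuple[int, str, int]],
               w_rel: Dict[str, Matrix],
               w_self: Matrix) -> Dict[int, Vector]:
    """One relational graph convolution step with ReLU.

    Each edge is (source, relation, target); messages flow source -> target.
    """
    # Group incoming neighbours by (target, relation) for mean normalisation.
    incoming: Dict[Tuple[int, str], List[int]] = {}
    for src, rel, dst in edges:
        incoming.setdefault((dst, rel), []).append(src)

    out: Dict[int, Vector] = {}
    for node, h in features.items():
        total = matvec(w_self, h)  # self-connection term
        for (dst, rel), sources in incoming.items():
            if dst != node:
                continue
            for src in sources:
                # Transform the neighbour with the relation-specific weights,
                # then average over neighbours of this relation type.
                msg = matvec(w_rel[rel], features[src])
                total = [t + m / len(sources) for t, m in zip(total, msg)]
        out[node] = [max(0.0, t) for t in total]  # ReLU
    return out


# Hypothetical three-user graph with two-dimensional metadata features.
features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [4.0, 0.0]}
edges = [(1, "follower", 0), (2, "follower", 0), (1, "following", 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
out = rgcn_layer(features, edges, {"follower": identity, "following": swap}, identity)
print(out)
```

Keeping a separate weight matrix per relation type is what lets the layer treat different kinds of user relationship differently, instead of merging all edges into one neighbourhood.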


Volume 7, Article 1509179. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695282/pdf/

Citations: 0

Artificial intelligence: clinical applications and future advancement in gastrointestinal cancers.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Artificial Intelligence

Pub Date : 2024-12-20 eCollection Date: 2024-01-01 DOI: 10.3389/frai.2024.1446693

Abolfazl Akbari, Maryam Adabi, Mohsen Masoodi, Abolfazl Namazi, Fatemeh Mansouri, Seidamir Pasha Tabaeian, Zahra Shokati Eshkiki

One of the foremost causes of global healthcare burden is cancer of the gastrointestinal tract. The medical records, lab results, radiographs, endoscopic images, tissue samples, and medical histories of patients with gastrointestinal malignancies provide an enormous amount of medical data. There are encouraging signs that the advent of artificial intelligence could enhance the treatment of gastrointestinal issues with this data. Deep learning algorithms can swiftly and effectively analyze unstructured, high-dimensional data, including texts, images, and waveforms, while advanced machine learning approaches could reveal new insights into disease risk factors and phenotypes. In summary, artificial intelligence has the potential to revolutionize various features of gastrointestinal cancer care, such as early detection, diagnosis, therapy, and prognosis. This paper highlights some of the many potential applications of artificial intelligence in this domain. Additionally, we discuss the present state of the discipline and its potential future developments.

Volume: 7 Article: 1446693 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11701808/pdf/

Citations: 0

Dense Paraphrasing for multimodal dialogue interpretation.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Artificial Intelligence

Pub Date : 2024-12-19 eCollection Date: 2024-01-01 DOI: 10.3389/frai.2024.1479905

Jingxuan Tu, Kyeongmin Rim, Bingyang Ye, Kenneth Lai, James Pustejovsky

Multimodal dialogue involving multiple participants presents complex computational challenges, primarily due to the rich interplay of diverse communicative modalities including speech, gesture, action, and gaze. These modalities interact in complex ways that traditional dialogue systems often struggle to track and interpret accurately. To address these challenges, we extend the textual enrichment strategy of Dense Paraphrasing (DP) by translating each nonverbal modality into linguistic expressions. By normalizing multimodal information into a language-based form, we hope both to simplify the representation of situated dialogues and to enhance their computational understanding. We show the effectiveness of the dense paraphrased language form by evaluating instruction-tuned Large Language Models (LLMs) on the Common Ground Tracking (CGT) problem using a publicly available collaborative problem-solving dialogue dataset. Instead of using multimodal LLMs, the dense paraphrasing technique represents the dialogue information from multiple modalities in a compact, structured, machine-readable text format that language-only models can process directly. We leverage the capability of LLMs to transform machine-readable paraphrases into human-readable paraphrases, and show that this process further improves results on the CGT task. Overall, the results show that augmenting the context with dense paraphrasing effectively helps the LLMs align information from multiple modalities, which in turn substantially improves common ground reasoning over the baselines. Our proposed pipeline with original utterances as input context already achieves results comparable to the baseline that used decontextualized utterances, which contain rich coreference information. When the decontextualized input is also used, our pipeline substantially improves common ground reasoning over the baselines. We discuss the potential of DP to create a robust model that can effectively interpret and integrate the subtleties of multimodal communication, thereby improving dialogue system performance in real-world settings.
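The idea of normalizing nonverbal modalities into text can be sketched as follows. The abstract does not specify the paraphrase format, so the `Event` schema, the sentence templates, and the example dialogue below are hypothetical; the sketch only shows how time-stamped speech, gesture, action, and gaze events might be merged into one text context that a language-only model could read.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start: float      # onset time in seconds
    participant: str  # speaker or actor label
    modality: str     # "speech", "gesture", "action", or "gaze"
    content: str      # utterance text, or a description of the target/action


# Hypothetical templates: one sentence pattern per modality.
TEMPLATES = {
    "speech": '{participant} says: "{content}"',
    "gesture": "{participant} points at {content}.",
    "action": "{participant} {content}.",
    "gaze": "{participant} looks at {content}.",
}


def dense_paraphrase(events):
    """Render all events, in temporal order, as one text block."""
    ordered = sorted(events, key=lambda e: e.start)
    return "\n".join(
        TEMPLATES[e.modality].format(participant=e.participant, content=e.content)
        for e in ordered
    )


events = [
    Event(1.2, "P2", "gesture", "the red block"),
    Event(0.5, "P1", "speech", "I think this one is ten grams"),
    Event(2.0, "P2", "action", "puts the red block on the scale"),
]
context = dense_paraphrase(events)
print(context)
```

The resulting string could be placed in the prompt of a text-only LLM in place of raw video or audio features; how the authors construct their prompts is not stated in the abstract.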


Volume: 7 Article: 1479905 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11693678/pdf/

Citations: 0

How AI competencies can make B2B marketing smarter: strategies to boost customer lifetime value.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Artificial Intelligence

Pub Date : 2024-12-18 eCollection Date: 2024-01-01 DOI: 10.3389/frai.2024.1451228

Tayyeba Bashir, Tan Zhongfu, Burhan Sadiq, Ammara Naseem

The use of artificial intelligence (AI) has risen rapidly across many sectors in the last several years, and business-to-business (B2B) marketing is one of the more notable examples. Initial assessments emphasize the significant advantages of AI in B2B marketing, including its ability to yield unique insights into consumer behavior, recognize crucial market trends, and improve operational efficiency. However, there seems to be a limited grasp of the optimal way to develop artificial intelligence competencies (AIC) for B2B marketing and of how these competencies affect customer lifetime value (CLV). Drawing on the AIC and B2B marketing literature, this research presents a theoretical framework for evaluating the effects of AIC on B2B marketing capabilities and, subsequently, on CLV. We analyze the proposed research model using partial least squares structural equation modeling (PLS-SEM), leveraging 367 survey responses from Pakistani companies. The results show a significant relationship between AIC and CLV, and indicate that B2B marketing capabilities mediate this relationship when AIC is integrated into internet marketing. The findings offer marketers practical guidance on monetizing their marketing skills to enhance CLV, and offer researchers theoretical underpinnings for integrating AIC into marketing.


Volume: 7 Article: 1451228 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11688463/pdf/

Citations: 0

Interpreting CNN models for musical instrument recognition using multi-spectrogram heatmap analysis: a preliminary study.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Artificial Intelligence

Pub Date : 2024-12-18 eCollection Date: 2024-01-01 DOI: 10.3389/frai.2024.1499913

Rujia Chen, Akbar Ghobakhlou, Ajit Narayanan

Introduction: Musical instrument recognition is a critical component of music information retrieval (MIR), aimed at identifying and classifying instruments from audio recordings. This task poses significant challenges due to the complexity and variability of musical signals.

Methods: In this study, we employed convolutional neural networks (CNNs) to analyze the contributions of various spectrogram representations (STFT, Log-Mel, MFCC, Chroma, Spectral Contrast, and Tonnetz) to the classification of ten different musical instruments. The NSynth database was used for training and evaluation. Visual heatmap analysis and statistical metrics, including Difference Mean, KL Divergence, JS Divergence, and Earth Mover's Distance, were utilized to assess feature importance and model interpretability.

Results: Our findings highlight the strengths and limitations of each spectrogram type in capturing distinctive features of different instruments. MFCC and Log-Mel spectrograms demonstrated superior performance across most instruments, while others provided insights into specific characteristics.

Discussion: This analysis provides some insights into optimizing spectrogram-based approaches for musical instrument recognition, offering guidance for future model development and improving interpretability through statistical and visual analyses.
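Three of the statistical metrics named in the Methods (KL Divergence, JS Divergence, and Earth Mover's Distance) can be sketched for a pair of heatmaps. This is a minimal illustration, not the study's procedure: it assumes each heatmap has been flattened to a 1-D list of non-negative values and normalized to sum to 1, it treats bins as unit-spaced for the Earth Mover's Distance, and the smoothing constant `eps` is an assumption. "Difference Mean" is omitted because the abstract does not define it.

```python
import math


def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("heatmap must contain positive mass")
    return [v / total for v in values]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps avoids division by zero and log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def earth_movers_distance_1d(p, q):
    """1-D EMD on unit-spaced bins: sum of absolute differences of the two CDFs."""
    cumulative, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cumulative += pi - qi
        total += abs(cumulative)
    return total


# Two toy "heatmaps" whose mass is shifted by one bin (made-up values).
p = normalize([2.0, 2.0, 0.0])
q = normalize([0.0, 2.0, 2.0])
print(kl_divergence(p, q), js_divergence(p, q), earth_movers_distance_1d(p, q))
```

KL divergence becomes very large when one map has mass where the other has almost none, which is visible in the toy example; JS divergence and the Earth Mover's Distance stay bounded and are often easier to compare across instrument classes.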


Volume: 7 Article: 1499913 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11688478/pdf/

Citations: 0