Title: Dual-channel deep graph convolutional neural networks
Pub Date: 2024-04-04 | DOI: 10.3389/frai.2024.1290491
Zhonglin Ye, Zhuoran Li, Gege Li, Haixing Zhao
Dual-channel graph convolutional neural networks based on hybrid features jointly model different features of a network so that the features can learn from each other, improving the performance of downstream machine learning tasks. However, current dual-channel graph convolutional neural networks are limited in the number of convolution layers they can use, which caps model performance: stacking multiple graph convolution operations causes over-smoothing, so performance degrades as the number of layers grows. Inspired by the success of residual connections in convolutional neural networks, this paper applies residual connections to dual-channel graph convolutional neural networks and increases their depth. The resulting dual-channel deep graph convolutional neural network (D2GCN) effectively avoids over-smoothing and improves model performance. D2GCN is evaluated on the CiteSeer, DBLP, and SDBLP datasets; the results show that it outperforms the comparison algorithms on node classification tasks.
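To make the residual idea concrete, here is a minimal PyTorch sketch of a graph convolution layer with an identity skip connection, assuming a dense, symmetrically normalized adjacency matrix. It illustrates the general technique only, not the authors' D2GCN implementation; all names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

def normalize_adjacency(adj):
    """D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix (self-loops added)."""
    a = adj + torch.eye(adj.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

class ResidualGCNLayer(nn.Module):
    """One graph convolution with an identity skip connection:
    H_out = ReLU(A_hat @ H @ W) + H, so each layer's output keeps a
    copy of its input and a deep stack resists over-smoothing."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim, bias=False)

    def forward(self, a_hat, h):
        return torch.relu(a_hat @ self.linear(h)) + h

# Deep stack on toy data: 16 layers, a depth at which plain GCN layers
# typically collapse node representations toward one another.
n, dim = 10, 64
adj = (torch.rand(n, n) > 0.7).float()
a_hat = normalize_adjacency(((adj + adj.t()) > 0).float())
h = torch.randn(n, dim)
for layer in [ResidualGCNLayer(dim) for _ in range(16)]:
    h = layer(a_hat, h)
```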
{"title":"Dual-channel deep graph convolutional neural networks","authors":"Zhonglin Ye, Zhuoran Li, Gege Li, Haixing Zhao","doi":"10.3389/frai.2024.1290491","DOIUrl":"https://doi.org/10.3389/frai.2024.1290491","url":null,"abstract":"The dual-channel graph convolutional neural networks based on hybrid features jointly model the different features of networks, so that the features can learn each other and improve the performance of various subsequent machine learning tasks. However, current dual-channel graph convolutional neural networks are limited by the number of convolution layers, which hinders the performance improvement of the models. Graph convolutional neural networks superimpose multi-layer graph convolution operations, which would occur in smoothing phenomena, resulting in performance decreasing as the increasing number of graph convolutional layers. Inspired by the success of residual connections on convolutional neural networks, this paper applies residual connections to dual-channel graph convolutional neural networks, and increases the depth of dual-channel graph convolutional neural networks. Thus, a dual-channel deep graph convolutional neural network (D2GCN) is proposed, which can effectively avoid over-smoothing and improve model performance. D2GCN is verified on CiteSeer, DBLP, and SDBLP datasets, the results show that D2GCN performs better than the comparison algorithms used in node classification tasks.","PeriodicalId":508738,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"10 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140744253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Digital accessibility in the era of artificial intelligence—Bibliometric analysis and systematic review
Pub Date: 2024-02-16 | DOI: 10.3389/frai.2024.1349668
Khansa Chemnad, Achraf Othman
Digital accessibility involves designing digital systems and services so that they can be used by everyone, including people with visual, auditory, motor, or cognitive impairments. Artificial intelligence (AI) has the potential to enhance accessibility for people with disabilities and improve their overall quality of life. This systematic review, covering academic articles from 2018 to 2023, focuses on AI applications for digital accessibility. Initially, 3,706 articles were screened from five scholarly databases: ACM Digital Library, IEEE Xplore, ScienceDirect, Scopus, and Springer. The analysis narrowed this down to 43 articles and presents a classification framework based on applications, challenges, AI methodologies, and accessibility standards. The research shows a predominant focus on AI-driven digital accessibility for visual impairments, revealing a critical gap in addressing speech and hearing impairments, autism spectrum disorder, neurological disorders, and motor impairments; a more balanced research distribution is needed to ensure equitable support for all communities with disabilities. The study also points out a lack of adherence to accessibility standards in existing systems, stressing the urgency of a fundamental shift in how solutions for people with disabilities are designed. Overall, this research underscores the vital role of accessible AI in preventing exclusion and discrimination, urging a comprehensive approach to digital accessibility that caters to diverse disability needs.
{"title":"Digital accessibility in the era of artificial intelligence—Bibliometric analysis and systematic review","authors":"Khansa Chemnad, Achraf Othman","doi":"10.3389/frai.2024.1349668","DOIUrl":"https://doi.org/10.3389/frai.2024.1349668","url":null,"abstract":"Digital accessibility involves designing digital systems and services to enable access for individuals, including those with disabilities, including visual, auditory, motor, or cognitive impairments. Artificial intelligence (AI) has the potential to enhance accessibility for people with disabilities and improve their overall quality of life.This systematic review, covering academic articles from 2018 to 2023, focuses on AI applications for digital accessibility. Initially, 3,706 articles were screened from five scholarly databases—ACM Digital Library, IEEE Xplore, ScienceDirect, Scopus, and Springer.The analysis narrowed down to 43 articles, presenting a classification framework based on applications, challenges, AI methodologies, and accessibility standards.This research emphasizes the predominant focus on AI-driven digital accessibility for visual impairments, revealing a critical gap in addressing speech and hearing impairments, autism spectrum disorder, neurological disorders, and motor impairments. This highlights the need for a more balanced research distribution to ensure equitable support for all communities with disabilities. The study also pointed out a lack of adherence to accessibility standards in existing systems, stressing the urgency for a fundamental shift in designing solutions for people with disabilities. Overall, this research underscores the vital role of accessible AI in preventing exclusion and discrimination, urging a comprehensive approach to digital accessibility to cater to diverse disability needs.","PeriodicalId":508738,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"59 12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139960785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Cross-validated tree-based models for multi-target learning
Pub Date: 2024-02-16 | DOI: 10.3389/frai.2024.1302860
Yehuda Nissenbaum, Amichai Painsky
Multi-target learning (MTL) is a popular machine learning technique that considers the simultaneous prediction of multiple targets. MTL schemes utilize a variety of methods, from traditional linear models to more contemporary deep neural networks. In this work we introduce a novel, highly interpretable, tree-based MTL scheme that exploits the correlation between targets to obtain improved prediction accuracy. Our scheme applies a cross-validated splitting criterion to identify correlated targets at every node of the tree, which allows us to benefit from the correlation among targets while avoiding overfitting. We demonstrate the performance of the proposed scheme in a variety of synthetic and real-world experiments, showing a significant improvement over alternative methods. An implementation of the proposed method is publicly available on the first author's webpage.
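The sketch below illustrates one way such a cross-validated splitting criterion could work, assuming mean-prediction children and held-out MSE as the score; the helper cv_split_gain is hypothetical, not the authors' implementation. Targets with positive gain would be modeled jointly at this node, while the rest are left for other splits.

```python
import numpy as np
from sklearn.model_selection import KFold

def cv_split_gain(X, Y, feature, threshold, n_splits=5):
    """Held-out gain, per target, of splitting this node's samples on
    X[:, feature] <= threshold. Y has shape (n_samples, n_targets);
    a positive entry marks a target whose held-out MSE the split reduces."""
    gains = np.zeros(Y.shape[1])
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        left = X[train][:, feature] <= threshold
        if left.all() or not left.any():
            continue  # degenerate split in this fold
        mu_left = Y[train][left].mean(axis=0)
        mu_right = Y[train][~left].mean(axis=0)
        mu_root = Y[train].mean(axis=0)
        go_left = X[test][:, feature] <= threshold
        pred = np.where(go_left[:, None], mu_left, mu_right)
        # error of predicting the node mean minus error of the split's predictions
        gains += ((Y[test] - mu_root) ** 2).mean(axis=0) - ((Y[test] - pred) ** 2).mean(axis=0)
    return gains / n_splits

# Two targets depend on feature 0, one is pure noise: only the first two
# should show positive cross-validated gain for this split.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Y = np.column_stack([X[:, 0] > 0, X[:, 0] > 0, rng.standard_normal(200)])
print(cv_split_gain(X, Y, feature=0, threshold=0.0))
```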
{"title":"Cross-validated tree-based models for multi-target learning","authors":"Yehuda Nissenbaum, Amichai Painsky","doi":"10.3389/frai.2024.1302860","DOIUrl":"https://doi.org/10.3389/frai.2024.1302860","url":null,"abstract":"Multi-target learning (MTL) is a popular machine learning technique which considers simultaneous prediction of multiple targets. MTL schemes utilize a variety of methods, from traditional linear models to more contemporary deep neural networks. In this work we introduce a novel, highly interpretable, tree-based MTL scheme which exploits the correlation between the targets to obtain improved prediction accuracy. Our suggested scheme applies cross-validated splitting criterion to identify correlated targets at every node of the tree. This allows us to benefit from the correlation among the targets while avoiding overfitting. We demonstrate the performance of our proposed scheme in a variety of synthetic and real-world experiments, showing a significant improvement over alternative methods. An implementation of the proposed method is publicly available at the first author's webpage.","PeriodicalId":508738,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"45 30","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139961596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Personalized bundle recommendation using preference elicitation and the Choquet integral
Pub Date: 2024-02-14 | DOI: 10.3389/frai.2024.1346684
Erich Robbi, Marco Bronzini, P. Viappiani, Andrea Passerini
Bundle recommendation aims to generate bundles of associated products that users tend to consume as a whole under certain circumstances. Modeling bundle utility for users is a non-trivial task, as it requires accounting for potential interdependencies between bundle attributes. To address this challenge, we introduce a new preference-based approach to bundle recommendation that exploits the Choquet integral. This allows us to formalize preferences over coalitions of environment-related attributes and thus to recommend product bundles that account for synergies among product attributes. An experimental evaluation on a dataset of local food products from Northern Italy shows that the Choquet integral allows a natural formalization of a sensible notion of environmental friendliness, and that standard approaches based on weighted sums of attributes end up recommending bundles with lower environmental friendliness even when the weights are explicitly learned to maximize it. We further show how preference elicitation strategies can be leveraged to acquire the weights of the Choquet integral from user feedback in the form of preferences over candidate bundles, and that a handful of queries suffices to recommend optimal bundles for a diverse set of user prototypes.
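For readers unfamiliar with the discrete Choquet integral, the sketch below computes it for a toy two-attribute capacity with a positive synergy; the attribute names and weights are illustrative only, not taken from the paper.

```python
def choquet(scores, capacity):
    """Discrete Choquet integral of attribute scores w.r.t. a capacity.
    scores:   dict attribute -> value in [0, 1]
    capacity: dict frozenset of attributes -> weight, monotone, with
              capacity of the full set equal to 1."""
    items = sorted(scores, key=scores.get)   # attributes, ascending by score
    total, prev = 0.0, 0.0
    for i, attr in enumerate(items):
        coalition = frozenset(items[i:])     # attributes scoring >= the current one
        total += (scores[attr] - prev) * capacity[coalition]
        prev = scores[attr]
    return total

# Toy capacity: 'local' and 'organic' interact positively, since their
# joint weight (1.0) exceeds the sum of their individual weights (0.6).
cap = {
    frozenset(): 0.0,
    frozenset({"local"}): 0.3,
    frozenset({"organic"}): 0.3,
    frozenset({"local", "organic"}): 1.0,
}
# 0.66 here, versus 0.70 for an equal-weight weighted sum: the capacity
# rewards bundles only when the synergistic attributes are jointly high.
print(choquet({"local": 0.8, "organic": 0.6}, cap))
```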
{"title":"Personalized bundle recommendation using preference elicitation and the Choquet integral","authors":"Erich Robbi, Marco Bronzini, P. Viappiani, Andrea Passerini","doi":"10.3389/frai.2024.1346684","DOIUrl":"https://doi.org/10.3389/frai.2024.1346684","url":null,"abstract":"Bundle recommendation aims to generate bundles of associated products that users tend to consume as a whole under certain circumstances. Modeling the bundle utility for users is a non-trivial task, as it requires to account for the potential interdependencies between bundle attributes. To address this challenge, we introduce a new preference-based approach for bundle recommendation exploiting the Choquet integral. This allows us to formalize preferences for coalitions of environmental-related attributes, thus recommending product bundles accounting for synergies among product attributes. An experimental evaluation of a dataset of local food products in Northern Italy shows how the Choquet integral allows the natural formalization of a sensible notion of environmental friendliness and that standard approaches based on weighted sums of attributes end up recommending bundles with lower environmental friendliness even if weights are explicitly learned to maximize it. We further show how preference elicitation strategies can be leveraged to acquire weights of the Choquet integral from user feedback in terms of preferences over candidate bundles, and show how a handful of queries allow to recommend optimal bundles for a diverse set of user prototypes.","PeriodicalId":508738,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"583 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139839026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Topic modeling and social network analysis approach to explore diabetes discourse on Twitter in India
Pub Date: 2024-02-12 | DOI: 10.3389/frai.2024.1329185
Thilagavathi Ramamoorthy, V. Kulothungan, Bagavandas Mappillairaju
The utilization of social media presents a promising avenue for the prevention and management of diabetes. To effectively cater to the diabetes-related knowledge, support, and intervention needs of the community, it is imperative to attain a deeper understanding of the extent and content of discussions pertaining to this health issue. This study aims to assess and compare various topic modeling techniques to determine the most effective model for identifying the core themes in diabetes-related tweets, the sources responsible for disseminating this information, the reach of these themes, and the influential individuals within the Twitter community in India. Twitter messages from India, dated between 7 November 2022 and 28 February 2023, were collected using the Twitter API. The unsupervised machine learning topic models Latent Dirichlet Allocation (LDA), non-negative matrix factorization (NMF), BERTopic, and Top2Vec were compared, and the best-performing model was used to identify common diabetes-related topics. Influential users were identified through social network analysis. The NMF model outperformed the LDA model, and BERTopic performed better than Top2Vec. Diabetes-related conversations revolved around eight topics: promotion; management; drugs and personal stories; consequences; risk factors and research; raising awareness and providing support; diet; and opinions and lifestyle changes. The influential nodes identified were mainly health professionals and healthcare organizations. The study identified important topics of discussion along with the health professionals and healthcare organizations involved in sharing diabetes-related information with the public. Collaborations among influential healthcare organizations, health professionals, and the government can foster awareness and help prevent noncommunicable diseases.
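As an illustration of the NMF pipeline the study found strongest, the following scikit-learn sketch extracts topics from a placeholder corpus; the tiny corpus and the two-topic setting are stand-ins for the study's tweet collection and its eight topics.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder corpus; the study used tweets collected via the Twitter API.
docs = [
    "managing diabetes with diet and exercise",
    "new insulin drug trial shows promising results",
    "raising awareness about diabetes risk factors",
    "healthy diet tips to control blood sugar",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

# n_components sets the number of topics; the study settled on eight.
nmf = NMF(n_components=2, random_state=0).fit(X)

# Each row of components_ weights vocabulary terms; the heaviest terms
# characterize the topic.
terms = tfidf.get_feature_names_out()
for k, row in enumerate(nmf.components_):
    top = [terms[i] for i in row.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```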
{"title":"Topic modeling and social network analysis approach to explore diabetes discourse on Twitter in India","authors":"Thilagavathi Ramamoorthy, V. Kulothungan, Bagavandas Mappillairaju","doi":"10.3389/frai.2024.1329185","DOIUrl":"https://doi.org/10.3389/frai.2024.1329185","url":null,"abstract":"The utilization of social media presents a promising avenue for the prevention and management of diabetes. To effectively cater to the diabetes-related knowledge, support, and intervention needs of the community, it is imperative to attain a deeper understanding of the extent and content of discussions pertaining to this health issue. This study aims to assess and compare various topic modeling techniques to determine the most effective model for identifying the core themes in diabetes-related tweets, the sources responsible for disseminating this information, the reach of these themes, and the influential individuals within the Twitter community in India.Twitter messages from India, dated between 7 November 2022 and 28 February 2023, were collected using the Twitter API. The unsupervised machine learning topic models, namely, Latent Dirichlet Allocation (LDA), non-negative matrix factorization (NMF), BERTopic, and Top2Vec, were compared, and the best-performing model was used to identify common diabetes-related topics. Influential users were identified through social network analysis.The NMF model outperformed the LDA model, whereas BERTopic performed better than Top2Vec. Diabetes-related conversations revolved around eight topics, namely, promotion, management, drug and personal story, consequences, risk factors and research, raising awareness and providing support, diet, and opinion and lifestyle changes. The influential nodes identified were mainly health professionals and healthcare organizations.The study identified important topics of discussion along with health professionals and healthcare organizations involved in sharing diabetes-related information with the public. Collaborations among influential healthcare organizations, health professionals, and the government can foster awareness and prevent noncommunicable diseases.","PeriodicalId":508738,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"121 20","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139785047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks
Pub Date: 2024-02-08 | DOI: 10.3389/frai.2024.1287877
Julio Cesar Cavalcanti, Ronaldo Rodrigues da Silva, Anders Eriksson, P. Barbosa
This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system built with the SpeechBrain toolkit. The dataset comprised recordings of 20 male identical-twin speakers engaged in spontaneous dialogues and interviews. Performance evaluations compared identical twins, all speakers in the dataset (including twin pairs), and all speakers excluding twin pairs. Speech samples ranging from 5 to 30 s were assessed using equal error rates (EER) and the log-likelihood-ratio cost (Cllr). The results highlight the substantial challenge that identical twins pose to the ASR system, leading to a decrease in overall speaker recognition accuracy. Furthermore, analyses based on longer speech samples outperformed those using shorter samples. As sample length increased, the standard deviations of both intra- and inter-speaker similarity scores decreased, indicating less variability when estimating speaker similarity or dissimilarity from longer speech stretches than from shorter ones. The study also uncovered varying degrees of likeness among identical twins, with certain pairs presenting a greater challenge for ASR systems. These outcomes align with prior research and are discussed in the context of the relevant literature.
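The equal error rate used here is the operating point at which the false-acceptance and false-rejection rates coincide. The numpy sketch below approximates it from toy score distributions; it is a generic illustration, not SpeechBrain's scoring code.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep decision thresholds over the pooled scores and return the
    point where the false-rejection rate (genuine pairs below threshold)
    meets the false-acceptance rate (impostor pairs at or above it)."""
    best_frr, best_far = 1.0, 0.0
    for t in np.sort(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine < t)
        far = np.mean(impostor >= t)
        if abs(frr - far) < abs(best_frr - best_far):
            best_frr, best_far = frr, far
    return (best_frr + best_far) / 2

# Toy score distributions: same-speaker pairs score higher on average;
# twin pairs would shift the impostor distribution upward, raising the EER.
rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 500)
impostor = rng.normal(0.4, 0.1, 500)
print(f"EER ~ {equal_error_rate(genuine, impostor):.3f}")
```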
{"title":"Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks","authors":"Julio Cesar Cavalcanti, Ronaldo Rodrigues da Silva, Anders Eriksson, P. Barbosa","doi":"10.3389/frai.2024.1287877","DOIUrl":"https://doi.org/10.3389/frai.2024.1287877","url":null,"abstract":"This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system utilizing the SpeechBrain toolkit. The dataset comprised recordings from 20 male identical twin speakers engaged in spontaneous dialogues and interviews. Performance evaluations involved comparing identical twins, all speakers in the dataset (including twin pairs), and all speakers excluding twin pairs. Speech samples, ranging from 5 to 30 s, underwent assessment based on equal error rates (EER) and Log cost-likelihood ratios (Cllr). Results highlight the substantial challenge posed by identical twins to the ASR system, leading to a decrease in overall speaker recognition accuracy. Furthermore, analyses based on longer speech samples outperformed those using shorter samples. As sample size increased, standard deviation values for both intra and inter-speaker similarity scores decreased, indicating reduced variability in estimating speaker similarity/dissimilarity levels in longer speech stretches compared to shorter ones. The study also uncovered varying degrees of likeness among identical twins, with certain pairs presenting a greater challenge for ASR systems. These outcomes align with prior research and are discussed within the context of relevant literature.","PeriodicalId":508738,"journal":{"name":"Frontiers in Artificial Intelligence","volume":" 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139792092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A dynamic approach for visualizing and exploring concept hierarchies from textbooks
Pub Date: 2024-02-08 | DOI: 10.3389/frai.2024.1285026
Sabine Wehnert, Praneeth Chedella, Jonas Asche, Ernesto William De Luca
In this study, we propose a visualization technique to explore concept hierarchies generated from a textbook in the legal domain. Through a human-centered design process, we developed a tool that allows users to navigate complex hierarchical concepts effectively using three kinds of traversal: top-down, middle-out, and bottom-up. Our concept hierarchies offer an overview of a given domain, with an increasing level of detail toward the bottom of the hierarchy, which consists of entities. In the legal use case we considered, the concepts were adapted from section headings of a legal textbook, while references to laws or legal cases inside the textbook became entities. The design of the tool was refined through several steps: gathering user needs and the pain points of an existing visualization, prototyping, testing, and refining. The resulting interface offers several key features, such as dynamic search and filtering, explorable concept nodes, and a preview of leaf nodes at every stage. A high-fidelity prototype was created to test our theory and design. To evaluate the concept, we used the System Usability Scale to measure the prototype's usability, a task-based survey to assess the tool's ability to assist users in gathering information and interacting with the prototype, and mouse tracking to understand user interaction patterns. We also gathered audio and video footage of participants during the study, which helped us obtain feedback where survey responses required further information. The data collected provided valuable insights that set the directions for extending this study. As a result, we account for varying hierarchy depths, text spans longer than one to two words in the elements of the hierarchy, searchability, and exploration of the hierarchies, while aiming to minimize visual clutter and cognitive overload. We show that existing approaches are not suitable for visualizing the type of data our visualization supports.
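The three traversal modes can be pictured on a simple tree structure. The sketch below is a toy illustration with hypothetical node labels, not the tool's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptNode:
    label: str
    parent: "ConceptNode | None" = None
    children: "list[ConceptNode]" = field(default_factory=list)

    def add(self, label: str) -> "ConceptNode":
        child = ConceptNode(label, parent=self)
        self.children.append(child)
        return child

def top_down(node):
    """Overview first: yield a concept before its sub-concepts (pre-order)."""
    yield node.label
    for child in node.children:
        yield from top_down(child)

def bottom_up(node):
    """Entities first: yield sub-concepts before their parent (post-order)."""
    for child in node.children:
        yield from bottom_up(child)
    yield node.label

def middle_out(node):
    """Start at a concept of interest, climb to its ancestors for context,
    then expand its descendants for detail."""
    yield node.label
    ancestor = node.parent
    while ancestor is not None:
        yield ancestor.label
        ancestor = ancestor.parent
    for child in node.children:
        yield from top_down(child)

# Toy hierarchy: section headings as concepts, references as leaf entities.
root = ConceptNode("Contract law")
formation = root.add("Formation of contracts")
formation.add("statute reference 1")
formation.add("case reference A")
print(list(middle_out(formation)))
```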
{"title":"A dynamic approach for visualizing and exploring concept hierarchies from textbooks","authors":"Sabine Wehnert, Praneeth Chedella, Jonas Asche, Ernesto William De Luca","doi":"10.3389/frai.2024.1285026","DOIUrl":"https://doi.org/10.3389/frai.2024.1285026","url":null,"abstract":"In this study, we propose a visualization technique to explore and visualize concept hierarchies generated from a textbook in the legal domain. Through a human-centered design process, we developed a tool that allows users to effectively navigate through and explore complex hierarchical concepts in three kinds of traversal techniques: top-down, middle-out, and bottom-up. Our concept hierarchies offer an overview over a given domain, with increasing level of detail toward the bottom of the hierarchy which is consisting of entities. In the legal use case we considered, the concepts were adapted from section headings in a legal textbook, whereas references to law or legal cases inside the textbook became entities. The design of this tool is refined following various steps such as gathering user needs, pain points of an existing visualization, prototyping, testing, and refining. The resulting interface offers users several key features such as dynamic search and filter, explorable concept nodes, and a preview of leaf nodes at every stage. A high-fidelity prototype was created to test our theory and design. To test our concept, we used the System Usability Scale as a way to measure the prototype's usability, a task-based survey to asses the tool's ability in assisting users in gathering information and interacting with the prototype, and finally mouse tracking to understand user interaction patterns. Along with this, we gathered audio and video footage of users when participating in the study. This footage also helped us in getting feedback when the survey responses required further information. The data collected provided valuable insights to set the directions for extending this study. As a result, we have accounted for varying hierarchy depths, longer text spans than only one to two words in the elements of the hierarchy, searchability, and exploration of the hierarchies. At the same time, we aimed for minimizing visual clutter and cognitive overload. We show that existing approaches are not suitable to visualize the type of data which we support with our visualization.","PeriodicalId":508738,"journal":{"name":"Frontiers in Artificial Intelligence","volume":" 38","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139792279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Machine learning-based infant crying interpretation
Pub Date: 2024-02-08 | DOI: 10.3389/frai.2024.1337356
M. Hammoud, Melaku N. Getahun, Anna Baldycheva, Andrey Somov
Crying is an inevitable behavior that occurs throughout infancy, often under conditions where the caregiver has difficulty interpreting its underlying cause. A cry can be treated as an audio signal that carries a message about the infant's state, such as discomfort, hunger, or sickness. Caregivers have traditionally had to interpret these signals themselves, and misreading them can cause serious problems. Several methods attempt to solve this problem automatically; however, proper audio feature representations and classifiers are necessary for good results. This study uses time-, frequency-, and time-frequency-domain feature representations to extract detailed information from the data. The time-domain features are the zero-crossing rate (ZCR) and root mean square (RMS) energy, the frequency-domain feature is the Mel spectrogram, and the time-frequency-domain features are the Mel-frequency cepstral coefficients (MFCCs). Moreover, time-series imaging algorithms are applied to transform the 20 MFCC features into images: Gramian angular difference fields, Gramian angular summation fields, Markov transition fields, recurrence plots, and RGB GAF. These features are then provided to machine learning classifiers such as decision trees, random forests, K-nearest neighbors, and bagging. Using MFCCs, ZCR, and RMS as features achieved high performance, outperforming the state of the art (SOTA). Optimal parameters were found via grid search with 10-fold cross-validation. Our MFCC-based random forest (RF) classifier achieved an accuracy of 96.39%, outperforming the SOTA scalogram-based ShuffleNet classifier, which had an accuracy of 95.17%.
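A condensed sketch of the feature-extraction-plus-classifier pipeline described above, using librosa and scikit-learn. The synthetic signals, the three-way label set, and the hyperparameter grid are placeholders, not the study's data or exact configuration.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def extract_features(y, sr):
    """Mean-pool 20 MFCCs (time-frequency domain) plus ZCR and RMS
    (time domain) into one fixed-length vector per recording."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)   # (20, n_frames)
    zcr = librosa.feature.zero_crossing_rate(y)          # (1, n_frames)
    rms = librosa.feature.rms(y=y)                       # (1, n_frames)
    return np.concatenate([mfcc.mean(axis=1), zcr.mean(axis=1), rms.mean(axis=1)])

# Synthetic one-second stand-ins for cry recordings; real audio would be
# loaded with librosa.load. The label set is an assumed cry taxonomy.
sr = 16000
rng = np.random.default_rng(0)
signals = [rng.standard_normal(sr).astype(np.float32) for _ in range(42)]
labels = np.array(["hunger", "discomfort", "sickness"] * 14)
X = np.stack([extract_features(y, sr) for y in signals])

# Hyperparameters tuned by grid search with 10-fold CV, as in the paper;
# this particular grid is illustrative.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [100, 300], "max_depth": [None, 20]},
    cv=10,
)
grid.fit(X, labels)
print(grid.best_params_)
```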
{"title":"Machine learning-based infant crying interpretation","authors":"M. Hammoud, Melaku N. Getahun, Anna Baldycheva, Andrey Somov","doi":"10.3389/frai.2024.1337356","DOIUrl":"https://doi.org/10.3389/frai.2024.1337356","url":null,"abstract":"Crying is an inevitable character trait that occurs throughout the growth of infants, under conditions where the caregiver may have difficulty interpreting the underlying cause of the cry. Crying can be treated as an audio signal that carries a message about the infant's state, such as discomfort, hunger, and sickness. The primary infant caregiver requires traditional ways of understanding these feelings. Failing to understand them correctly can cause severe problems. Several methods attempt to solve this problem; however, proper audio feature representation and classifiers are necessary for better results. This study uses time-, frequency-, and time-frequency-domain feature representations to gain in-depth information from the data. The time-domain features include zero-crossing rate (ZCR) and root mean square (RMS), the frequency-domain feature includes the Mel-spectrogram, and the time-frequency-domain feature includes Mel-frequency cepstral coefficients (MFCCs). Moreover, time-series imaging algorithms are applied to transform 20 MFCC features into images using different algorithms: Gramian angular difference fields, Gramian angular summation fields, Markov transition fields, recurrence plots, and RGB GAF. Then, these features are provided to different machine learning classifiers, such as decision tree, random forest, K nearest neighbors, and bagging. The use of MFCCs, ZCR, and RMS as features achieved high performance, outperforming state of the art (SOTA). Optimal parameters are found via the grid search method using 10-fold cross-validation. Our MFCC-based random forest (RF) classifier approach achieved an accuracy of 96.39%, outperforming SOTA, the scalogram-based shuffleNet classifier, which had an accuracy of 95.17%.","PeriodicalId":508738,"journal":{"name":"Frontiers in Artificial Intelligence","volume":" 29","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139793330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}