Pub Date : 2021-05-01Epub Date: 2021-05-12DOI: 10.3389/fcomp.2021.624683
Youxiang Zhu, Xiaohui Liang, John A Batsis, Robert M Roth
Examination of speech datasets for detecting dementia, collected via various speech tasks, has revealed links between speech and cognitive abilities. However, the speech dataset available for this research is extremely limited because the collection process of speech and baseline data from patients with dementia in clinical settings is expensive. In this paper, we study the spontaneous speech dataset from a recent ADReSS challenge, a Cookie Theft Picture (CTP) dataset with balanced groups of participants in age, gender, and cognitive status. We explore state-of-the-art deep transfer learning techniques from image, audio, speech, and language domains. We envision that one advantage of transfer learning is to eliminate the design of handcrafted features based on the tasks and datasets. Transfer learning further mitigates the limited dementia-relevant speech data problem by inheriting knowledge from similar but much larger datasets. Specifically, we built a variety of transfer learning models using commonly employed MobileNet (image), YAMNet (audio), Mockingjay (speech), and BERT (text) models. Results indicated that the transfer learning models of text data showed significantly better performance than those of audio data. Performance gains of the text models may be due to the high similarity between the pre-training text dataset and the CTP text dataset. Our multi-modal transfer learning introduced a slight improvement in accuracy, demonstrating that audio and text data provide limited complementary information. Multi-task transfer learning resulted in limited improvements in classification and a negative impact in regression. By analyzing the meaning behind the AD/non-AD labels and Mini-Mental State Examination (MMSE) scores, we observed that the inconsistency between labels and scores could limit the performance of the multi-task learning, especially when the outputs of the single-task models are highly consistent with the corresponding labels/scores. In sum, we conducted a large comparative analysis of varying transfer learning models focusing less on model customization but more on pre-trained models and pre-training datasets. We revealed insightful relations among models, data types, and data labels in this research area.
{"title":"Exploring Deep Transfer Learning Techniques for Alzheimer's Dementia Detection.","authors":"Youxiang Zhu, Xiaohui Liang, John A Batsis, Robert M Roth","doi":"10.3389/fcomp.2021.624683","DOIUrl":"10.3389/fcomp.2021.624683","url":null,"abstract":"<p><p>Examination of speech datasets for detecting dementia, collected via various speech tasks, has revealed links between speech and cognitive abilities. However, the speech dataset available for this research is extremely limited because the collection process of speech and baseline data from patients with dementia in clinical settings is expensive. In this paper, we study the spontaneous speech dataset from a recent ADReSS challenge, a Cookie Theft Picture (CTP) dataset with balanced groups of participants in age, gender, and cognitive status. We explore state-of-the-art deep transfer learning techniques from image, audio, speech, and language domains. We envision that one advantage of transfer learning is to eliminate the design of handcrafted features based on the tasks and datasets. Transfer learning further mitigates the limited dementia-relevant speech data problem by inheriting knowledge from similar but much larger datasets. Specifically, we built a variety of transfer learning models using commonly employed MobileNet (image), YAMNet (audio), Mockingjay (speech), and BERT (text) models. Results indicated that the transfer learning models of text data showed significantly better performance than those of audio data. Performance gains of the text models may be due to the high similarity between the pre-training text dataset and the CTP text dataset. Our multi-modal transfer learning introduced a slight improvement in accuracy, demonstrating that audio and text data provide limited complementary information. Multi-task transfer learning resulted in limited improvements in classification and a negative impact in regression. By analyzing the meaning behind the AD/non-AD labels and Mini-Mental State Examination (MMSE) scores, we observed that the inconsistency between labels and scores could limit the performance of the multi-task learning, especially when the outputs of the single-task models are highly consistent with the corresponding labels/scores. In sum, we conducted a large comparative analysis of varying transfer learning models focusing less on model customization but more on pre-trained models and pre-training datasets. We revealed insightful relations among models, data types, and data labels in this research area.</p>","PeriodicalId":52823,"journal":{"name":"Frontiers in Computer Science","volume":"3 ","pages":""},"PeriodicalIF":2.6,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8153512/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39027802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-03-01Epub Date: 2020-03-17DOI: 10.3389/fcomp.2020.00008
Christian Dietz, Curtis T Rueden, Stefan Helfrich, Ellen T A Dobson, Martin Horn, Jan Eglinger, Edward L Evans, Dalton T McLean, Tatiana Novitskaya, William A Ricke, Nathan M Sherer, Andries Zijlstra, Michael R Berthold, Kevin W Eliceiri
Open-source software tools are often used for analysis of scientific image data due to their flexibility and transparency in dealing with rapidly evolving imaging technologies. The complex nature of image analysis problems frequently requires many tools to be used in conjunction, including image processing and analysis, data processing, machine learning and deep learning, statistical analysis of the results, visualization, correlation to heterogeneous but related data, and more. However, the development, and therefore application, of these computational tools is impeded by a lack of integration across platforms. Integration of tools goes beyond convenience, as it is impractical for one tool to anticipate and accommodate the current and future needs of every user. This problem is emphasized in the field of bioimage analysis, where various rapidly emerging methods are quickly being adopted by researchers. ImageJ is a popular open-source image analysis platform, with contributions from a global community resulting in hundreds of specialized routines for a wide array of scientific tasks. ImageJ's strength lies in its accessibility and extensibility, allowing researchers to easily improve the software to solve their image analysis tasks. However, ImageJ is not designed for development of complex end-to-end image analysis workflows. Scientists are often forced to create highly specialized and hard-to-reproduce scripts to orchestrate individual software fragments and cover the entire life-cycle of an analysis of an image dataset. KNIME Analytics Platform, a user-friendly data integration, analysis, and exploration workflow system, was designed to handle huge amounts of heterogeneous data in a platform-agnostic, computing environment and has been successful in meeting complex end-to-end demands in several communities, such as cheminformatics and mass spectrometry. Similar needs within the bioimage analysis community led to the creation of the KNIME Image Processing extension which integrates ImageJ into KNIME Analytics Platform, enabling researchers to develop reproducible and scalable workflows, integrating a diverse range of analysis tools. Here we present how users and developers alike can leverage the ImageJ ecosystem via the KNIME Image Processing extension to provide robust and extensible image analysis within KNIME workflows. We illustrate the benefits of this integration with examples, as well as representative scientific use cases.
{"title":"Integration of the ImageJ Ecosystem in the KNIME Analytics Platform.","authors":"Christian Dietz, Curtis T Rueden, Stefan Helfrich, Ellen T A Dobson, Martin Horn, Jan Eglinger, Edward L Evans, Dalton T McLean, Tatiana Novitskaya, William A Ricke, Nathan M Sherer, Andries Zijlstra, Michael R Berthold, Kevin W Eliceiri","doi":"10.3389/fcomp.2020.00008","DOIUrl":"https://doi.org/10.3389/fcomp.2020.00008","url":null,"abstract":"<p><p>Open-source software tools are often used for analysis of scientific image data due to their flexibility and transparency in dealing with rapidly evolving imaging technologies. The complex nature of image analysis problems frequently requires many tools to be used in conjunction, including image processing and analysis, data processing, machine learning and deep learning, statistical analysis of the results, visualization, correlation to heterogeneous but related data, and more. However, the development, and therefore application, of these computational tools is impeded by a lack of integration across platforms. Integration of tools goes beyond convenience, as it is impractical for one tool to anticipate and accommodate the current and future needs of every user. This problem is emphasized in the field of bioimage analysis, where various rapidly emerging methods are quickly being adopted by researchers. ImageJ is a popular open-source image analysis platform, with contributions from a global community resulting in hundreds of specialized routines for a wide array of scientific tasks. ImageJ's strength lies in its accessibility and extensibility, allowing researchers to easily improve the software to solve their image analysis tasks. However, ImageJ is not designed for development of complex end-to-end image analysis workflows. Scientists are often forced to create highly specialized and hard-to-reproduce scripts to orchestrate individual software fragments and cover the entire life-cycle of an analysis of an image dataset. KNIME Analytics Platform, a user-friendly data integration, analysis, and exploration workflow system, was designed to handle huge amounts of heterogeneous data in a platform-agnostic, computing environment and has been successful in meeting complex end-to-end demands in several communities, such as cheminformatics and mass spectrometry. Similar needs within the bioimage analysis community led to the creation of the KNIME Image Processing extension which integrates ImageJ into KNIME Analytics Platform, enabling researchers to develop reproducible and scalable workflows, integrating a diverse range of analysis tools. Here we present how users and developers alike can leverage the ImageJ ecosystem via the KNIME Image Processing extension to provide robust and extensible image analysis within KNIME workflows. We illustrate the benefits of this integration with examples, as well as representative scientific use cases.</p>","PeriodicalId":52823,"journal":{"name":"Frontiers in Computer Science","volume":"2 ","pages":""},"PeriodicalIF":2.6,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3389/fcomp.2020.00008","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38359736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-11-01Epub Date: 2019-11-29DOI: 10.3389/fcomp.2019.00011
Itir Onal Ertugrul, Le Yang, László A Jeni, Jeffrey F Cohn
Facial action units (AUs) relate to specific local facial regions. Recent efforts in automated AU detection have focused on learning the facial patch representations to detect specific AUs. These efforts have encountered three hurdles. First, they implicitly assume that facial patches are robust to head rotation; yet non-frontal rotation is common. Second, mappings between AUs and patches are defined a priori, which ignores co-occurrences among AUs. And third, the dynamics of AUs are either ignored or modeled sequentially rather than simultaneously as in human perception. Inspired by recent advances in human perception, we propose a dynamic patch-attentive deep network, called D-PAttNet, for AU detection that (i) controls for 3D head and face rotation, (ii) learns mappings of patches to AUs, and (iii) models spatiotemporal dynamics. D-PAttNet approach significantly improves upon existing state of the art.
面部动作单元(AU)与特定的局部面部区域有关。最近在自动 AU 检测方面所做的努力主要集中在学习面部斑块表征以检测特定的 AU。这些努力遇到了三个障碍。首先,它们隐含地假定面部补丁对头部旋转具有鲁棒性;然而非正面旋转是很常见的。其次,AU 和斑块之间的映射是先验定义的,忽略了 AU 之间的共现。第三,AUs 的动态要么被忽略,要么被顺序建模,而不是像人类感知那样同时建模。受人类感知领域最新进展的启发,我们提出了一种动态斑块注意力深度网络(称为 D-PAttNet),用于 AU 检测,该网络(i)控制三维头部和面部旋转,(ii)学习斑块到 AU 的映射,(iii)建立时空动态模型。D-PAttNet 方法大大改进了现有的技术水平。
{"title":"D-PAttNet: Dynamic Patch-Attentive Deep Network for Action Unit Detection.","authors":"Itir Onal Ertugrul, Le Yang, László A Jeni, Jeffrey F Cohn","doi":"10.3389/fcomp.2019.00011","DOIUrl":"10.3389/fcomp.2019.00011","url":null,"abstract":"<p><p>Facial action units (AUs) relate to specific local facial regions. Recent efforts in automated AU detection have focused on learning the facial patch representations to detect specific AUs. These efforts have encountered three hurdles. First, they implicitly assume that facial patches are robust to head rotation; yet non-frontal rotation is common. Second, mappings between AUs and patches are defined a priori, which ignores co-occurrences among AUs. And third, the dynamics of AUs are either ignored or modeled sequentially rather than simultaneously as in human perception. Inspired by recent advances in human perception, we propose a dynamic patch-attentive deep network, called D-PAttNet, for AU detection that (i) controls for 3D head and face rotation, (ii) learns mappings of patches to AUs, and (iii) models spatiotemporal dynamics. D-PAttNet approach significantly improves upon existing state of the art.</p>","PeriodicalId":52823,"journal":{"name":"Frontiers in Computer Science","volume":"1 ","pages":""},"PeriodicalIF":2.6,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6953909/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37536194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}