Unimodal and Multimodal Representation Training for Relation Extraction
Ciaran Cooney, Rachel Heyburn, Liam Maddigan, Mairead O'Cuinn, Chloe Thompson, Joana Cavadas
Pub Date: 2022-11-11  DOI: 10.48550/arXiv.2211.06168
Multimodal integration of text, layout and visual information has achieved SOTA results in visually rich document understanding (VrDU) tasks, including relation extraction (RE). However, despite its importance, evaluation of the relative predictive capacity of these modalities is less prevalent. Here, we demonstrate the value of shared representations for RE tasks by conducting experiments in which each data type is iteratively excluded during training. In addition, text and layout data are evaluated in isolation. While a bimodal text and layout approach performs best (F1=0.684), we show that text is the most important single predictor of entity relations. Additionally, layout geometry is highly predictive and may even be a feasible unimodal approach. Despite being less effective, we highlight circumstances where visual information can bolster performance. In total, our results demonstrate the efficacy of training joint representations for RE.
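The F1 = 0.684 reported above is the standard harmonic mean of precision and recall over predicted relation pairs. A minimal sketch of micro-averaged F1 (the `f1_score` helper and the entity-pair representation are illustrative assumptions, not the authors' code):

```python
def f1_score(predicted, gold):
    """Micro-averaged F1 over predicted vs. gold relation pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: pairs found in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, one correct pair out of two predictions against two gold pairs gives precision = recall = 0.5, hence F1 = 0.5.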
A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions
Mirco Ramo, G. Silvestre
Pub Date: 2022-11-04  DOI: 10.48550/arXiv.2211.02643
The Transformer architecture is shown to provide a powerful framework as an end-to-end model for building expression trees from online handwritten gestures corresponding to glyph strokes. In particular, the attention mechanism was successfully used to encode, learn and enforce the underlying syntax of expressions, creating latent representations that are correctly decoded to the exact mathematical expression tree, providing robustness to ablated inputs and unseen glyphs. For the first time, the encoder is fed with spatio-temporal data tokens potentially forming an infinitely large vocabulary, which finds applications beyond that of online gesture recognition. A new supervised dataset of online handwriting gestures is provided for training models on generic handwriting recognition tasks, and a new metric is proposed for the evaluation of the syntactic correctness of the output expression trees. A small Transformer model suitable for edge inference was successfully trained to an average normalised Levenshtein accuracy of 94%, resulting in a valid postfix RPN tree representation for 94% of predictions.
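Normalised Levenshtein accuracy, as reported above, can be computed as one minus the edit distance divided by the longer sequence length, so 1.0 means an exact match. A minimal sketch (the function names and exact normalisation are assumptions; the paper's definition may differ):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost  # substitution
                            ))
        prev = curr
    return prev[-1]

def normalised_levenshtein_accuracy(pred, target):
    """1 - edit_distance / max sequence length; works on strings or token lists."""
    if not pred and not target:
        return 1.0
    return 1.0 - levenshtein(pred, target) / max(len(pred), len(target))
```

Because it operates on arbitrary sequences, the same helper applies directly to postfix (RPN) token lists rather than raw characters.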
Spot the fake lungs: Generating Synthetic Medical Images using Neural Diffusion Models
Hazrat Ali, Shafaq Murad, Zubair Shah
Pub Date: 2022-11-02  DOI: 10.48550/arXiv.2211.00902
Generative models are becoming popular for the synthesis of medical images. Recently, neural diffusion models have demonstrated the potential to generate photo-realistic images of objects, but their potential to generate medical images has not yet been explored. In this work, we explore the synthesis of medical images using neural diffusion models. First, we use a pre-trained DALL-E 2 model to generate lung X-ray and CT images from an input text prompt. Second, we train a Stable Diffusion model on 3,165 X-ray images and generate synthetic images. We evaluate the synthetic images through a qualitative analysis in which two independent radiologists label randomly chosen samples from the generated data as real, fake, or unsure. Results demonstrate that images generated with the diffusion model can reproduce characteristics that are otherwise very specific to certain medical conditions in chest X-ray or CT images, and careful tuning of the model appears very promising. To the best of our knowledge, this is the first attempt to generate lung X-ray and CT images using neural diffusion models. This work introduces a new dimension in artificial intelligence for medical imaging; given that this is a new topic, the paper serves as an introduction and motivation for the research community to explore the potential of diffusion models for medical image synthesis. We have released the synthetic images at https://www.kaggle.com/datasets/hazrat/awesomelungs.
A Self-attention Guided Multi-scale Gradient GAN for Diversified X-ray Image Synthesis
Muhammad Muneeb Saad, M. H. Rehmani, Ruairi O'Reilly
Pub Date: 2022-10-09  DOI: 10.48550/arXiv.2210.06334
Imbalanced image datasets are common in biomedical image analysis. Biomedical images contain diversified features that are significant for predicting targeted diseases, and Generative Adversarial Networks (GANs) are used to address this data-limitation problem by generating synthetic images. Training challenges such as mode collapse, non-convergence, and instability degrade a GAN's ability to synthesize diversified, high-quality images. In this work, MSG-SAGAN, an attention-guided multi-scale gradient GAN architecture, is proposed to model long-range dependencies among biomedical image features and to improve training performance using a flow of multi-scale gradients at multiple resolutions in the layers of the generator and discriminator models. The intent is to reduce the impact of mode collapse and to stabilize GAN training using an attention mechanism with multi-scale gradient learning for diversified X-ray image synthesis. The Multi-scale Structural Similarity Index Measure (MS-SSIM) and Fréchet Inception Distance (FID) are used to identify the occurrence of mode collapse and to evaluate the diversity of the generated synthetic images. The proposed architecture is compared with the multi-scale gradient GAN (MSG-GAN) to assess the diversity of generated synthetic images. Results indicate that MSG-SAGAN outperforms MSG-GAN in synthesizing diversified images, as evidenced by the MS-SSIM and FID scores.
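MS-SSIM scores the structural similarity between image pairs, so a lower mean pairwise score across generated samples suggests greater diversity. Real (MS-)SSIM averages local windowed statistics over several scales; the sketch below uses global image statistics as a simplified single-scale stand-in (the function names and flat-pixel-list image representation are illustrative assumptions):

```python
from statistics import mean

def global_ssim(x, y, data_range=255.0):
    """Simplified SSIM between two images given as flat pixel lists.
    Uses global statistics; real (MS-)SSIM averages local windows over scales."""
    c1 = (0.01 * data_range) ** 2  # stabilising constants from the SSIM formula
    c2 = (0.03 * data_range) ** 2
    mx, my = mean(x), mean(y)
    vx = mean((p - mx) ** 2 for p in x)
    vy = mean((p - my) ** 2 for p in y)
    cov = mean((px - mx) * (py - my) for px, py in zip(x, y))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def mean_pairwise_ssim(images, data_range=255.0):
    """Mean SSIM over all image pairs; lower values suggest more diversity."""
    scores = [global_ssim(a, b, data_range)
              for i, a in enumerate(images) for b in images[i + 1:]]
    return mean(scores)
```

A mode-collapsed generator emits near-identical samples, pushing the mean pairwise score toward 1.0, which is why the paper tracks MS-SSIM alongside FID.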
Smart Speaker Design and Implementation with Biometric Authentication and Advanced Voice Interaction Capability
B. Sudharsan, P. Corcoran, M. Ali
Pub Date: 2022-07-17  DOI: 10.48550/arXiv.2207.10811
Advancements in semiconductor technology have reduced the dimensions and cost of chipsets while improving their performance and capacity. In addition, advances in AI frameworks and libraries make it possible to accommodate more AI at the resource-constrained edge of consumer IoT devices. Sensors are nowadays an integral part of our environment, providing continuous data streams for building intelligent applications; a smart home with multiple interconnected devices is one example. In such smart environments, for convenient and quick access to web-based services and personal information such as calendars, notes, emails, reminders, and banking, users link third-party skills or skills from the Amazon store to their smart speakers. Likewise, smart home products such as security cameras, video doorbells, smart plugs, carbon monoxide monitors, and door locks are interlinked to a modern smart speaker by means of custom skill addition. Because smart speakers are linked to these services and devices via the user's account, anyone with physical access to the speaker can use them through voice commands, compromising the user's data privacy, home security, and more. Recently launched camera-enabled smart speakers with AI functionality, such as Tensor Cam's AI Camera, Toshiba's Symbio, and Facebook's Portal, still have no authentication scheme beyond calling out the wake-word. This paper provides an overview of the cybersecurity risks smart speaker users face due to the lack of an authentication scheme, and discusses the development of a state-of-the-art camera-enabled, microphone-array-based modern Alexa smart speaker prototype that addresses these risks.
Inter and Intra Signal Variance in Feature Extraction and Classification of Affective State
Zachary Dair, S. Dockray, Ruairi O'Reilly
Pub Date: 2022-07-06  DOI: 10.1007/978-3-031-26438-2_1
Manipulating Moral Dumbfounding: Inhibiting the Identification of Reasons
Cillian McHugh, M. McGann, E. Igou, E. Kinsella
Pub Date: 2019-09-27  DOI: 10.31234/osf.io/e5gj7
Moral dumbfounding occurs when people defend a moral judgement even though they cannot provide a reason in support of this judgement. It manifests as an admission of not having reasons, or the use of unsupported declarations ("it's just wrong") or tautological reasons ("because it's incest") as justifications for a judgement. We test a dual-process explanation of moral dumbfounding, where moral dumbfounding is an example of conflict between a habitual response (making a judgement) and a response that results from deliberation (providing a reason for the judgement). The dumbfounding paradigm involves three possible responses: (a) providing reasons for a judgement (deliberative/controlled process); (b) accepting the counter-arguments and rating the behaviour as "not wrong" (habitual/automatic process); (c) a dumbfounded response (habitual/automatic process). Cognitive load manipulations have been shown to inhibit deliberative responding. We present 5 studies in which dumbfounded responding was investigated under cognitive load manipulations. We hypothesised that rates of providing reasons would be reduced under cognitive load. The identification of reasons was inhibited in Studies 1 and 3, but not in Studies 2, 4 or 5. The results provide weak evidence for a dual-process explanation of moral dumbfounding. We found some evidence that dumbfounded responding may be linked with Need for Cognition.
Physical Activity Motivating Games
S. Berkovsky, J. Freyne, Mac Coombe
Pub Date: 2009-08-19  DOI: 10.1007/978-3-642-17080-5_30
An Evaluation of the GhostWriter System for Case-Based Content Suggestions
Aidan Waugh, D. Bridge
Pub Date: 2009-08-19  DOI: 10.1007/978-3-642-17080-5_28