Artificial Intelligence (AI) is widely used within the healthcare domain. One branch of digital health concerns the design and development of digital assistant solutions. Because AI-enabled digital assistants intrude deeply into people’s lives, they need to be trustworthy. Such solutions aim to provide intelligent tools to ease the management of care pathways or to enhance the capabilities of healthcare organizations in deploying health prevention campaigns by monitoring the lifestyles of healthy people. In this work, we analyze the recent literature on integrating AI techniques within digital assistants. We focus on contributions published during the last ten years and carefully analyze whether and how the pillars of trustworthiness have been addressed. We also discuss the risks of designing digital assistants without considering these pillars and present some recommendations to mitigate them.
{"title":"A Review on Trustworthiness of Digital Assistants for Personal Healthcare","authors":"Tania Bailoni, Mauro Dragoni","doi":"10.1145/3714999","DOIUrl":"https://doi.org/10.1145/3714999","url":null,"abstract":"Artificial Intelligence (AI) is widely used within the healthcare domain. One of the branches of digital health concerns the design and development of digital assistant solutions. AI-enabled digital assistants highlighted the need to be trustworthy given their intrusiveness within people’s lives. Such solutions aim to provide intelligent tools to ease the management of care pathways or to enhance the capabilities of healthcare organizations in deploying health prevention campaigns by monitoring the lifestyles of healthy people. In this work, we intend to analyze the recent literature concerning integrating AI techniques within digital assistants. We focused on the contribution published during the last ten years and we performed a careful analysis of whether and how trustworthy pillars have been addressed. We also discuss the risks of designing digital assistants without considering trustworthy pillars and present some recommendations to mitigate them.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"6 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143020701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning tools, especially deep generative models (DGMs), provide opportunities to accelerate and simplify the design of drugs. As drug candidates, peptides are superior to other biomolecules because they combine potency, selectivity, and low toxicity. This review examines the fundamental aspects of current DGMs for designing therapeutic peptide sequences. First, relevant databases in this field are introduced. Next, the current state of data representation and where it can be optimized are discussed. Then, after introducing the basic principles and variants of diverse DGM algorithms, the applications of these methods to designing and optimizing peptides are described. Finally, we present several challenges to devising a powerful model that can meet the requirements of learning the different biological properties of peptides, as well as future research directions to address these challenges.
{"title":"Deep Generative Models for Therapeutic Peptide Discovery: A Comprehensive Review","authors":"Leshan Lai, Yuansheng Liu, Bosheng Song, Keqin Li, Xiangxiang Zeng","doi":"10.1145/3714455","DOIUrl":"https://doi.org/10.1145/3714455","url":null,"abstract":"Deep learning tools, especially deep generative models (DGMs), provide opportunities to accelerate and simplify the design of drugs. As drug candidates, peptides are superior to other biomolecules because they combine potency, selectivity, and low toxicity. This review examines the fundamental aspects of current DGMs for designing therapeutic peptide sequences. First, relevant databases in this field are introduced. Next, the current situation of data representation and where it can be optimized are discussed. Then, after introducing the basic principles and variants of diverse DGM algorithms, the applications of these methods to design and optimize peptides are stated. Finally, we present several challenges to devising a powerful model that can meet the requirements of learning the different biological properties of peptides, as well as future research directions to address these challenges.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"45 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142992150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Attributed graphs, which carry both topological information and node information, have widespread real-world applications, including recommendation systems, biological networks, and community analysis. Recently, with the rapid development of information gathering and extraction technology, the sources of data have become more extensive and multi-view data has attracted growing attention. Consequently, attributed graphs can be divided into two categories: single-view attributed graphs and multi-view attributed graphs. Compared with single-view attributed graphs, multi-view attributed graphs can provide more complementary information but also pose challenges in fusing information across multiple views. Moreover, attributed graph clustering aims to reveal the inherent community structure of the graph and is widely applied in fraud detection, crime recognition, and recommendation systems. Numerous methods based on various ideas and techniques have recently appeared to cluster attributed graphs, so there is an urgent need to summarize them. To this end, we provide a timely and comprehensive review of recent methods. Furthermore, we propose a novel criterion based on fusion results that classifies related methods into three categories: Fusion on adjacency matrix methods, Fusion on embedding methods, and Model-based methods. To evaluate existing methods comprehensively, this paper assesses these advanced methods with experimental results and theoretical analysis. Finally, we analyze the challenges and open opportunities to promote the future development of this field.
{"title":"Clustering on Attributed Graphs: From Single-view to Multi-view","authors":"Mengyao Li, Zhibang Yang, Xu Zhou, Yixiang Fang, Kenli Li, Keqin Li","doi":"10.1145/3714407","DOIUrl":"https://doi.org/10.1145/3714407","url":null,"abstract":"Attributed graphs with both topological information and node information have prevalent applications in the real world, including recommendation systems, biological networks, community analysis, and so on. Recently, with rapid development of information gathering and extraction technology, the sources of data become more extensive and multi-view data attracts growing attention. Consequently, attributed graphs can be divided into two categories: single-view attributed graphs and multi-view attributed graphs. Compared with single-view attributed graphs, multi-view attributed graphs can provide more complementary information but also pose challenges to fusing information of multi-views. Moreover, attributed graph clustering aims to reveal the inherent community structure of the graph, which is widely applied in fraud detection, crime recognition, and recommendation systems. Recently, numerous methods based on various ideas and techniques have appeared to cluster attributed graphs, thus there is an urgent need to summarize related methods. To this end, we make a timely and comprehensive review of recent methods. Furthermore, we provide a novel standard according to fusion results to classify related methods into three categories: Fusion on adjacency matrix methods, Fusion on embedding methods, and Model-based methods. Moreover, to conduct a comprehensive evaluation of existing methods, this paper evaluates these advanced methods with sufficient experimental results and theoretical analysis. 
Finally, we analyze the challenges and open opportunities to promote the future development of this field.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"81 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142992149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Syed Mustafa Haider Rizvi, Ramsha Imran, Arif Mahmood
Text classification is a quintessential and practical problem in natural language processing with applications in diverse domains such as sentiment analysis, fake news detection, medical diagnosis, and document classification. A sizable body of recent work exists in which researchers have studied and tackled text classification from different angles with varying degrees of success. Graph convolutional network (GCN)-based approaches have gained a lot of traction in this domain over the last decade, with many implementations achieving state-of-the-art performance in more recent literature, which warrants an updated survey. This work aims to summarize and categorize various GCN-based text classification approaches with regard to their architecture and mode of supervision. It identifies their strengths and limitations and compares their performance on various benchmark datasets. We also discuss future research directions and the challenges that exist in this domain.
{"title":"Text Classification Using Graph Convolutional Networks: A Comprehensive Survey","authors":"Syed Mustafa Haider Rizvi, Ramsha Imran, Arif Mahmood","doi":"10.1145/3714456","DOIUrl":"https://doi.org/10.1145/3714456","url":null,"abstract":"Text classification is a quintessential and practical problem in natural language processing with applications in diverse domains such as sentiment analysis, fake news detection, medical diagnosis, and document classification. A sizable body of recent works exists where researchers have studied and tackled text classification from different angles with varying degrees of success. Graph convolution network (GCN)-based approaches have gained a lot of traction in this domain over the last decade with many implementations achieving state-of-the-art performance in more recent literature and thus, warranting the need for an updated survey. This work aims to summarize and categorize various GCN-based Text Classification approaches with regard to the architecture and mode of supervision. It identifies their strengths and limitations and compares their performance on various benchmark datasets. We also discuss future research directions and the challenges that exist in this domain.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"29 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142992151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differential privacy has become a widely popular method for data protection in machine learning, especially since it allows formulating strict mathematical privacy guarantees. This survey provides an overview of the state of the art in differentially private centralized deep learning, thorough analyses of recent advances and open problems, and a discussion of potential future developments in the field. Based on a systematic literature review, the following topics are addressed: emerging application domains, differentially private generative models, auditing and evaluation methods for private models, protection against a broad range of threats and attacks, and improvements of privacy-utility trade-offs.
{"title":"Recent Advances of Differential Privacy in Centralized Deep Learning: A Systematic Survey","authors":"Lea Demelius, Roman Kern, Andreas Trügler","doi":"10.1145/3712000","DOIUrl":"https://doi.org/10.1145/3712000","url":null,"abstract":"Differential Privacy has become a widely popular method for data protection in machine learning, especially since it allows formulating strict mathematical privacy guarantees. This survey provides an overview of the state-of-the-art of differentially private centralized deep learning, thorough analyses of recent advances and open problems, as well as a discussion of potential future developments in the field. Based on a systematic literature review, the following topics are addressed: emerging application domains, differentially private generative models, auditing and evaluation methods for private models, protection against a broad range of threats and attacks, and improvements of privacy-utility trade-offs.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"74 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143020696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huayu Chen, Junxiang Li, Huanhuan He, Jing Zhu, Shuting Sun, Xiaowei Li, Bin Hu
Electroencephalogram (EEG)-based affective computing aims to recognize emotional states and is the core technology of the affective brain-computer interface (aBCI). This concept encompasses aspects of physiological computing, human-computer interaction (HCI), mental health care, and brain-computer interfaces (BCI), presenting significant theoretical and practical value. However, the field has reached a bottleneck due to individual differences in EEG, which pose various challenges to achieving a fundamental aBCI. In this review, we collected representative works from 2019 to 2023. Combining the historical exploration process and research approaches of EEG-based emotion recognition, we developed a comprehensive understanding of the current research status. Furthermore, we analyzed the main obstacles to emotion recognition modeling. To construct a reasonable aBCI, we envisioned the working scenarios, developmental stages, and key impact factors based on existing knowledge of EEG physiology. From the practical application perspective, we evaluated the theoretical significance, implementation difficulty, and real-world limitations of different approaches. By synthesizing the merits and drawbacks of various techniques, we proposed a theoretically feasible aBCI framework under the restrictions of real-world application scenarios. Finally, we suggested several research topics that have not been thoroughly investigated to broaden the research scope and accelerate the development of aBCIs.
{"title":"Toward the Construction of Affective Brain-Computer Interface: A Systematic Review","authors":"Huayu Chen, Junxiang Li, Huanhuan He, Jing Zhu, Shuting Sun, Xiaowei Li, Bin Hu","doi":"10.1145/3712259","DOIUrl":"https://doi.org/10.1145/3712259","url":null,"abstract":"Electroencephalogram(EEG)-based affective computing aims to recognize the emotional state, which is the core technology of affective brain-computer interface(aBCI). This concept encompasses aspects of physiological computing, human-computer interaction(HCI), mental health care, and brain-computer interfaces(BCI), presenting significant theoretical and practical value. However, the field reached a bottleneck stage due to EEG individual difference issues, causing various challenges to achieve a fundamental aBCI. In this review, we collected some representative works from 2019 to 2023. Combining the historical exploration process and research approaches of EEG-based emotion recognition, a comprehensive understanding of current research status was developed. Furthermore, we analyzed the main obstacles for emotion recognition modeling. To construct a reasonable aBCI, we envisioned the working scenarios, developmental stages, and key impact factors based on the existing EEG physiology knowledge. From the practical application perspective, we evaluated the theoretical significance, implementation difficulty, and real-world limitations of different approaches. By synthesizing the merits and drawbacks of various techniques, we proposed a theoretically feasible aBCI framework under the restrictions of real-world application scenarios. 
Finally, we suggested several research topics that have not been thoroughly investigated to broaden the research scope and accelerate the development of aBCIs.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"31 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142989903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The multimodal interplay of the five fundamental senses (sight, hearing, smell, taste, and touch) provides humans with superior environmental perception and learning skills. Adapted from the human perceptual system, multimodal machine learning tries to incorporate different forms of input, such as image, audio, and text, and determine their fundamental connections through joint modeling. Because multimodal machine learning is one of the future directions of artificial intelligence, it is necessary to summarize its progress. In this paper, we start with the forms of multimodal combination and provide a comprehensive survey of the emerging subject of multimodal machine learning, covering representative research approaches, the most recent advancements, and their applications. Specifically, this paper analyzes the relationship between different modalities in detail and identifies the key issues in multimodal research from the perspective of application scenarios. In addition, we thoroughly review state-of-the-art methods and datasets in multimodal learning research. We then identify the substantial challenges and potential development directions in this field. Finally, given the comprehensive nature of this survey, both modality-specific and task-specific researchers can benefit from it and advance the field.
{"title":"A Survey of Multimodal Learning: Methods, Applications, and Future","authors":"Yuan Yuan, Zhaojian Li, Bin Zhao","doi":"10.1145/3713070","DOIUrl":"https://doi.org/10.1145/3713070","url":null,"abstract":"The multimodal interplay of the five fundamental senses—Sight, Hearing, Smell, Taste, and Touch—provides humans with superior environmental perception and learning skills. Adapted from the human perceptual system, multimodal machine learning tries to incorporate different forms of input, such as image, audio, and text, and determine their fundamental connections through joint modeling. As one of the future development forms of artificial intelligence, it is necessary to summarize the progress of multimodal machine learning. In this paper, we start with the form of a multimodal combination and provide a comprehensive survey of the emerging subject of multimodal machine learning, covering representative research approaches, the most recent advancements, and their applications. Specifically, this paper analyzes the relationship between different modalities in detail and sorts out the key issues in multimodal research from the application scenarios. Besides, we thoroughly reviewed state-of-the-art methods and datasets covered in multimodal learning research. We then identify the substantial challenges and potential developing directions in this field. 
Finally, given the comprehensive nature of this survey, both modality-specific and task-specific researchers can benefit from this survey and advance the field.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"205 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142989095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Process mining enables organizations to analyze the data stored in their information systems and derive insights regarding their business processes. However, raw data needs to be converted into a format that can be fed into process mining algorithms. Various pre-analysis activities can be performed on the raw data, such as imperfection removal or granularity level change. Although pre-analysis activities play a crucial role in process mining, there is currently only a limited overview of their scope and the extent to which they have been examined. This study presents a systematic literature review of the pre-analysis activities in process mining projects. To better understand this stage and its current state of research, we explore which activities constitute the pre-analysis stage, their goals, the applied research methodologies, the proposed research outcomes, and the data used to evaluate the research outcomes. We identify 15 pre-analysis activities and concepts, e.g., data extraction, generation, and cleaning. We also find that design science research is the primary methodology and that methods are the primary research outcome in previous studies. In addition, the proposed outcomes have most often been evaluated using only real-life data. This study reveals that research on pre-analysis is a growing field of interest in process mining.
{"title":"Getting the Data in Shape for Your Process Mining Analysis: An In-Depth Analysis of the Pre-Analysis Stage","authors":"Shameer K. Pradhan, Mieke Jans, Niels Martin","doi":"10.1145/3712587","DOIUrl":"https://doi.org/10.1145/3712587","url":null,"abstract":"Process mining enables organizations to analyze the data stored in their information systems and derive insights regarding their business processes. However, raw data needs to be converted into a format that can be fed into process mining algorithms. Various pre-analysis activities can be performed on the raw data, such as imperfection removal or granularity level change. Although pre-analysis activities play a crucial role in process mining, there is currently a limited overview available regarding their scope and the extent of their examination. This study presents a systematic literature review of the pre-analysis activities in process mining projects. To better understand this stage and its current state of research, we explore which activities constitute the pre-analysis stage, their goals, the applied research methodologies, the proposed research outcomes, and the data used to evaluate the research outcomes. We identify 15 pre-analysis activities and concepts, e.g., data extraction, generation, and cleaning. We also discover that design science research is the methodology and methods that are the primary research outcome in previous studies. We also realize that the proposed outcomes have been evaluated using only real-life data most of the time. 
This study reveals that research on pre-analysis is a growing field of interest in process mining.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"15 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142989528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mikel Ngueajio, Saurav Aryal, Marcellin Atemkeng, Gloria Washington, Danda Rawat
This survey emphasizes the significance of Explainable AI (XAI) techniques in detecting hateful speech and misinformation/fake news. It explores recent trends in detecting these phenomena, highlighting current research that reveals a synergistic relationship between them. Additionally, it presents recent trends in the use of XAI methods to mitigate the occurrence of hateful and fake content in conversations. The survey reviews state-of-the-art XAI approaches, algorithms, modeling datasets, and the evaluation metrics used to assess model interpretability, and provides a comprehensive summary table of the literature surveyed and relevant datasets. It concludes with an overview of key observations, offering insights into the prominent model explainability methods used in hate speech and misinformation detection. The research strengths and limitations are also presented, as well as perspectives and suggestions for future directions in this research domain.
{"title":"Decoding Fake News and Hate Speech: A Survey of Explainable AI Techniques","authors":"Mikel Ngueajio, Saurav Aryal, Marcellin Atemkeng, Gloria Washington, Danda Rawat","doi":"10.1145/3711123","DOIUrl":"https://doi.org/10.1145/3711123","url":null,"abstract":"This survey emphasizes the significance of Explainable AI (XAI) techniques in detecting hateful speech and misinformation/Fake news. It explores recent trends in detecting these phenomena, highlighting current research that reveals a synergistic relationship between them. Additionally, it presents recent trends in the use of XAI methods to mitigate the occurrences of hateful and fake content in conversations. The survey reviews state-of-the-art XAI approaches, algorithms, modeling datasets, as well as the evaluation metrics leveraged for assessing model interpretability, and thus provides a comprehensive summary table of the literature surveyed and relevant datasets. It concludes with an overview of key observations, offering insights into the prominent model explainability methods used in hate speech and misinformation detection. The research strengths, limitations are also presented, as well as perspectives and suggestions for future directions in this research domain.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"30 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142989096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Untrimmed videos have interrelated events, dependencies, context, overlapping events, object-object interactions, domain specificity, and other semantics that are worth highlighting when describing a video in natural language. Owing to such vast diversity, a single sentence can only correctly describe a portion of the video. Dense Video Captioning (DVC) aims to detect and describe different events in a given video. The term DVC originated in the 2017 ActivityNet challenge, after which considerable effort has been made to address the challenge. Dense Video Captioning is divided into three sub-tasks: (1) Video Feature Extraction (VFE), (2) Temporal Event Localization (TEL), and (3) Dense Caption Generation (DCG). In this survey, we discuss all the studies that claim to perform DVC along with its sub-tasks and summarize their results. We also discuss all the datasets that have been used for DVC. Lastly, we highlight current challenges in the field along with our observations and future trends.
{"title":"Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols","authors":"Iqra Qasim, Alexander Horsch, Dilip Prasad","doi":"10.1145/3712059","DOIUrl":"https://doi.org/10.1145/3712059","url":null,"abstract":"Untrimmed videos have interrelated events, dependencies, context, overlapping events, object-object interactions, domain specificity, and other semantics that are worth highlighting while describing a video in natural language. Owing to such a vast diversity, a single sentence can only correctly describe a portion of the video. Dense Video Captioning (DVC) aims to detect and describe different events in a given video. The term DVC originated in the 2017 ActivityNet challenge, after which considerable effort has been made to address the challenge. Dense Video Captioning is divided into three sub-tasks: (1) Video Feature Extraction (VFE), (2) Temporal Event Localization (TEL), and (3) Dense Caption Generation (DCG). In this survey, we discuss all the studies that claim to perform DVC along with its sub-tasks and summarize their results. We also discuss all the datasets that have been used for DVC. Lastly, current challenges in the field are highlighted along with observatory remarks and future trends in the field.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"22 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142981760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}