Wenbo Ge, Pooia Lalbakhsh, L. Isai, Artem Lenskiy, H. Suominen
Volatility forecasting is an important aspect of finance as it dictates many decisions of market players. A snapshot of state-of-the-art neural network–based financial volatility forecasting was generated by examining 35 studies, published after 2015. Several issues were identified, such as the inability for easy and meaningful comparisons, and the large gap between modern machine learning models and those applied to volatility forecasting. A shared task was proposed to evaluate state-of-the-art models, and several promising ways to bridge the gap were suggested. Finally, adequate background was provided to serve as an introduction to the field of neural network volatility forecasting.
{"title":"Neural Network–Based Financial Volatility Forecasting: A Systematic Review","authors":"Wenbo Ge, Pooia Lalbakhsh, L. Isai, Artem Lenskiy, H. Suominen","doi":"10.1145/3483596","DOIUrl":"https://doi.org/10.1145/3483596","url":null,"abstract":"Volatility forecasting is an important aspect of finance as it dictates many decisions of market players. A snapshot of state-of-the-art neural network–based financial volatility forecasting was generated by examining 35 studies, published after 2015. Several issues were identified, such as the inability for easy and meaningful comparisons, and the large gap between modern machine learning models and those applied to volatility forecasting. A shared task was proposed to evaluate state-of-the-art models, and several promising ways to bridge the gap were suggested. Finally, adequate background was provided to serve as an introduction to the field of neural network volatility forecasting.","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"35 1","pages":"1 - 30"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91090576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alison Fernandez Blanco, Alexandre Bergel, Juan Pablo Sandoval Alcocer
Understanding and optimizing memory usage of software applications is a difficult task, usually involving the analysis of large amounts of memory-related complex data. Over the years, numerous software visualizations have been proposed to help developers analyze the memory usage information of their programs. This article reports a systematic literature review of published works centered on software visualizations for analyzing the memory consumption of programs. We have systematically selected 46 articles and categorized them based on the tasks supported, data collected, visualization techniques, evaluations conducted, and prototype availability. As a result, we introduce a taxonomy based on these five dimensions to identify the main challenges of visualizing memory consumption and opportunities for improvement. Despite the effort to evaluate visualizations, we also find that most articles lack evidence regarding how these visualizations perform in practice. We also highlight that few articles are available for developers willing to adopt a visualization for memory consumption analysis. Additionally, we describe a number of research areas that are worth exploring.
{"title":"Software Visualizations to Analyze Memory Consumption: A Literature Review","authors":"Alison Fernandez Blanco, Alexandre Bergel, Juan Pablo Sandoval Alcocer","doi":"10.1145/3485134","DOIUrl":"https://doi.org/10.1145/3485134","url":null,"abstract":"Understanding and optimizing memory usage of software applications is a difficult task, usually involving the analysis of large amounts of memory-related complex data. Over the years, numerous software visualizations have been proposed to help developers analyze the memory usage information of their programs. This article reports a systematic literature review of published works centered on software visualizations for analyzing the memory consumption of programs. We have systematically selected 46 articles and categorized them based on the tasks supported, data collected, visualization techniques, evaluations conducted, and prototype availability. As a result, we introduce a taxonomy based on these five dimensions to identify the main challenges of visualizing memory consumption and opportunities for improvement. Despite the effort to evaluate visualizations, we also find that most articles lack evidence regarding how these visualizations perform in practice. We also highlight that few articles are available for developers willing to adopt a visualization for memory consumption analysis. Additionally, we describe a number of research areas that are worth exploring.","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"31 1","pages":"1 - 34"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90665269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Farkhanda Zafar, Hasan Ali Khattak, M. Aloqaily, Rasheed Hussain
Owing to the advancements in communication and computation technologies, the dream of commercialized connected and autonomous cars is becoming a reality. However, among other challenges such as environmental pollution, cost, maintenance, security, and privacy, the ownership of vehicles (especially for Autonomous Vehicles) is the major obstacle in the realization of this technology at the commercial level. Furthermore, the business model of pay-as-you-go type services further attracts the consumer, because there is no need for upfront investment. In this vein, the idea of car-sharing (aka carpooling) is getting ground due to, at least in part, its simplicity, cost-effectiveness, and affordable choice of transportation. Carpooling systems are still in their infancy and face challenges such as scheduling, matching passengers interests, business model, security, privacy, and communication. To date, a plethora of research work has already been done covering different aspects of carpooling services (ranging from applications to communication and technologies); however, there is still a lack of a holistic, comprehensive survey that can be a one-stop-shop for the researchers in this area to (i) find all the relevant information and (ii) identify the future research directions. To fill these research challenges, this article provides a comprehensive survey on carpooling in autonomous and connected vehicles and covers architecture, components, and solutions, including scheduling, matching, mobility, pricing models of carpooling. We also discuss the current challenges in carpooling and identify future research directions. This survey is aimed to spur further discussion among the research community for the effective realization of carpooling.
{"title":"Carpooling in Connected and Autonomous Vehicles: Current Solutions and Future Directions","authors":"Farkhanda Zafar, Hasan Ali Khattak, M. Aloqaily, Rasheed Hussain","doi":"10.1145/3501295","DOIUrl":"https://doi.org/10.1145/3501295","url":null,"abstract":"Owing to the advancements in communication and computation technologies, the dream of commercialized connected and autonomous cars is becoming a reality. However, among other challenges such as environmental pollution, cost, maintenance, security, and privacy, the ownership of vehicles (especially for Autonomous Vehicles) is the major obstacle in the realization of this technology at the commercial level. Furthermore, the business model of pay-as-you-go type services further attracts the consumer, because there is no need for upfront investment. In this vein, the idea of car-sharing (aka carpooling) is getting ground due to, at least in part, its simplicity, cost-effectiveness, and affordable choice of transportation. Carpooling systems are still in their infancy and face challenges such as scheduling, matching passengers interests, business model, security, privacy, and communication. To date, a plethora of research work has already been done covering different aspects of carpooling services (ranging from applications to communication and technologies); however, there is still a lack of a holistic, comprehensive survey that can be a one-stop-shop for the researchers in this area to (i) find all the relevant information and (ii) identify the future research directions. To fill these research challenges, this article provides a comprehensive survey on carpooling in autonomous and connected vehicles and covers architecture, components, and solutions, including scheduling, matching, mobility, pricing models of carpooling. We also discuss the current challenges in carpooling and identify future research directions. This survey is aimed to spur further discussion among the research community for the effective realization of carpooling.","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"40 1","pages":"1 - 36"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80584803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fabian Fagerholm, M. Felderer, D. Fucci, M. Unterkalmsteiner, Bogdan Marculescu, Markus Martini, Lars Göran Wallgren Tengberg, R. Feldt, Bettina Lehtela, Bal'azs Nagyv'aradi, Jehan Khattak
Cognition plays a fundamental role in most software engineering activities. This article provides a taxonomy of cognitive concepts and a survey of the literature since the beginning of the Software Engineering discipline. The taxonomy comprises the top-level concepts of perception, attention, memory, cognitive load, reasoning, cognitive biases, knowledge, social cognition, cognitive control, and errors, and procedures to assess them both qualitatively and quantitatively. The taxonomy provides a useful tool to filter existing studies, classify new studies, and support researchers in getting familiar with a (sub) area. In the literature survey, we systematically collected and analysed 311 scientific papers spanning five decades and classified them using the cognitive concepts from the taxonomy. Our analysis shows that the most developed areas of research correspond to the four life-cycle stages, software requirements, design, construction, and maintenance. Most research is quantitative and focuses on knowledge, cognitive load, memory, and reasoning. Overall, the state of the art appears fragmented when viewed from the perspective of cognition. There is a lack of use of cognitive concepts that would represent a coherent picture of the cognitive processes active in specific tasks. Accordingly, we discuss the research gap in each cognitive concept and provide recommendations for future research.
{"title":"Cognition in Software Engineering: A Taxonomy and Survey of a Half-Century of Research","authors":"Fabian Fagerholm, M. Felderer, D. Fucci, M. Unterkalmsteiner, Bogdan Marculescu, Markus Martini, Lars Göran Wallgren Tengberg, R. Feldt, Bettina Lehtela, Bal'azs Nagyv'aradi, Jehan Khattak","doi":"10.1145/3508359","DOIUrl":"https://doi.org/10.1145/3508359","url":null,"abstract":"Cognition plays a fundamental role in most software engineering activities. This article provides a taxonomy of cognitive concepts and a survey of the literature since the beginning of the Software Engineering discipline. The taxonomy comprises the top-level concepts of perception, attention, memory, cognitive load, reasoning, cognitive biases, knowledge, social cognition, cognitive control, and errors, and procedures to assess them both qualitatively and quantitatively. The taxonomy provides a useful tool to filter existing studies, classify new studies, and support researchers in getting familiar with a (sub) area. In the literature survey, we systematically collected and analysed 311 scientific papers spanning five decades and classified them using the cognitive concepts from the taxonomy. Our analysis shows that the most developed areas of research correspond to the four life-cycle stages, software requirements, design, construction, and maintenance. Most research is quantitative and focuses on knowledge, cognitive load, memory, and reasoning. Overall, the state of the art appears fragmented when viewed from the perspective of cognition. There is a lack of use of cognitive concepts that would represent a coherent picture of the cognitive processes active in specific tasks. Accordingly, we discuss the research gap in each cognitive concept and provide recommendations for future research.","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"32 1","pages":"1 - 36"},"PeriodicalIF":0.0,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75005552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. C. Júnior, Domenico Amalfitano, Lina Garcés, A. R. Fasolino, S. Andrade, M. Delamaro
Context: The mobile app market is continually growing offering solutions to almost all aspects of people’s lives, e.g., healthcare, business, entertainment, as well as the stakeholders’ demand for apps that are more secure, portable, easy to use, among other non-functional requirements (NFRs). Therefore, manufacturers should guarantee that their mobile apps achieve high-quality levels. A good strategy is to include software testing and quality assurance activities during the whole life cycle of such solutions. Problem: Systematically warranting NFRs is not an easy task for any software product. Software engineers must take important decisions before adopting testing techniques and automation tools to support such endeavors. Proposal: To provide to the software engineers with a broad overview of existing dynamic techniques and automation tools for testing mobile apps regarding NFRs. Methods: We planned and conducted a Systematic Mapping Study (SMS) following well-established guidelines for executing secondary studies in software engineering. Results: We found 56 primary studies and characterized their contributions based on testing strategies, testing approaches, explored mobile platforms, and the proposed tools. Conclusions: The characterization allowed us to identify and discuss important trends and opportunities that can benefit both academics and practitioners.
{"title":"Dynamic Testing Techniques of Non-functional Requirements in Mobile Apps: A Systematic Mapping Study","authors":"M. C. Júnior, Domenico Amalfitano, Lina Garcés, A. R. Fasolino, S. Andrade, M. Delamaro","doi":"10.1145/3507903","DOIUrl":"https://doi.org/10.1145/3507903","url":null,"abstract":"Context: The mobile app market is continually growing offering solutions to almost all aspects of people’s lives, e.g., healthcare, business, entertainment, as well as the stakeholders’ demand for apps that are more secure, portable, easy to use, among other non-functional requirements (NFRs). Therefore, manufacturers should guarantee that their mobile apps achieve high-quality levels. A good strategy is to include software testing and quality assurance activities during the whole life cycle of such solutions. Problem: Systematically warranting NFRs is not an easy task for any software product. Software engineers must take important decisions before adopting testing techniques and automation tools to support such endeavors. Proposal: To provide to the software engineers with a broad overview of existing dynamic techniques and automation tools for testing mobile apps regarding NFRs. Methods: We planned and conducted a Systematic Mapping Study (SMS) following well-established guidelines for executing secondary studies in software engineering. Results: We found 56 primary studies and characterized their contributions based on testing strategies, testing approaches, explored mobile platforms, and the proposed tools. Conclusions: The characterization allowed us to identify and discuss important trends and opportunities that can benefit both academics and practitioners.","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"16 1","pages":"1 - 38"},"PeriodicalIF":0.0,"publicationDate":"2022-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79441773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Several image processing tasks, such as image classification and object detection, have been significantly improved using Convolutional Neural Networks (CNN). Like ResNet and EfficientNet, many architectures have achieved outstanding results in at least one dataset by the time of their creation. A critical factor in training concerns the network’s regularization, which prevents the structure from overfitting. This work analyzes several regularization methods developed in the past few years, showing significant improvements for different CNN models. The works are classified into three main areas: the first one is called “data augmentation,” where all the techniques focus on performing changes in the input data. The second, named “internal changes,” aims to describe procedures to modify the feature maps generated by the neural network or the kernels. The last one, called “label,” concerns transforming the labels of a given input. This work presents two main differences comparing to other available surveys about regularization: (i) the first concerns the papers gathered in the manuscript, which are not older than five years, and (ii) the second distinction is about reproducibility, i.e., all works referred here have their code available in public repositories or they have been directly implemented in some framework, such as TensorFlow or Torch.
{"title":"Avoiding Overfitting: A Survey on Regularization Methods for Convolutional Neural Networks","authors":"C. F. G. Santos, J. Papa","doi":"10.1145/3510413","DOIUrl":"https://doi.org/10.1145/3510413","url":null,"abstract":"Several image processing tasks, such as image classification and object detection, have been significantly improved using Convolutional Neural Networks (CNN). Like ResNet and EfficientNet, many architectures have achieved outstanding results in at least one dataset by the time of their creation. A critical factor in training concerns the network’s regularization, which prevents the structure from overfitting. This work analyzes several regularization methods developed in the past few years, showing significant improvements for different CNN models. The works are classified into three main areas: the first one is called “data augmentation,” where all the techniques focus on performing changes in the input data. The second, named “internal changes,” aims to describe procedures to modify the feature maps generated by the neural network or the kernels. The last one, called “label,” concerns transforming the labels of a given input. This work presents two main differences comparing to other available surveys about regularization: (i) the first concerns the papers gathered in the manuscript, which are not older than five years, and (ii) the second distinction is about reproducibility, i.e., all works referred here have their code available in public repositories or they have been directly implemented in some framework, such as TensorFlow or Torch.","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"41 1","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2022-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87066732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Claudio Filipi Gonçalves dos Santos, Diego de Souza Oliveira, Leandro A. Passos, Rafael Gonçalves Pires, Daniel Felipe Silva Santos, Lucas Pascotti Valem, Thierry P. Moreira, Marcos Cleison S. Santana, Mateus Roder, Jo Paulo Papa, Danilo Colombo
In general, biometry-based control systems may not rely on individual expected behavior or cooperation to operate appropriately. Instead, such systems should be aware of malicious procedures for unauthorized access attempts. Some works available in the literature suggest addressing the problem through gait recognition approaches. Such methods aim at identifying human beings through intrinsic perceptible features, despite dressed clothes or accessories. Although the issue denotes a relatively long-time challenge, most of the techniques developed to handle the problem present several drawbacks related to feature extraction and low classification rates, among other issues. However, deep learning-based approaches recently emerged as a robust set of tools to deal with virtually any image and computer-vision-related problem, providing paramount results for gait recognition as well. Therefore, this work provides a surveyed compilation of recent works regarding biometric detection through gait recognition with a focus on deep learning approaches, emphasizing their benefits and exposing their weaknesses. Besides, it also presents categorized and characterized descriptions of the datasets, approaches, and architectures employed to tackle associated constraints.
{"title":"Gait Recognition Based on Deep Learning: A Survey","authors":"Claudio Filipi Gonçalves dos Santos, Diego de Souza Oliveira, Leandro A. Passos, Rafael Gonçalves Pires, Daniel Felipe Silva Santos, Lucas Pascotti Valem, Thierry P. Moreira, Marcos Cleison S. Santana, Mateus Roder, Jo Paulo Papa, Danilo Colombo","doi":"10.1145/3490235","DOIUrl":"https://doi.org/10.1145/3490235","url":null,"abstract":"In general, biometry-based control systems may not rely on individual expected behavior or cooperation to operate appropriately. Instead, such systems should be aware of malicious procedures for unauthorized access attempts. Some works available in the literature suggest addressing the problem through gait recognition approaches. Such methods aim at identifying human beings through intrinsic perceptible features, despite dressed clothes or accessories. Although the issue denotes a relatively long-time challenge, most of the techniques developed to handle the problem present several drawbacks related to feature extraction and low classification rates, among other issues. However, deep learning-based approaches recently emerged as a robust set of tools to deal with virtually any image and computer-vision-related problem, providing paramount results for gait recognition as well. Therefore, this work provides a surveyed compilation of recent works regarding biometric detection through gait recognition with a focus on deep learning approaches, emphasizing their benefits and exposing their weaknesses. Besides, it also presents categorized and characterized descriptions of the datasets, approaches, and architectures employed to tackle associated constraints.","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"12 1","pages":"1 - 34"},"PeriodicalIF":0.0,"publicationDate":"2022-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88621039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ms. Aayushi Bansal, Dr. Rewa Sharma, Dr. Mamta Kathuria
Recent advancements in deep learning architecture have increased its utility in real-life applications. Deep learning models require a large amount of data to train the model. In many application domains, there is a limited set of data available for training neural networks as collecting new data is either not feasible or requires more resources such as in marketing, computer vision, and medical science. These models require a large amount of data to avoid the problem of overfitting. One of the data space solutions to the problem of limited data is data augmentation. The purpose of this study focuses on various data augmentation techniques that can be used to further improve the accuracy of a neural network. This saves the cost and time consumption required to collect new data for the training of deep neural networks by augmenting available data. This also regularizes the model and improves its capability of generalization. The need for large datasets in different fields such as computer vision, natural language processing, security, and healthcare is also covered in this survey paper. The goal of this paper is to provide a comprehensive survey of recent advancements in data augmentation techniques and their application in various domains.
{"title":"A Systematic Review on Data Scarcity Problem in Deep Learning: Solution and Applications","authors":"Ms. Aayushi Bansal, Dr. Rewa Sharma, Dr. Mamta Kathuria","doi":"10.1145/3502287","DOIUrl":"https://doi.org/10.1145/3502287","url":null,"abstract":"Recent advancements in deep learning architecture have increased its utility in real-life applications. Deep learning models require a large amount of data to train the model. In many application domains, there is a limited set of data available for training neural networks as collecting new data is either not feasible or requires more resources such as in marketing, computer vision, and medical science. These models require a large amount of data to avoid the problem of overfitting. One of the data space solutions to the problem of limited data is data augmentation. The purpose of this study focuses on various data augmentation techniques that can be used to further improve the accuracy of a neural network. This saves the cost and time consumption required to collect new data for the training of deep neural networks by augmenting available data. This also regularizes the model and improves its capability of generalization. The need for large datasets in different fields such as computer vision, natural language processing, security, and healthcare is also covered in this survey paper. The goal of this paper is to provide a comprehensive survey of recent advancements in data augmentation techniques and their application in various domains.","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"248 1","pages":"1 - 29"},"PeriodicalIF":0.0,"publicationDate":"2022-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74501978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Decision trees have the particularity of being machine learning models that are visually easy to interpret and understand. Therefore, they are primarily suited for sensitive domains like medical diagnosis, where decisions need to be explainable. However, if used on complex problems, then decision trees can become large, making them hard to grasp. In addition to this aspect, when learning decision trees, it may be necessary to consider a broader class of constraints, such as the fact that two variables should not be used in a single branch of the tree. This motivates the need to enforce constraints in learning algorithms of decision trees. We propose a survey of works that attempted to solve the problem of learning decision trees under constraints. Our contributions are fourfold. First, to the best of our knowledge, this is the first survey that deals with constraints on decision trees. Second, we define a flexible taxonomy of constraints applied to decision trees and methods for their treatment in the literature. Third, we benchmark state-of-the art depth-constrained decision tree learners with respect to predictive accuracy and computational time. Fourth, we discuss potential future research directions that would be of interest for researchers who wish to conduct research in this field.
{"title":"Constraint Enforcement on Decision Trees: A Survey","authors":"Géraldin Nanfack, Paul Temple, Benoît Frénay","doi":"10.1145/3506734","DOIUrl":"https://doi.org/10.1145/3506734","url":null,"abstract":"Decision trees have the particularity of being machine learning models that are visually easy to interpret and understand. Therefore, they are primarily suited for sensitive domains like medical diagnosis, where decisions need to be explainable. However, if used on complex problems, then decision trees can become large, making them hard to grasp. In addition to this aspect, when learning decision trees, it may be necessary to consider a broader class of constraints, such as the fact that two variables should not be used in a single branch of the tree. This motivates the need to enforce constraints in learning algorithms of decision trees. We propose a survey of works that attempted to solve the problem of learning decision trees under constraints. Our contributions are fourfold. First, to the best of our knowledge, this is the first survey that deals with constraints on decision trees. Second, we define a flexible taxonomy of constraints applied to decision trees and methods for their treatment in the literature. Third, we benchmark state-of-the art depth-constrained decision tree learners with respect to predictive accuracy and computational time. Fourth, we discuss potential future research directions that would be of interest for researchers who wish to conduct research in this field.","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"131 1","pages":"1 - 36"},"PeriodicalIF":0.0,"publicationDate":"2022-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77418305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huge amounts of unstructured data including image, video, audio, and text are ubiquitously generated and shared, and it is a challenge to protect sensitive personal information in them, such as human faces, voiceprints, and authorships. Differential privacy is the standard privacy protection technology that provides rigorous privacy guarantees for various data. This survey summarizes and analyzes differential privacy solutions to protect unstructured data content before it is shared with untrusted parties. These differential privacy methods obfuscate unstructured data after they are represented with vectors and then reconstruct them with obfuscated vectors. We summarize specific privacy models and mechanisms together with possible challenges in them. We also discuss their privacy guarantees against AI attacks and utility losses. Finally, we discuss several possible directions for future research.
{"title":"A Survey on Differential Privacy for Unstructured Data Content","authors":"Ying Zhao, Jinjun Chen","doi":"10.1145/3490237","DOIUrl":"https://doi.org/10.1145/3490237","url":null,"abstract":"Huge amounts of unstructured data including image, video, audio, and text are ubiquitously generated and shared, and it is a challenge to protect sensitive personal information in them, such as human faces, voiceprints, and authorships. Differential privacy is the standard privacy protection technology that provides rigorous privacy guarantees for various data. This survey summarizes and analyzes differential privacy solutions to protect unstructured data content before it is shared with untrusted parties. These differential privacy methods obfuscate unstructured data after they are represented with vectors and then reconstruct them with obfuscated vectors. We summarize specific privacy models and mechanisms together with possible challenges in them. We also discuss their privacy guarantees against AI attacks and utility losses. Finally, we discuss several possible directions for future research.","PeriodicalId":7000,"journal":{"name":"ACM Computing Surveys (CSUR)","volume":"45 1","pages":"1 - 28"},"PeriodicalIF":0.0,"publicationDate":"2022-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74813125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}