Semi-supervised learning is a potential solution for improving training data in low-resourced abusive language detection contexts such as South African abusive language detection on Twitter. However, existing semi-supervised learning methods have been skewed towards small amounts of labelled data and small feature spaces. This paper, therefore, presents a semi-supervised learning technique that improves the distribution of training data by assigning labels to unlabelled data based on majority voting over different feature sets of labelled and unlabelled data clusters. The technique is applied to South African English corpora consisting of labelled and unlabelled abusive tweets. The proposed technique is compared with state-of-the-art self-learning and active learning techniques based on syntactic and semantic features. The performance of these techniques with Logistic Regression, Support Vector Machine and Neural Network classifiers is evaluated. The proposed technique, with accuracy and F1-score of 0.97 and 0.95, respectively, outperforms existing semi-supervised learning techniques. The learning curves show that the training data was used more efficiently by the proposed technique compared to existing techniques. Overall, n-gram syntactic features with a Logistic Regression classifier record the highest performance. The paper concludes that the proposed semi-supervised learning technique effectively detected implicit and explicit South African abusive language on Twitter.
"Improved semi-supervised learning technique for automatic detection of South African abusive language on Twitter" by O. Oriola and E. Kotzé. South African Computer Journal 32(2), 8 December 2020. DOI: https://doi.org/10.18489/sacj.v32i2.847
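The cluster-based majority voting described in the abstract above can be illustrated with a minimal sketch. Everything here is hypothetical scaffolding rather than the authors' pipeline: we assume each feature set (e.g. n-grams, word embeddings) has already produced a pseudo-label for every unlabelled tweet, for instance the dominant gold label in that tweet's cluster, and a final label is assigned only when enough feature sets agree.

```python
from collections import Counter

def majority_vote_labels(pseudo_labels_per_featureset, min_agreement=2):
    """Combine per-feature-set pseudo-labels by majority voting.

    pseudo_labels_per_featureset: list of dicts, one per feature set,
    mapping example id -> pseudo-label derived from that feature set's
    clustering (a hypothetical input shape; the paper's exact pipeline
    may differ).
    Returns a dict mapping example id -> label for examples where at
    least `min_agreement` feature sets voted for the same label.
    """
    accepted = {}
    ids = set().union(*(d.keys() for d in pseudo_labels_per_featureset))
    for ex_id in ids:
        votes = Counter(
            d[ex_id] for d in pseudo_labels_per_featureset if ex_id in d
        )
        label, count = votes.most_common(1)[0]
        if count >= min_agreement:
            accepted[ex_id] = label
    return accepted
```

With three feature sets and `min_agreement=2`, an unlabelled tweet is added to the training data only when at least two feature sets vote the same way; examples without a sufficient majority remain unlabelled.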
South African Computer Journal has not passed through the Covid-19 pandemic unscathed. Fortunately, at the time of writing, none of our editors has contracted the virus. However, we have all lost time to unaccustomed activities like online courses. Another problem many academics have faced is large numbers of plagiarism cases, arising from having to switch quickly to online learning with inadequate time to prepare. These problems pale into insignificance compared with the massive socioeconomic destruction around the world. Despite all this, we are able to publish a second issue on schedule in December. In this issue, most of the papers are extended papers from the inaugural Artificial Intelligence research conference, the Forum on AI Research (FAIR), held in Cape Town, South Africa over 3–6 December 2019. I therefore defer the main part of editorialising on the content of the issue to the guest editors, Deshendran Moodley and Marelie Davel.
"Editorial: More Covid" by P. Machanick. South African Computer Journal 32(2), 8 December 2020. DOI: https://doi.org/10.18489/sacj.v32i2.916
Many novice programmers fail to comprehend source code and its related concepts in the same way that their instructors do. As emphasised in the Decoding the Disciplines (DtDs) framework, each discipline (including Computer Science) has its own unique set of mental operations. However, instructors often take certain important mental operations for granted and do not explain these 'hidden' steps explicitly when modelling problem solutions. A clear understanding of the underlying cognitive processes and related support strategies employed by experts during source code comprehension (SCC) could ultimately be utilised to help novice programmers to better execute the cognitive processes necessary to efficiently comprehend source code. Positioned within Step 2 of the DtDs framework, this study employed decoding interviews and observations, followed by narrative data analysis, to identify the underlying cognitive processes and related support (though often 'hidden') strategies utilised by a select group of experienced programming instructors during an SCC task. The insights gained were then used to formulate a set of important cognitive-related support strategies for efficient SCC. Programming instructors are encouraged to continuously emphasise strategies like these when modelling their expert ways of thinking regarding efficient SCC more explicitly to their novice students.
"Decoding the underlying cognitive processes and related support strategies utilised by expert instructors during source code comprehension" by Pakiso J. Khomokhoana and Liezel Nel. South African Computer Journal 32(2), 8 December 2020. DOI: https://doi.org/10.18489/sacj.v32i2.811
No framework exists that can explain and predict the generalisation ability of deep neural networks in general circumstances. In fact, this question has not been answered for some of the least complicated of neural network architectures: fully-connected feedforward networks with rectified linear activations and a limited number of hidden layers. For such an architecture, we show how adding a summary layer to the network makes it more amenable to analysis, and allows us to define the conditions that are required to guarantee that a set of samples will all be classified correctly. This process does not describe the generalisation behaviour of these networks, but produces a number of metrics that are useful for probing their learning and generalisation behaviour. We support the analytical conclusions with empirical results, both to confirm that the mathematical guarantees hold in practice, and to demonstrate the use of the analysis process.
"Using Summary Layers to Probe Neural Network Behaviour" by Marelie Hattingh Davel. South African Computer Journal 32(2), 8 December 2020. DOI: https://doi.org/10.18489/sacj.v32i2.861
Reinforcement learning has recently experienced increased prominence in the machine learning community. There are many approaches to solving reinforcement learning problems, with new techniques developed constantly. When solving problems using reinforcement learning, there are various difficult challenges to overcome. To ensure progress in the field, benchmarks are important for testing new algorithms and comparing them with other approaches. The reproducibility of results for fair comparison is therefore vital in ensuring that improvements are judged accurately. This paper provides an overview of different contributions to reinforcement learning benchmarking and discusses how they can assist researchers in addressing the challenges facing reinforcement learning. The contributions discussed are the most widely used and most recent in the literature. The paper discusses the contributions in terms of implementation, available tasks, and the algorithm implementations provided with the benchmarks. The survey aims to bring attention to the wide range of reinforcement learning benchmarking tasks available and to encourage research to take place in a standardised manner. Additionally, this survey acts as an overview for researchers not familiar with the different tasks that can be used to develop and test new reinforcement learning algorithms.
"A survey of benchmarks for reinforcement learning algorithms" by B. Stapelberg and K. Malan. South African Computer Journal 32(2), 8 December 2020. DOI: https://doi.org/10.18489/sacj.v32i2.746
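A common thread among benchmark suites of the kind this survey covers is a standardised environment interface, in the style popularised by OpenAI Gym. The toy environment below is an illustrative sketch of that reset/step convention, not one of the surveyed benchmarks; the class and its dynamics are invented for the example.

```python
import random

class CoinFlipEnv:
    """Toy environment following the Gym-style reset/step convention
    shared by many RL benchmarks (a sketch, not a real benchmark task)."""

    def __init__(self, horizon=10, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        # Begin a new episode and return the initial observation.
        self.t = 0
        return 0

    def step(self, action):
        # Reward 1 when the agent's guess matches the coin, else 0.
        coin = self.rng.randint(0, 1)
        reward = 1 if action == coin else 0
        self.t += 1
        done = self.t >= self.horizon
        return coin, reward, done, {}

def run_episode(env, policy):
    """Roll out one episode: the same loop works for any environment
    exposing this interface, which is what makes benchmarks comparable."""
    obs, total, done = env.reset(), 0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total
```

Because every benchmark task exposes the same loop, a new algorithm can be evaluated across many tasks without task-specific glue code, which is precisely what makes standardised comparison and reproduction possible.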
Datalog is a powerful language that can be used to represent explicit knowledge and compute inferences in knowledge bases. Datalog cannot, however, represent or reason about contradictory rules. This is a limitation as contradictions are often present in domains that contain exceptions. In this paper, we extend Datalog to represent contradictory and defeasible information. We define an approach to efficiently reason about contradictory information in Datalog and show that it satisfies the KLM requirements for a rational consequence relation. We introduce DDLV, a defeasible Datalog reasoning system that implements this approach. Finally, we evaluate the performance of DDLV.
"DDLV: A System for rational preferential reasoning for datalog" by Michael Harrison and T. Meyer. South African Computer Journal 32(2), 8 December 2020. DOI: https://doi.org/10.18489/sacj.v32i2.850
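Classical Datalog's bottom-up (forward-chaining) evaluation, and the limitation the paper addresses, can be sketched on ground atoms. The helper below is a hypothetical toy, not DDLV itself: it derives every consequence of strict rules, so a "penguins are birds, birds fly" knowledge base happily concludes that Tweety flies, with no way to express the defeasible exception that penguins do not.

```python
def forward_chain(facts, rules):
    """Naive bottom-up Datalog evaluation over ground atoms.

    facts: set of atoms (strings); rules: list of (body, head) pairs,
    meaning `head` is derived once every atom in `body` is known.
    Classical Datalog has no mechanism to withhold or retract `head`
    when a contradictory rule also fires -- the gap that a defeasible
    extension such as DDLV is designed to fill.
    """
    derived = set(facts)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for body, head in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived
```

For example, from the fact `penguin(tweety)` and the strict rules penguin ⊆ bird and bird ⊆ flies, the fixed point contains `flies(tweety)`; a defeasible rule "birds typically fly" with the exception "penguins do not fly" cannot be stated in this classical fragment.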
The Discrete Pulse Transform decomposes a signal into pulses, with the most recent and effective implementation being a graph-based algorithm called the Roadmaker’s Pavage. Although the implementation is efficient, the theoretical structure results in a slow, deterministic algorithm. This paper examines the use of the spectral domain of graphs and designs graph filter banks to downsample the algorithm, investigating the extent to which this speeds up the algorithm. Converting graph signals to the spectral domain is costly, so estimation for filter banks is examined, as well as the design of a reusable filter bank. The sampled version requires hyperparameters to reconstruct the same textures of the image as the original algorithm, preventing a large-scale study. Here an objective and efficient way of comparing results between the original algorithm and our proposed Filtered Roadmaker’s Pavage is provided. The method makes use of the Ht-index, separating the distribution of information at scale intervals. Empirical research using benchmark datasets shows that the proposed algorithm consistently runs faster and uses fewer computational resources, while maintaining a positive SSIM with low variance.
This provides an informative and faster approximation to the nonlinear DPT, a property not standardly achievable.
"Ht-index for empirical evaluation of the sampled graph-based Discrete Pulse Transform" by Mark De Lancey and I. Fabris-Rotelli. South African Computer Journal 32(2), 8 December 2020. DOI: https://doi.org/10.18489/sacj.v32i2.849
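The Ht-index itself can be computed with head/tail breaks. The sketch below follows one common formulation (repeatedly split the data at its mean while the "head" above the mean remains a strict minority, incrementing the index each time); the paper's exact variant and threshold may differ.

```python
def ht_index(values):
    """Ht-index via head/tail breaks (one common formulation).

    Repeatedly split at the mean; while the head (values above the
    mean) is a strict minority, recurse into the head and increment
    the index. Larger results indicate more levels of heavy-tailed
    'far more small things than large things' scaling.
    """
    h = 1
    data = list(values)
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [x for x in data if x > mean]
        if head and len(head) / len(data) < 0.5:
            h += 1
            data = head
        else:
            break
    return h
```

A heavy-tailed sample such as eight 1s, four 2s, two 10s and one 100 yields an Ht-index of 3 (three nested minority heads), while a uniform sample like four 5s yields the minimum of 1, since no value ever exceeds the mean.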
In this special issue, we feature selected papers from the inaugural Forum for Artificial Intelligence Research (FAIR), established and hosted by the Centre for Artificial Intelligence Research (CAIR). FAIR 2019 was held at the UCT Graduate School of Business Conference Centre in Cape Town, between 3 and 6 December 2019. The Department of Science and Technology’s (DST) latest White Paper on Science, Technology and Innovation (2019) identifies Artificial Intelligence (AI) and advanced Information and Communication Technologies (ICTs) as priority areas for South Africa. It recognises that these technologies will change the way the South African society and economy function. The potential of AI is already being unlocked in key areas of South African society. For example, South Africa’s power utility, Eskom, has identified AI as a future area for research and innovation, and is exploring the use of machine learning for real-time monitoring and fault prediction at their power stations (Bhugwandin et al., 2019). The South African Revenue Service is aggressively building an in-house AI capability for analysing and detecting non-compliance in tax returns (South African Revenue Services, 2020). The South African AI research community has also grown substantially over the last few years. While AI is generally considered to be a subdiscipline of Computer Science (Stone et al., 2016), it is at heart multidisciplinary: active AI research groups in South African universities can be found in Computer Science, Engineering, Philosophy, Information Systems, Statistics and Applied Mathematics departments. Within this context, FAIR was established to provide a venue for South African AI researchers from a broad range of disciplines to meet, interact and publish their work. Research contributions were solicited in five tracks, namely applications of AI, ethics and AI, knowledge representation, machine learning, and other topics in AI.
A total of 72 submissions were received, consisting of full papers, work in progress and extended abstracts (of work under review or published elsewhere). Full paper submissions were blind reviewed by at least two independent reviewers from the relevant disciplines and 20 full papers were accepted for publication in the conference proceedings (Davel & Barnard, 2019).
"Guest Editorial: FAIR 2019 special issue" by Deshendran Moodley and Marelie Hattingh Davel. South African Computer Journal 32(2), 8 December 2020. DOI: https://doi.org/10.18489/sacj.v32i2.915
Feedforward neural networks provide the basis for complex regression models that produce accurate predictions in a variety of applications. However, they generally do not explicitly provide any information about the utility of each of the input parameters in terms of their contribution to model accuracy. With this in mind, we develop the pairwise network, an adaptation to the fully connected feedforward network that allows the ranking of input parameters according to their contribution to model output. The application is demonstrated in the context of a space physics problem. Geomagnetic storms are multi-day events characterised by significant perturbations to the magnetic field of the Earth, driven by solar activity. Previous storm forecasting efforts typically use solar wind measurements as input parameters to a regression problem tasked with predicting a perturbation index such as the 1-minute cadence symmetric-H (Sym-H) index. We re-visit the task of predicting Sym-H from solar wind parameters, with two ‘twists’: (i) Geomagnetic storm phase information is incorporated as model inputs and shown to increase prediction performance. (ii) We describe the pairwise network structure and training process – first validating ranking ability on synthetic data, before using the network to analyse the Sym-H problem.
"Pairwise networks for feature ranking of a geomagnetic storm model" by J. Beukes, Marelie Hattingh Davel and S. Lotz. South African Computer Journal 32(2), 8 December 2020. DOI: https://doi.org/10.18489/sacj.v32i2.860
Marthinus W. Theunissen, Marelie Hattingh Davel, E. Barnard
The understanding of generalisation in machine learning is in a state of flux, in part due to the ability of deep learning models to interpolate noisy training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about the bias-variance tradeoff in learning. We expand upon relevant existing work by discussing local attributes of neural network training within the context of a relatively simple framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the deep learning model to generalise in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterised multilayer perceptrons and controlled training data noise. The main insights are that deep learning models are optimised for training data modularly, with different regions in the function space dedicated to fitting distinct types of sample information. Additionally, we show that models tend to fit uncorrupted samples first. Based on this finding, we propose a conjecture to explain an observed instance of the epoch-wise double-descent phenomenon. Our findings suggest that the notion of model capacity needs to be modified to consider the distributed way training data is fitted across sub-units.
"Benign interpolation of noise in deep learning" by Marthinus W. Theunissen, Marelie Hattingh Davel and E. Barnard. South African Computer Journal 32(2), 8 December 2020. DOI: https://doi.org/10.18489/sacj.v32i2.833