Pub Date: 2018-10-01 | DOI: 10.1109/BRACIS.2018.00050
C. Shulby, Leonardo Pombal, Vitor Jordão, Guilherme Ziolle, Bruno Martho, Antônio Postal, Thiago Prochnow
Violence is an epidemic in Brazil and a problem on the rise worldwide. Mobile devices provide communication technologies which can be used to monitor and alert about violent situations. However, current solutions, like panic buttons or safe words, might increase the loss of life in violent situations. We propose an embedded artificial intelligence solution, using natural language and speech processing technology, to silently alert someone who can help in this situation. The corpus used contains 400 positive phrases and 800 negative phrases, totaling 1,200 sentences, which are represented using two well-known feature extraction methods for natural language processing tasks, bag-of-words and word embeddings, and classified with a support vector machine. We describe the proof-of-concept product in development with promising results, indicating a path towards a commercial product. More importantly, we show that model improvements via word embeddings and data augmentation techniques provide an intrinsically robust model. The final embedded solution also has a small footprint of less than 10 MB.
Title: Proactive Security: Embedded AI Solution for Violent and Abusive Speech Recognition
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS)
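The classification pipeline the abstract describes (bag-of-words features fed to a linear classifier) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the phrases are invented, and a simple perceptron stands in for the support vector machine so the sketch needs no external libraries.

```python
from collections import Counter

def bag_of_words(texts):
    # Build a shared vocabulary, then map each text to a term-count vector.
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0] * len(vocab)
        for w, c in Counter(t.lower().split()).items():
            v[index[w]] = c
        vectors.append(v)
    return vocab, vectors

def train_perceptron(X, y, epochs=20):
    # Linear classifier standing in for the SVM used in the paper.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            pred = 1 if score > 0 else -1
            if pred != yi:  # mistake-driven update
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
```

In the paper's setting, the positive class would be the violent/abusive phrases and the negative class everyday speech; the word-embedding variant would replace the count vectors with dense sentence vectors while keeping the same classifier interface.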
Pub Date: 2018-10-01 | DOI: 10.1109/bracis.2018.00020
Felipe Penhorate Carvalho da Fonseca, Luciano Antonio Digiampietri
Nowadays, there is a wide range of academic data available on the web. This information allows solving tasks such as the discovery of specialists in a given area, identification of potential scholarship holders, and suggestion of collaborators, among others. However, the success of these tasks depends on the quality of the data used, since incorrect or incomplete data tend to impair the performance of the applied algorithms. The present work used machine learning techniques to help infer researchers' areas of expertise based on the data registered in the Lattes Platform, using the subareas as a case study. The subareas present a more challenging variant of the original problem, as the number of classes is larger. The goal of this paper is to analyze the contribution of factors such as social network metrics, the language of the titles, and the hierarchical structure of the areas to the performance of the algorithms, and to propose a new approach combining different characteristics. The proposed approach can be applied to different academic data, but data from the Lattes Platform were used for the tests and validations of the proposed solution. As a result, we identified that the social network metrics and the numerical representations of the data improved inference accuracy when compared to state-of-the-art techniques, and the use of the hierarchical structure information achieved even better results.
Title: Inference of Researchers' Area of Expertise
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS)
Pub Date: 2018-10-01 | DOI: 10.1109/BRACIS.2018.00092
R. Souza, G. P. Coelho, A. A. S. Santos, D. Schiozer
Optimizing production strategies for oil extraction is not a simple task, mainly due to the large number of variables and uncertainties associated with the problem. Metaheuristics are well-known tools that can be easily applied to this type of problem. However, the large number of objective-function evaluations that such tools require to obtain a good solution is a serious drawback in the context of oil production strategy definition (PSD): evaluating a production strategy requires oil field simulation software, and each simulation can take hours to complete. Thus, in this work a modified version of a steady-state genetic algorithm is proposed, together with recombination, mutation and local search operators specifically tailored for the PSD problem, which aim to reduce the computational cost of the optimization process. The developed algorithm was used to optimize the well positions in a production strategy for a synthetic oil reservoir model, and the results were compared with those obtained by a classical genetic algorithm and by a commercial optimization tool.
Title: Search Operators for Genetic Algorithms Applied to Well Positioning in Oil Fields
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS)
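The steady-state scheme the abstract outlines (recombination, mutation, a local-search step, and replacement of the worst individual, so only one expensive fitness evaluation chain per iteration) can be sketched as below. This is a generic illustration under stated assumptions, not the authors' operators: a one-dimensional toy fitness function stands in for the reservoir simulator, and the operator details are invented.

```python
import random

def steady_state_ga(fitness, bounds, pop_size=20, iters=200, seed=42):
    rng = random.Random(seed)
    low, high = bounds
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(iters):
        # Tournament selection of two parents.
        p1 = max(rng.sample(pop, 3), key=fitness)
        p2 = max(rng.sample(pop, 3), key=fitness)
        # Blend recombination followed by a small mutation.
        child = (p1 + p2) / 2 + rng.gauss(0, 0.1 * (high - low))
        child = min(high, max(low, child))
        # One-step local search: keep a nearby probe if it is better.
        probe = min(high, max(low, child + rng.gauss(0, 0.01 * (high - low))))
        if fitness(probe) > fitness(child):
            child = probe
        # Steady-state replacement: the child replaces the worst individual.
        worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(child) > fitness(pop[worst]):
            pop[worst] = child
    return max(pop, key=fitness)
```

In the PSD setting each `fitness` call would be an hours-long reservoir simulation, which is why a steady-state design that produces one carefully constructed child per iteration is attractive over a generational GA.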
Pub Date: 2018-10-01 | DOI: 10.1109/BRACIS.2018.00057
M. S. Andrade, F. Cordeiro, V. Macário, Fabiana F. Lima, Suy F. Hwang, Julyanne C. G. Mendonca
Chromosome analysis is an important task to detect genetic diseases. However, the process of identifying chromosomes can be very time-consuming. Therefore, an automatic process to detect chromosomes is an important step to aid diagnosis. This work develops a new approach to automate the segmentation of chromosomes, using adaptive thresholding combined with fuzzy logic. The proposed method is evaluated on the CRCN-NE database, which has 35 images. Results showed that the proposed approach obtained better segmentation results than state-of-the-art techniques, with sensitivity and specificity values of 91% and 92%, respectively.
Title: A Fuzzy-Adaptive Approach to Segment Metaphase Chromosome Images
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS)
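The adaptive-thresholding component of such a pipeline can be illustrated with a minimal local-mean threshold; this sketch covers only that component (the fuzzy-logic combination from the paper is omitted), and the window size and image values are arbitrary assumptions.

```python
def adaptive_threshold(img, window=3, offset=0):
    # Threshold each pixel against the mean of its local window,
    # so uneven illumination does not break the segmentation.
    h, w = len(img), len(img[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            local_mean = sum(vals) / len(vals)
            out[y][x] = 1 if img[y][x] > local_mean + offset else 0
    return out
```

A global threshold would need one brightness cutoff for the whole image; the local mean lets a dim chromosome on a dim background still separate from its surroundings.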
Pub Date: 2018-10-01 | DOI: 10.1109/BRACIS.2018.00051
Cloves Lima, Ivan Santos, F. Barros, A. Mota
Software products must show high quality levels to succeed in a competitive market. Usually, product reliability is assured by testing activities. However, SW testing is sometimes neglected by companies due to its high costs, particularly when manually executed. In this light, this work investigates intelligent methods for SW testing automation, focusing on the software product review process. We propose a new process for test plan creation based on the inspection of SW documents (in particular, Release Notes) using text mining techniques. The implemented prototype, the SWAT Plan tool (SPt), automatically extracts from Release Notes relevant areas of the SW to be examined by exploratory test teams. SPt was tested using real-world data from Motorola Mobility, our partner company. The experiments compared the current manual process with the automated process using SPt, assessing the time spent and the relevant areas identified by both methods. The obtained results were very encouraging.
Title: SPt: A Text Mining Process to Extract Relevant Areas from SW Documents to Exploratory Tests
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS)
Pub Date: 2018-10-01 | DOI: 10.1109/bracis.2018.00052
Tatiane Nogueira Rios, Braian Varjão Gama Bispo
Feature selection is widely used to overcome the problems caused by the curse of dimensionality, since it reduces data dimensionality by removing irrelevant and redundant features from a dataset. Moreover, it is an important pre-processing step, usually mandatory in text mining tasks using Machine Learning techniques. In this paper, we propose a new feature selection method for text classification, named Statera, that selects a subset of features guaranteeing the representativeness of all classes from a domain in a balanced way, calculating the degree of representativeness from information retrieval measures. We demonstrate the effectiveness of our method by conducting experiments on nine real document collections. The results show that the proposed approach can outperform state-of-the-art feature selection methods, achieving good classification results even with a very small number of features.
Title: Statera: A Balanced Feature Selection Method for Text Classification
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS)
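The core idea of class-balanced selection (take the top-scoring terms per class rather than globally, so no class is left unrepresented) can be sketched as follows. The abstract does not specify Statera's information retrieval measures, so this sketch substitutes a plain per-class document-frequency score as an assumed placeholder.

```python
from collections import defaultdict

def balanced_feature_selection(docs, labels, k_per_class=2):
    # Score each term by how many documents of each class contain it,
    # then take the top terms per class so every class is represented.
    freq = defaultdict(lambda: defaultdict(int))
    for doc, label in zip(docs, labels):
        for term in set(doc.lower().split()):
            freq[label][term] += 1
    selected = []
    for label in sorted(freq):
        # Rank by descending document frequency, ties broken alphabetically.
        ranked = sorted(freq[label], key=lambda t: (-freq[label][t], t))
        for term in ranked[:k_per_class]:
            if term not in selected:
                selected.append(term)
    return selected
```

A purely global top-k score would let a dominant class crowd out the discriminative terms of the smaller classes; the per-class round works against exactly that imbalance.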
Pub Date: 2018-10-01 | DOI: 10.1109/BRACIS.2018.00034
L. Pacífico, Teresa B Ludermir, João F. L. Oliveira
Extreme Learning Machine (ELM) has been introduced as an algorithm for training Single-Hidden-Layer Feedforward Neural Networks, capable of training faster than traditional gradient-descent approaches such as the back-propagation algorithm. Although effective, ELM suffers from some drawbacks, since the adopted strategy of randomly determining the input weights and hidden biases may lead to non-optimal performance. Many Evolutionary Algorithms (EAs) have been employed to select input weights and hidden biases for ELM, generating Evolutionary Extreme Learning Machine (EELM) models. In this work, we evaluate the influence of three different treatments for handling out-bounded individuals in the population of EAs by comparing three different Evolutionary Extreme Learning Machine approaches. The experimental evaluation is based on a ranking obtained from Friedman hypothesis tests over experiments performed on ten benchmark data sets. The experimental results pointed out that some treatments for handling out-bounded individuals are more adequate than others for the selected problems, and also that some EELMs are more sensitive than others to the way out-bounded individuals are treated.
Title: Evolutionary ELMs with Alternative Treatments for the Population Out-Bounded Individuals
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS)
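Three common treatments for individuals that leave the feasible search interval are clipping, reflection, and wrapping. The abstract does not name which three treatments the paper compares, so the functions below are representative examples of the general technique, not the paper's specific choices.

```python
def clip(x, low, high):
    # Saturate out-of-bounds values at the nearest bound.
    return min(high, max(low, x))

def reflect(x, low, high):
    # Mirror the value back into the feasible interval,
    # bouncing off the bounds as many times as needed.
    span = high - low
    x = (x - low) % (2 * span)
    return low + (span - abs(x - span))

def wrap(x, low, high):
    # Treat the interval as periodic: exiting one side re-enters the other.
    return low + (x - low) % (high - low)
```

The choice matters because each treatment distorts the search differently: clipping piles mass on the bounds, reflection keeps offspring near the violated bound, and wrapping teleports them to the opposite side, which is consistent with the paper's finding that EELM variants differ in sensitivity to the treatment used.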
Pub Date: 2018-10-01 | DOI: 10.1109/bracis.2018.00011
Gabriel Santos Barbosa, Leonardo da Silva Costa, Ajalmar Rêgo da Rocha Neto
Optimum-Path Forest (OPF) is a graph-based supervised classifier that has achieved remarkable performance in many applications. OPF has many advantages when compared to other supervised classifiers: it is parameter-free, achieves zero classification errors on the training set without overfitting, handles multiple classes without modifications or extensions, and makes no assumptions about the shape and separability of the classes. However, one drawback of the OPF classifier is that its classification cost grows proportionally to the size of the training set. To overcome this issue, we propose a novel method based on genetic algorithms (GAs) to prune irrelevant training samples while preserving or even improving accuracy in OPF classification. We validate the method using public datasets obtained from the UCI repository.
Title: A New Genetic Algorithm-Based Pruning Approach for Optimum-Path Forest
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS)
Pub Date: 2018-10-01 | DOI: 10.1109/bracis.2018.00084
Ricardo Maroquio Bernardo, L. C. Batista da Silva, P. F. Ferreira Rosa
This paper presents a digital stabilization solution for videos captured from remotely piloted aircraft systems (RPAS), in order to enable persistent surveillance tasks based on stationary aerial images, a situation in which image quality has a direct impact on the accuracy of algorithms for detecting independently moving objects (IMOs). The proposed method detects keypoints in a reference frame and tracks their displacement in the following frames, in order to compute the geometric transformation required to align the frames. Experiments were conducted in simulated 3D scenes and in real scenes, comparing different algorithms available in the literature. Using an innovative method for keypoint selection improvement, the results show that the solution is feasible even when executed on a single-board computer (SBC) such as the Raspberry Pi 3 Model B, providing adequate output even for real-time surveillance applications.
Title: Onboard Video Stabilization for Low Cost Small RPAS Surveillance Applications
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS)
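The stabilization step (compute a geometric transformation from tracked keypoint displacements, then undo it) can be illustrated in its simplest translation-only form. Real pipelines typically fit a full similarity or homography transform with outlier rejection such as RANSAC; the reduction to a mean shift and the sample coordinates here are simplifying assumptions.

```python
def estimate_translation(ref_pts, cur_pts):
    # The average displacement of the tracked keypoints gives the
    # global shift between the reference frame and the current frame.
    n = len(ref_pts)
    dx = sum(c[0] - r[0] for r, c in zip(ref_pts, cur_pts)) / n
    dy = sum(c[1] - r[1] for r, c in zip(ref_pts, cur_pts)) / n
    return dx, dy

def stabilize_point(pt, shift):
    # Apply the inverse shift to map current-frame coordinates
    # back onto the reference frame.
    dx, dy = shift
    return (pt[0] - dx, pt[1] - dy)
```

Applying the inverse shift to every pixel of the current frame re-aligns it with the reference frame, which is what makes downstream IMO detection on "stationary" aerial imagery feasible.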
Pub Date: 2018-10-01 | DOI: 10.1109/BRACIS.2018.00054
Fernando Tadao Ito, Helena de Medeiros Caseli, J. Moreira
With the exponential growth of multimedia datasets comes the need to combine multiple data representations to create "conceptual" vector spaces in order to use all available sources of information. Following previous experiments [1], in this paper we explore how two different languages can be combined to better represent information. Methods to create textual representations, such as Word2Vec and GloVe, are already well-established in academia and, usually, a single representation method is used in Machine Learning tasks. In this paper, we investigate the effects of different combinations of textual representations to perform classification tasks on a multilingual dataset composed of international news in Portuguese and English. This paper aims to analyze the differences between combinations, and how these representations perform in a small dataset with multiple data inputs.
Title: The Effects of Underlying Mono and Multilingual Representations for Text Classification
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS)