Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00122
Tengyue Li, S. Fong, Yaoyang Wu, A. J. Tallón-Ballesteros
Advances in IoT and sensor applications mean that time series are now generated more easily and in larger quantities than ever. Training a prediction model effectively on such big data streams poses challenges for machine learning. Data sampling is an important pre-processing technique for handling oversized data: it converts huge data streams into a manageable, representative subset before they are loaded into a model induction process. In this paper, a novel data conversion method, the Kennard-Stone Balance (KSB) algorithm, is proposed. Over the past decades, researchers have used Kennard-Stone (KS) to partition a bounded dataset into appropriate portions of training and testing data for cross-validation. In this new proposal, we extend KS to balance the sub-sampled data by taking the class distribution into account in a round-robin fashion. This is also the first time KS has been applied to time series to extract a meaningful representation of big data streams and thereby improve the performance of a machine learning model. Preliminary simulation results show the advantages of KSB. Analysis, discussion and future work are reported in this short paper. KSB is anticipated to offer a new data sampling alternative with considerable potential for data stream mining.
Title: Kennard-Stone Balance Algorithm for Time-series Big Data Stream Mining
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
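The sampling idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' KSB implementation: the abstract does not give the exact procedure, so this sketch assumes that Kennard-Stone ordering is computed separately within each class and that samples are then drawn from the classes in round-robin turns until a budget is reached. The function names `kennard_stone_order` and `balanced_ks_sample` and the `budget` parameter are hypothetical.

```python
import numpy as np


def kennard_stone_order(X):
    """Return row indices of X in Kennard-Stone selection order."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    if n <= 2:
        return list(range(n))
    # Pairwise Euclidean distances (O(n^2) memory, acceptable for a sketch).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Classic KS start: the two mutually farthest points.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    order = [int(i), int(j)]
    # min_dist[k] is the distance from point k to its nearest selected point.
    min_dist = np.minimum(dist[i], dist[j])
    min_dist[order] = -1.0  # mark selected points so they are never re-picked
    while len(order) < n:
        # Pick the point farthest from everything selected so far.
        k = int(np.argmax(min_dist))
        order.append(k)
        min_dist = np.minimum(min_dist, dist[k])
        min_dist[order] = -1.0
    return order


def balanced_ks_sample(X, y, budget):
    """Assumed balancing step: round-robin over per-class KS orderings."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    queues = {}
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        queues[label] = [int(idx[k]) for k in kennard_stone_order(X[idx])]
    picked = []
    while len(picked) < budget and any(queues.values()):
        for label in queues:
            if queues[label] and len(picked) < budget:
                picked.append(queues[label].pop(0))
    return picked
```

With eight samples of one class and two of another, a budget of four yields two samples per class under this assumed scheme, whereas plain KS over the pooled data would not take class membership into account.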
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00106
Hyoungwoo Lee, J. Choo
Spatio-temporal forecasting is a research area applicable to many industrial fields, such as forecasting real-life power consumption and predicting road traffic conditions. In traffic forecasting, for example, it is important to analyze spatial relations and temporal trends in order to predict how traffic on roads changes over time. Previous studies of the spatio-temporal forecasting task applied graph modeling to capture spatial relations. However, existing models use only the most recently available data to predict traffic conditions, which degrades model performance. Further research is needed to predict speed in the far future. To tackle this issue, we aim to improve model performance by providing the model with additional data obtained through time-series segmentation. To verify whether the additional data is meaningful to the model, we conducted an experiment comparing a model trained on the existing data with a model trained on our data, and analyzed the distribution of the additional data.
Title: Data analysis and processing for spatio-temporal forecasting
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00012
I. Chaturvedi, E. Ragusa, P. Gastaldo, E. Cambria
Thanks to recent advances in machine learning, some say AI is the new engine and data is the new coal. Mining this ‘coal’ from the ever-growing Social Web, however, can be a formidable task. In this work, we address this problem in the context of sentiment analysis using convolutional online adaptation learning (COAL). In particular, we consider semi-supervised learning of convolutional features, which we use to train an online model. Such a model, which can be trained in one domain and then used to predict sentiment in other domains, outperforms the baseline by 5-20%.
Title: COAL: Convolutional Online Adaptation Learning for Opinion Mining
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00090
Anindya Moitra, Nicholas O. Malott, P. Wilsey
This paper introduces a framework to compute persistent homology, a principal tool in Topological Data Analysis, on potentially unbounded and evolving data streams. The framework is organized into online and offline components. The online element maintains a summary of the data that preserves the topological structure of the stream. The offline component computes the persistence intervals from the data captured by the summary. The framework is applied to the detection of horizontal or reticulate genomic exchanges during the evolution of species that cannot be identified by phylogenetic inference or traditional data mining. The method effectively detects reticulate evolution that occurs through reassortment and recombination in large streams of genomic sequences of Influenza and HIV viruses.
Title: Persistent Homology on Streaming Data
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
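To make the term "persistence intervals" in the abstract above concrete, here is a minimal Python sketch of the simplest case only: dimension-0 persistent homology (connected components) of a Vietoris-Rips filtration on a small, fixed point set. It does not reproduce the paper's online summary or its offline pipeline; the function name `h0_persistence` is hypothetical, and the union-find approach is a standard textbook technique swapped in for illustration.

```python
import itertools
import math


def h0_persistence(points):
    """Return (birth, death) intervals of connected components.

    Every component is born at scale 0. A component dies at the length
    of the edge that first merges it into another component, which is
    the order in which Kruskal's algorithm accepts edges.
    """
    n = len(points)
    parent = list(range(n))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    intervals = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # two components merge, so one of them dies
            intervals.append((0.0, length))
    if n:
        # The last surviving component never dies.
        intervals.append((0.0, math.inf))
    return intervals
```

Long intervals correspond to well-separated clusters, while short ones are usually treated as noise. Higher-dimensional features such as loops need a full boundary-matrix reduction, which this sketch omits.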
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00118
Akira Kusaba, Kilho Shin, D. Shepard, T. Kuboyama
Machine learning has countless applications in time series analysis: controlling smart grids, detecting mechanical failures, and analyzing stock prices. Fourier mode decomposition (FMD) is the most common method of analysis because it decomposes time series into finite waveform components, or modes, but its principal shortcoming is that it assumes every mode has a constant amplitude, an assumption that rarely holds in real-world data. In contrast, Koopman mode decomposition (KMD) can detect modes with exponentially increasing or decreasing amplitudes, although it has mostly been applied to diagnosing data errors, not to prediction. What has kept KMD from being applied to prediction is partly a shortcoming in its mathematical formulation. This paper seeks to remedy that shortcoming: it provides a mathematically precise formulation of KMD as a practical tool. This formulation, in turn, allows us to develop a novel practical method for predicting future data. We further demonstrate our method's effectiveness using both synthetic data and real plasma flow data.
Title: Predictive Nonlinear Modeling by Koopman Mode Decomposition
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
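The contrast the abstract above draws between constant-amplitude Fourier modes and growing or decaying Koopman modes can be illustrated with a short Python sketch. It uses Hankel (delay-embedded) dynamic mode decomposition, a common numerical approximation of Koopman modes, and is not the formulation proposed in the paper. The function names `hankel_dmd` and `forecast` and the `delays` parameter are hypothetical.

```python
import numpy as np


def hankel_dmd(series, delays):
    """Fit a linear operator A on delay-embedded states; return A and its eigenvalues."""
    x = np.asarray(series, dtype=float)
    cols = len(x) - delays + 1
    # H[i, j] = x[i + j]; each column is a window of `delays` consecutive samples.
    H = np.stack([x[i:i + cols] for i in range(delays)])
    Z, Z_next = H[:, :-1], H[:, 1:]
    # Least-squares fit of Z_next = A @ Z.
    A = Z_next @ np.linalg.pinv(Z)
    return A, np.linalg.eigvals(A)


def forecast(A, last_state, steps):
    """Iterate the fitted operator; the newest sample is the last entry of the state."""
    state = np.asarray(last_state, dtype=float)
    out = []
    for _ in range(steps):
        state = A @ state
        out.append(state[-1])
    return np.array(out)


# Synthetic decaying oscillation: amplitude shrinks by a factor 0.9 per step.
t = np.arange(40)
signal = 0.9 ** t * np.cos(0.5 * t)
train = signal[:30]

A, eigenvalues = hankel_dmd(train, delays=2)
prediction = forecast(A, train[-2:], steps=10)
```

An eigenvalue modulus below 1 indicates a decaying mode and a modulus above 1 a growing one; a pure Fourier mode would sit exactly on the unit circle. For this noise-free signal the moduli come out at 0.9 and the forecast continues the decay, which a constant-amplitude decomposition cannot represent.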
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00038
Li Yang, E. Shijia, Shiyao Xu, Yang Xiang
Recent progress in personalized recommendation has shown great potential in exploiting the structural information provided by a knowledge graph (KG). As a heterogeneous information network, a KG contains rich semantic relatedness among entities, which helps address notorious issues such as data sparsity and cold start. State-of-the-art KG-based recommendation approaches try to propagate information along KG links to encode long-range connectivities into hidden representations. However, most of them model the user and item representations independently and lack a focus on user-item interaction. To this end, we propose the Interactive Knowledge Graph Attention Network (IKGAT), which directly models user-item interaction and high-order structural information within the KG. For the user representation, following an interactive attention mechanism, we use the item to attend over the user's neighbors and then propagate their information to update the representation. This process is extended to neighbors multiple hops away to obtain richer neighborhood information. Similarly, the item representation is updated under the supervision of the user. With this design, IKGAT can capture collaborative signals and user preferences effectively. Experimental results on three public datasets show that IKGAT consistently outperforms state-of-the-art approaches, especially when the dataset is sparse.
Title: Interactive Knowledge Graph Attention Network for Recommender Systems
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00123
Zoltán Puha, M. Kaptein, A. Lemmens
Field experimentation has become a well-established practice to estimate individual treatment effects. In recent years, the Active Learning (AL) literature has developed methods to optimize the design of field experiments and reduce their cost. In this paper, we propose a novel AL algorithm for individual treatment effect estimation that works in batch mode for cases where the outcomes of an intervention are not immediate. It uniquely combines Expected Model Change Maximization and Bayesian Additive Regression Trees. Our approach (B-EMCMITE) uses the predictive uncertainty around the individual treatment effects to actively sample new units for experimentation and decide which treatment they will receive. We perform extensive simulations and test our approach on semi-synthetic, real-life data. B-EMCMITE outperforms alternative approaches and substantially reduces the number of observations needed to estimate individual treatment effects compared to A/B tests.
Title: Batch Mode Active Learning for Individual Treatment Effect Estimation
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00111
Sungwoo Park, Jihoon Moon, Eenjun Hwang
One key component in the heat-using facility of district heating systems is the differential pressure control valve. This valve ensures a stable flow of water to the heat exchanger and the temperature control valve. It also maintains a stable pressure difference between the supply and return lines. Hence, a malfunction could cause significant heat losses and, consequently, economic losses. To avoid this, it is necessary to monitor the valve for abnormal operation in real time. Although various machine learning-based anomaly detection models exist, their decisions are of limited practical use unless the rationale behind them is appropriately explained. In this paper, we propose an explainable anomaly detection scheme based on Shapley additive explanations that can present the degree to which each input variable contributes to the derived result. We report some of the experimental results.
Title: Explainable Anomaly Detection for District Heating Based on Shapley Additive Explanations
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
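The "degree of contribution of input variables" mentioned in the abstract above can be illustrated with an exact Shapley value computation on a toy score. This Python sketch is not the authors' detection model and does not use the SHAP library: the anomaly score, the three features (supply pressure, return pressure, flow) and the setpoints are all hypothetical, and absent features are assumed to be replaced by values from a normal-operation baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(score, x, baseline):
    """Exact Shapley values of score(x) relative to score(baseline).

    Features outside a coalition take their baseline values. The cost is
    exponential in the number of features, so this only suits tiny inputs.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return score(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def anomaly_score(v):
    """Hypothetical score: deviation from an assumed 1.0 pressure-difference
    setpoint plus a small penalty for flow away from an assumed 10.0 setpoint."""
    supply, ret, flow = v
    return ((supply - ret) - 1.0) ** 2 + 0.1 * abs(flow - 10.0)


normal = [5.0, 4.0, 10.0]    # baseline reading, score 0
observed = [6.0, 4.0, 12.0]  # supply pressure and flow both drifted
contributions = shapley_values(anomaly_score, observed, normal)
```

The contributions sum to the difference between the observed and baseline scores, so an operator can see which reading drives an alarm. Here the supply pressure accounts for most of the score, the flow for a small part, and the unchanged return pressure for none.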
Financial analysts' earnings forecasts are among the most critical inputs for security valuation and investment decisions. However, it is challenging to use this information for two main reasons: missing values and heterogeneity among analysts. In this paper, we show that a recent breakthrough nonlinear tensor completion algorithm, CoSTCo [1], overcomes this difficulty by imputing missing values and significantly improves the accuracy of earnings forecasts. Compared with conventional imputation approaches, CoSTCo effectively captures latent information and reduces tensor completion errors by 50%, even with 98% missing values. Furthermore, we show that using firm characteristics as auxiliary information improves firms' earnings prediction accuracy by 6%. Results are consistent across different performance metrics and across various industry sectors. Notably, the performance improvement is more salient for sectors with high heterogeneity. Our findings demonstrate a successful application of advanced ML techniques to a real financial problem.
Title: Nonlinear Tensor Completion Using Domain Knowledge: An Application in Analysts' Earnings Forecast
Authors: Ajim Uddin, Xinyuan Tao, Chia-Ching Chou, Dantong Yu
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00059
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)
Pub Date: 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00011
Jonathan Kevin Chandra, E. Cambria
With the rapid adoption of the Internet, fast-moving social media platforms have been able to extract and encapsulate real-time public sentiment on different entities. Real-time sentiment analysis of current dynamic events such as elections, global affairs and sports is essential to understanding the public's reaction to the states and trajectories of these events. In this paper, we aim to extract sentiment toward the Belt and Road Initiative from Twitter. Using aspect-based sentiment analysis, we were able to obtain each tweet's sentiment polarity for the related aspect category to better understand the topics that were discussed. We have developed an end-to-end sentiment analysis system that collects relevant data from Twitter, processes it and visualizes it on an intuitive display. We employed a hybrid approach of symbolic and sub-symbolic techniques using gated convolutional networks, aspect embeddings and the SenticNet framework to solve the subtasks of aspect category detection and aspect category polarity. A confidence score threshold was used to decide between the results provided by the models from the differing approaches.
Title: One Belt, One Road, One Sentiment? A Hybrid Approach to Gauging Public Opinions on the New Silk Road Initiative
Venue: 2020 International Conference on Data Mining Workshops (ICDMW)