"ATP-OIE: An Autonomous Open Information Extraction Method" — J. M. Rodríguez, H. Merlino, Patricia Pesado (DOI: 10.1145/3388142.3388166)
This paper describes an innovative Open Information Extraction method known as ATP-OIE. It uses extraction patterns to find semantic relations. These patterns are generated automatically from examples, giving it greater autonomy than methods based on fixed rules. ATP-OIE can also invoke other methods, ReVerb and ClausIE, when it is unable to find valid semantic relations in a sentence, thus improving its recall. In these cases, it can generate new extraction patterns online, which further improves its autonomy. It also implements several mechanisms to prevent common errors in the extraction of semantic relations. Finally, ATP-OIE was compared with other state-of-the-art methods on a well-known text collection, Reuters-21578, and obtained higher precision than the other methods.
"Adaptation of RF and CNN on Spark" — Y. Kou, Zhi Hong, Yun Tian, S. Wang (DOI: 10.1145/3388142.3388157)
Biological images are used in many applications, most of them important to the medical field. For example, MRI and CT scans produce high-resolution images that are critical for diagnosing cancers and other organ malfunctions. Nowadays, high-resolution ultrasound images can provide enough detail to examine blood-vessel blockage. Another type of biological image captures mixed patterns of proteins in microscope Human Protein Atlas images. Because of the enormous amount of image data available even within a single medical organization, machine learning and deep learning technologies have been used to assist in image data analysis. Spark is a computing framework that has been proven to speed up data analysis dramatically; however, Spark Scala does not fully support deep learning algorithms. In this paper, we present a case study of adapting Random Forest (RF) and Convolutional Neural Network (CNN) models to the Spark Scala framework. These algorithms were applied to multi-class, multi-label classification on a biological dataset from Kaggle. The experimental results show that both RF and CNN can be implemented with Spark Scala and achieve extremely high throughput.
"Wellhead Compressor Failure Prediction Using Attention-based Bidirectional LSTMs with Data Reduction Techniques" — Wirasak Chomphu, B. Kijsirikul (DOI: 10.1145/3388142.3388154)
In the offshore oil and gas industry, petroleum in each well of a remote wellhead platform (WHP) flows naturally from the ground to the sales delivery point. However, when the oil pressure drops or the well is nearly depleted, the flow rate up to the WHP declines. Installing a wellhead compressor (WC) on the WHP is the solution [9]. The WC acts locally on selected wells and reduces back pressure, thereby substantially enhancing the efficiency of oil and gas recovery [21]. WC sensors transmit data back to a historian time-series database, and intelligent alarm systems are a critical tool for minimizing unscheduled downtime, which adversely affects production reliability and adds monitoring time and cost for operating engineers. In this paper, an Attention-Based Bidirectional Long Short-Term Memory (ABD-LSTM) model is presented for WC failure prediction. We also propose feature extraction and data reduction techniques as complementary methods to improve the effectiveness of training on a large-scale dataset. We evaluate model performance on real WC sensor data. Compared with other machine learning (ML) algorithms, our proposed methodology is more powerful and accurate, achieving an optimal F1 score of 85.28%.
"Ideology Detection of Personalized Political News Coverage: A New Dataset" — Khudran Alzhrani (DOI: 10.1145/3388142.3388149)
Word selection, writing style, cherry-picking of stories, and many other factors play a role in framing news articles to fit a target audience or to align with the author's beliefs. Hence, reporting facts alone is not evidence of bias-free journalism. Since the 2016 United States presidential election, researchers have focused on the media's influence on election results. News media attention has shifted from political parties to candidates, and the news media shapes public perception of political candidates through news personalization. Despite its criticality, we are not aware of any studies that have examined news personalization from a machine learning or deep neural network perspective. In addition, some candidates accuse the media of favoritism that jeopardizes their chances of winning elections. Multiple methods have been introduced to place news sources on one side of the political spectrum or the other, yet mainstream media outlets claim to be unbiased. Therefore, to avoid inaccurate assumptions, only news sources that have clearly stated their political affiliation are included in this research. In this paper, we construct two datasets of news articles written about the last two U.S. presidents, labeled by the news websites' political affiliation. Multiple intelligent models were developed to automatically predict the political affiliation of unseen personalized articles; their main objective is to detect the political ideology of personalized news articles. Although the newly constructed datasets are highly imbalanced, the performance of the models is reasonably good. The results are reported with a comparative analysis.
"Text mining for incoming tasks based on the urgency/importance factors and task classification using machine learning tools" — Y. Alshehri (DOI: 10.1145/3388142.3388153)
Workplaces hold a massive amount of unstructured data from different sources. In this paper, we present a case study that explains how communications between employees can be used to prioritize task requests and increase work efficiency for both technical and non-technical workers. This involves managing daily incoming tasks based on their level of urgency and importance. To allow all workers to use the urgency-importance matrix as a time-management tool, the tool needs to be automated. The textual content of incoming tasks is analyzed, and metrics related to urgency and importance are extracted. A third factor (the response variable) is defined based on the two input variables (urgency and importance). Machine learning is then applied to the data to predict the class of each incoming task. We used ordinal regression, neural networks, and decision tree algorithms to predict the four levels of task priority, and we measured their performance using recall, precision, and F-score. All classifiers score higher than 89% on all measures.
"Automated Cyberbullying Detection in Social Media Using an SVM Activated Stacked Convolution LSTM Network" — Thor Aleksander Buan, Raghavendra Ramachandra (DOI: 10.1145/3388142.3388147)
Cyberbullying is becoming a huge problem on social media platforms. Recent statistics show that more than a fourth of Norwegian kids report having been cyberbullied at least once during the last year. In recent years, it has become popular to use neural networks to automate the detection of cyberbullying. These neural networks are often based on Long Short-Term Memory (LSTM) layers, used alone or in combination with other types of layers. In this paper we present a new neural network design that can be used to detect traces of cyberbullying in textual media. The design builds on existing designs that combine the power of convolutional layers with LSTM layers. In addition, our design stacks these core layers, which our research shows increases the performance of the network. The design also features a new kind of activation mechanism, referred to as "Support-Vector-Machine-like activation," achieved by applying L2 weight regularization and using a linear activation function in the activation layer together with a hinge loss function. Our experiments show that both the stacking of the layers and the SVM-like activation increase performance over traditional state-of-the-art designs.
"Using Monte Carlo Simulation to Predict Captive Insurance Solvency" — Lu Xiong, Don Hong (DOI: 10.1145/3388142.3388171)
The solvency of a captive insurance fund is the key financial metric captive managers care about. We built a solvency prediction model for a captive insurance fund using Monte Carlo simulation with the fund's historical losses, current financial data, and setup. The model predicts a solvency score for the current captive fund, using the fund's survival probability as the measure of solvency. If the simulated future solvency ratios break the upper or lower bounds, we count the run as insolvent; otherwise, it is counted as solvent (survival). After a large-scale simulation, we can approximate the future survival probability, i.e., the solvency score, of the current captive fund. Predicted income statements, balance sheets, and financial ratios are also generated. We use a heat map to visualize the solvency score at each retention level to support captive insurance managers in making decisions. The model is implemented in Excel VBA macros and MATLAB.
"A Hybrid Model of Clustering and Neural Network Using Weather Conditions for Energy Management in Buildings" — Bishnu Nepal, M. Yamaha (DOI: 10.1145/3388142.3388172)
To conserve energy in buildings, it is essential to understand energy consumption patterns and act on the analyzed results to reduce energy load. In this research, we propose a method for forecasting the electricity load of university buildings using a hybrid model that combines a clustering technique and a neural network driven by weather conditions. The novel approach discussed in this paper clusters one whole year of data, including the forecast day, using K-means clustering, and uses the result as an input parameter to a neural network that forecasts the peak electricity load of university buildings. The hybrid model proved to improve forecasting performance compared with a neural network alone. We also developed a graphical visualization platform for the analyzed results using Shiny, an interactive web application framework. Forecasting the peak electricity load with appreciable accuracy several hours before peak hours, together with the Shiny application, can make management authorities aware of the energy situation and provide sufficient time to devise a strategy for peak load reduction. The method can also be applied in demand response to reduce electricity bills by avoiding usage during high-rate hours.
"Research on Automatic Generation Method of Scenario Based on Panosim" — Zhang Lu, Zhibin Du, Xianglei Zhu (DOI: 10.1145/3388142.3388165)
With the development of science and technology, L3 intelligent vehicles are gradually entering the mass-production phase. Traditional testing tools and methods can hardly meet self-driving vehicles' requirements for multiple test dimensions, high standards, and big data. The scenario-based simulation test method has great technical advantages in test efficiency, verification cost, and versatility, and it is an important means of verifying automated driving. However, it suffers from long scenario-construction periods and a large amount of repetitive work. This paper builds on secondary development of the automated driving simulation software Panosim, presenting automatic scenario input and rapid parameter adjustment through digital twin technology. In addition, the natural driving scenario database of the China Automotive Technology and Research Center is used for verification. The results show that this method can improve the efficiency and accuracy of scenario construction and greatly shorten the simulation test cycle.
"How Different Genders Use Profanity on Twitter?" — S. Wong, P. Teh, Chi-Bin Cheng (DOI: 10.1145/3388142.3388145)
Social media is often the go-to place where people discuss their opinions and share their feelings. As some platforms provide more anonymity than others, users have taken advantage of that privilege: sitting behind the screen, their use of profanity has created a toxic environment. Although not all profanity is used to offend, it is undeniable that anonymity has allowed social media users to express themselves more freely, increasing the likelihood of swearing. In this study, the use of profanity by different gender classes is compiled, and the findings show that different genders often employ swear words from different hate categories; for example, males tend to use more terms from the "disability" hate category. Classification models were developed to predict the gender of tweet authors, and the results show that profanity can be used to uncover the gender of anonymous users. This suggests that cyberbullies could be profiled by gender based on their profanity usage.