2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2... Latest articles
Pub Date: 2020-08-01 | DOI: 10.1109/IRI49571.2020.00070
Min-Yuh Day, Yu-Ling Kuo
End-to-end question answering systems have attracted considerable attention in the artificial intelligence research community in recent years. In this paper, we propose an integrated deep learning model for a factoid question answering system. The study uses the Delta Reading Comprehension Dataset (DRCD) to build a factoid question answering model and combines the classification of questions and answers, evaluating with exact match (EM) and F1 score. It examines whether comparing the two classifications can increase the proportion of exact matches and whether the expected answer type can effectively increase answer accuracy. To this end, a question answering system based on the pre-trained BERT model is applied to DRCD together with expected answer type analysis and comparison. The contribution of this paper is a system architecture for factoid question answering (QA) using BERT with question expected answer type (Q-EAT) and answer type classification (AT) models. The findings confirm that classifying questions and answers can improve the EM ratio: when the expected answer type of the question and the classified type of the answer are the same, the EM prediction accuracy of the question answering system improves.
Title: A Study of Deep Learning for Factoid Question Answering System. Pages: 419-424. URL: https://doi.org/10.1109/IRI49571.2020.00070
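The two metrics named in this abstract, and the comparison of EM between type-agreeing and type-disagreeing examples, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: character-level F1 is an assumption (a common choice for Chinese datasets such as DRCD), and the field names `q_eat`, `at`, `prediction`, and `gold` are hypothetical.

```python
from collections import Counter


def exact_match(prediction: str, gold: str) -> int:
    """Return 1 if the stripped prediction equals the stripped gold answer, else 0."""
    return int(prediction.strip() == gold.strip())


def f1_score(prediction: str, gold: str) -> float:
    """Character-level F1 (assumed granularity): harmonic mean of precision and recall."""
    pred_chars = list(prediction.strip())
    gold_chars = list(gold.strip())
    if not pred_chars or not gold_chars:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_chars == gold_chars)
    overlap = sum((Counter(pred_chars) & Counter(gold_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


def em_by_type_agreement(examples):
    """Average EM separately for examples whose Q-EAT and AT labels agree or differ.

    Each example is a dict with hypothetical keys: q_eat, at, prediction, gold.
    """
    buckets = {True: [], False: []}
    for ex in examples:
        agree = ex["q_eat"] == ex["at"]
        buckets[agree].append(exact_match(ex["prediction"], ex["gold"]))
    return {k: (sum(v) / len(v) if v else None) for k, v in buckets.items()}
```

Comparing the two averages returned by `em_by_type_agreement` is one simple way to check the abstract's claim that EM is higher when the question and answer types coincide.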
Pub Date: 2020-08-01 | DOI: 10.1109/IRI49571.2020.00027
Jayesh Patel
In the Machine Age, machine learning (ML) has become a secret sauce for success in any business. ML applications are not limited to autonomous cars or robotics; they are widely used in almost all sectors, including finance, healthcare, entertainment, government systems, telecommunications, and many others. Lacking an enterprise ML strategy, many enterprises still repeat tedious steps and spend most of their time massaging the required data. Big data lakes and data democratization have made a variety of data easier to access. Despite this, and despite decent advances in ML, engineers still spend significant time on data cleansing and feature engineering, and most of the steps are repeated in this exercise. As a result, teams generate near-identical features with variations, which leads to inconsistent results when training and testing ML applications. This often stretches the time to go-live and increases the number of iterations needed to ship a final ML application. Sharing best practices and the best features not only saves time but also helps to jumpstart ML application development. The democratization of ML features is a powerful way to share useful features, reduce time to go-live, and enable rapid ML application development. It is one of the emerging trends in enterprise ML application development, and this paper presents details of a way to achieve ML feature democratization.
Title: The Democratization of Machine Learning Features. Pages: 136-141. URL: https://doi.org/10.1109/IRI49571.2020.00027
Pub Date: 2020-08-01 | DOI: 10.1109/IRI49571.2020.00012
Ibrahim Yilmaz, Rahat Masum, Ambareen Siraj
Machine learning techniques help to uncover underlying patterns in datasets in order to develop defense mechanisms against cyber attacks. The Multilayer Perceptron (MLP) is a machine learning technique used to distinguish attack data from benign data. However, it is difficult to construct an effective model when imbalances in the dataset prevent proper classification of attack samples. In this research, we first conduct data wrangling on the UGR’16 dataset. This step helps to prepare a test set from the original dataset so that the neural network model can be trained effectively. We experimented with a series of inputs of varying sizes (i.e., 10,000, 50,000, and 1 million) to observe the performance of the MLP neural network model with distribution of features over accuracy. Later, we use a Generative Adversarial Network (GAN) model that produces samples of different attack labels (e.g., blacklist, anomaly spam, SSH scan) to balance the dataset. These samples are generated based on data from the UGR’16 dataset. Further experiments with the MLP neural network model show that a balanced attack sample dataset, made possible with the GAN, produces more accurate results than an imbalanced one.
Title: Addressing Imbalanced Data Problem with Generative Adversarial Network For Intrusion Detection. Pages: 25-30. URL: https://doi.org/10.1109/IRI49571.2020.00012
Pub Date: 2020-08-01 | DOI: 10.1109/IRI49571.2020.00018
Gavindya Jayawardena, S. Jayarathna
Eye-tracking experiments usually involve areas of interest (AOIs) for the analysis of eye gaze data, as they can reveal potential cognitive load and attentional patterns, yielding interesting results about participants. While there are tools to define AOIs and extract eye movement data for the analysis of gaze measurements, they may require users to manually draw AOI boundaries on eye-tracking stimuli or to use markers to define AOIs in space in order to generate AOI-mapped gaze locations. In this paper, we introduce a novel method to dynamically filter eye movement data from AOIs for the analysis of advanced eye gaze metrics. We incorporate pre-trained object detectors for offline detection of dynamic AOIs in dynamic eye-tracking stimuli such as video streams. We present our implementation and an evaluation of object detectors to find the best one to integrate into a real-time eye movement analysis pipeline that filters eye movement data falling within the polygonal boundaries of detected dynamic AOIs. We demonstrate the utility of our method by applying it to a publicly available dataset.
Title: Automated Filtering of Eye Gaze Metrics from Dynamic Areas of Interest. Pages: 67-74. URL: https://doi.org/10.1109/IRI49571.2020.00018
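The filtering step described in this abstract, keeping only gaze samples that fall inside the polygonal boundary of a detected AOI, can be illustrated with a standard ray-casting point-in-polygon test. This is a sketch under assumptions: the sample layout `(frame, x, y)` and the `aois_by_frame` mapping are hypothetical, and the object detector that would produce the polygons is not shown.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a rightward ray from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_gaze(samples: List[Tuple[int, float, float]],
                aois_by_frame: Dict[int, List[List[Point]]]):
    """Keep gaze samples (frame, x, y) that fall inside any AOI polygon of the same frame."""
    kept = []
    for frame, x, y in samples:
        polygons = aois_by_frame.get(frame, [])
        if any(point_in_polygon((x, y), poly) for poly in polygons):
            kept.append((frame, x, y))
    return kept
```

Keying the polygons by frame reflects the dynamic nature of the AOIs: the same gaze coordinate may be inside an AOI in one video frame and outside it in the next.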
Pub Date: 2020-08-01 | DOI: 10.1109/IRI49571.2020.00045
Si-Hong Lam, Eric Brewer, Yiu-Kai Ng
According to Canadian Science Publishing, approximately 2.5 million scientific papers are published each year. This huge volume of publications can be attributed to a substantial increase in the total number of academic journals, including a growing number of predatory or fake scientific journals, which yield high volumes of poor-quality research work. As a result, researchers, ranging from those who simply look for citations to those who seek the latest developments and knowledge in a specific scientific area, face a jungle of journals to sift through when searching for high-quality, relevant references. Querying existing web search engines and research paper archive websites does not solve the problem, since they are ill-equipped to suggest high-quality publications that meet users’ information needs. To solve this problem, we propose an elegant research paper recommender that is unique compared with existing ones: besides considering the topics and contents of related publications, it also examines the authority and popularity of each publication to ensure its quality. An empirical study shows that our recommender outperforms existing research paper recommenders and contributes to the design of tools for searching relevant publications.
Title: Using a Deep Learning Model, Content Features, and Author Metadata to Recommend Research Papers. Pages: 265-270. URL: https://doi.org/10.1109/IRI49571.2020.00045
Pub Date: 2020-08-01 | DOI: 10.1109/IRI49571.2020.00063
D. Tamburri, W. Heuvel, Martin Garriga
Big Data analytics supported by AI algorithms enable skills localization and retrieval in the context of a labor market intelligence problem. We formulate and solve this problem through specific DataOps models, bringing data sources from administrative and technical partners in several countries into cooperation and creating shared knowledge to support policy and decision-making. We then focus on the critical task of extracting skills from resumes and vacancies using state-of-the-art machine learning models. We showcase preliminary results of applied machine learning on real data from the employment agencies of the Netherlands and the Flemish region in Belgium. The final goal is to match these skills to standard ontologies of skills, jobs, and occupations.
Title: DataOps for Societal Intelligence: a Data Pipeline for Labor Market Skills Extraction and Matching. Pages: 391-394. URL: https://doi.org/10.1109/IRI49571.2020.00063
Pub Date: 2020-08-01 | DOI: 10.1109/IRI49571.2020.00040
Srikanth Vadlamani, M. Hashemi
A lack of adequate streetlights likely affects public safety, particularly in neighborhoods with higher crime rates. Several researchers have studied the influence of streetlights on crime. However, those studies compare crime rates during the day rather than at night, or explore crime patterns in socially disorganized communities. This study focuses on detecting the pattern of nighttime street crime near broken or due-for-repair streetlights. Historical crime data and data on city streetlight service requests are studied in this project. The analytical approaches include a least squares linear regression model, applied to determine the relationship between streetlight and crime data, and Ripley’s K function, used to detect crime clusters near broken streetlights. Moran’s I index is used to measure the spatial correlation between broken streetlights and crime rates. Optimized hotspot analysis is used to predict crime locations. The study found that broken streetlights cause increasing trends of crime near them. The large positive value of Moran’s I underscored the statistically significant clustering of street crimes around broken streetlights.
Title: Studying the impact of streetlights on street crime rate using geo-statistics. Pages: 231-236. URL: https://doi.org/10.1109/IRI49571.2020.00040
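For readers unfamiliar with Moran's I, the global statistic can be computed directly from its textbook definition, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all spatial weights. The sketch below is a plain-Python illustration only: the binary adjacency weights and the toy values are assumptions, and it does not reproduce the study's data or its significance testing.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # W: sum of all spatial weights.
    w_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of locations.
    numerator = sum(weights[i][j] * dev[i] * dev[j]
                    for i in range(n) for j in range(n))
    denominator = sum(d * d for d in dev)
    return (n / w_total) * numerator / denominator


# Four locations on a line; neighbours share a weight of 1 (assumed toy layout).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], chain)    # similar values sit next to each other
alternating = morans_i([1, 5, 1, 5], chain)  # neighbours always differ
```

In this toy layout the clustered arrangement gives a positive value and the alternating one a negative value, which matches the usual reading of the index: positive values indicate that similar values tend to be spatial neighbours.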
Pub Date: 2020-08-01 | DOI: 10.1109/IRI49571.2020.00032
Jiajia Lu, Xiaofeng Di, Luyi Bai
Due to the ever-increasing amount of RDF data with time and space features, efficiently querying spatiotemporal RDF data over RDF datasets is an important task. In this paper, spatiotemporal RDF data contains time features, space features, and text features, which are processed separately to facilitate querying. A graph decomposition algorithm and a query path combination algorithm are designed. The query graph with spatiotemporal features is split into multiple paths, and each path in the query graph is then used to search for the best matching path in the path sets contained in the data graph. Because matches may be inexact, approximate matching is performed according to an evaluation function to find the best matching path. Finally, all the best paths are combined to generate a matching result graph. Our approach is evaluated in terms of approximation performance and query performance. The experimental results show the effectiveness and efficiency of our method.
Title: Approximate Matching of Spatiotemporal RDF Data by Path. Pages: 172-179. URL: https://doi.org/10.1109/IRI49571.2020.00032
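The general idea of splitting a graph into paths and scoring each query path against candidate data paths can be sketched as below. This is not the paper's algorithm: the decomposition (source-to-sink enumeration on an acyclic graph), the scoring function (fraction of agreeing positions, with `?`-prefixed query terms treated as variables), and all triple names are assumptions chosen for illustration, and no spatiotemporal handling is included.

```python
def decompose_into_paths(triples):
    """Enumerate source-to-sink paths of an acyclic graph given as (subject, predicate, object) triples."""
    outgoing = {}
    has_incoming = set()
    nodes = set()
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
        has_incoming.add(o)
        nodes.update((s, o))
    roots = sorted(n for n in nodes if n not in has_incoming)
    paths = []

    def walk(node, path):
        if node not in outgoing:  # sink reached: the path is complete
            paths.append(tuple(path))
            return
        for p, o in outgoing[node]:
            walk(o, path + [p, o])

    for root in roots:
        walk(root, [root])
    return paths


def path_score(query_path, data_path):
    """Fraction of positions that agree; query terms starting with '?' match anything."""
    if len(query_path) != len(data_path):
        return 0.0
    hits = sum(1 for q, d in zip(query_path, data_path)
               if q.startswith("?") or q == d)
    return hits / len(query_path)


def best_match(query_path, data_paths):
    """Return the data path with the highest score for this query path."""
    return max(data_paths, key=lambda d: path_score(query_path, d))
```

A score below 1.0 corresponds to an approximate match: the best candidate is still returned even when one term, such as a predicate, differs from the query.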
Pub Date: 2020-08-01 | DOI: 10.1109/iri49571.2020.00008
Abdulhamid A. Adebayo
Abdulhamid Adebayo, IBM T.J. Watson Research Center, USA
Abdulrhman M Alshareef, King AbdulAziz University, Saudi Arabia
Anna Squicciarini, Pennsylvania State University, USA
Arun Thapa, Tuskegee University, USA
Balaji Palanisamy, University of Pittsburgh, USA
Bharat Rawal, Pennsylvania State University, USA
Caojin Zhang, Wayne State University, USA
Chin-Wan Chung, Korea Advanced Institute of Science and Technology, South Korea
Chongyang Shi, Beijing Institute of Technology, China
Da Yan, University of Alabama at Birmingham, USA
Dalei Wu, University of Tennessee at Chattanooga, USA
Du Zhang, California State University, USA
Elisa Bertino, Purdue University, USA
Fei Zhao, University of Alabama at Birmingham, USA
Feifei Zhang, Institute of Automation, Chinese Academy of Sciences, China
Haiman Tian, Florida International University, USA
Hao Wang, Louisiana State University, USA
Hemanth Gudaparthi, University of Cincinnati, USA
Hung T Nguyen, Carnegie Mellon University, USA
Kayhan Ghafoor, Salahaddin University-Erbil, Iraq
Kouichi Sakurai, Kyushu University, Japan
Lidan Shou, Zhejiang University, China
Ling Zhou, Jiangsu University, China
Lixiao Huang, Arizona State University, USA
Maria Presa-Reyes, Florida International University, USA
Mei-Ling Shyu, University of Miami, USA
Mengjun Xie, University of Tennessee at Chattanooga, USA
Mohan Baruwal, Swinburne University of Technology, Australia
Mortada Al-Banna, University of New South Wales, Australia
Mounifah Alenazi, University of Cincinnati, USA
Mukesh Saini, Indian Institute of Technology Ropar, India
Nathalie Baracaldo, IBM Almaden Research Center, USA
Nuray Baltaci, University of Pittsburgh, USA
Omair Shafiq, Carleton University, Canada
Orhun Vural, University of Alabama at Birmingham, USA
Raj Gaire, CSIRO, Australia
Ronald Doku, Howard University, USA
Saad Sadiq, University of Miami, USA
Sachin S Shetty, Old Dominion University, USA
Samira Pouyanfar, Microsoft, USA
Sandeep Reddivari, University of North Florida, USA
Shihong Huang, Florida Atlantic University, USA
Soumyanil Banerjee, Wayne State University, USA
Taghi M. Khoshgoftaar, Florida Atlantic University, USA
Tanmay Bhowmik, Mississippi State University, USA
Tanvir Ahmed, Oracle, USA
Title: IRI 2020 Committees. URL: https://doi.org/10.1109/iri49571.2020.00008
Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00041
Salim Sazzed
Bengali, one of the most widely spoken languages, lacks tools and resources for sentiment analysis. To date, Bengali has no sentiment lexicon of its own; only translated versions of English lexica are available. In this work, we therefore develop a Bengali sentiment lexicon from a large Bengali review corpus using a cross-lingual approach. To build the lexicon, we first created a Bengali corpus of around 42,000 drama reviews, of which we manually annotated around 12,000. Using a machine translation system, the labeled and unlabeled Bengali review corpora, English sentiment lexica, pointwise mutual information (PMI), and supervised machine learning (ML) classifiers in different phases, we develop a Bengali sentiment lexicon of around 1,000 sentiment words. We compare the coverage of our lexicon with that of the translated English lexica on two evaluation datasets. The proposed lexicon achieves 70%-74% coverage at the document level and around 65% coverage at the word level, an improvement of approximately 30%-100% over the translated lexica at the word level and 30%-50% at the document level. The results demonstrate that the developed lexicon is highly effective at recognizing sentiment in Bengali text.
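The abstract names pointwise mutual information (PMI) and lexicon coverage without giving formulas, so the following is only a minimal sketch of one common way to use them, not the author's pipeline. It assumes tokenized reviews with `"pos"`/`"neg"` labels, scores each word by PMI(word, positive) minus PMI(word, negative), and treats document-level coverage as the share of documents containing at least one lexicon word. The function names, the add-one smoothing, the `min_count` threshold, and the coverage definition are all illustrative assumptions.

```python
import math
from collections import Counter


def pmi_polarity(docs, labels, min_count=2):
    """Score words by PMI(word, pos) - PMI(word, neg).

    docs: list of token lists; labels: parallel list of "pos"/"neg".
    Counts are per-document presence counts with add-one smoothing
    (an assumption of this sketch, not taken from the paper).
    """
    n_docs = Counter(labels)
    word_in = {"pos": Counter(), "neg": Counter()}
    for tokens, label in zip(docs, labels):
        word_in[label].update(set(tokens))

    scores = {}
    for word in set(word_in["pos"]) | set(word_in["neg"]):
        pos, neg = word_in["pos"][word], word_in["neg"][word]
        if pos + neg < min_count:
            continue  # too rare to score reliably
        # PMI(w, pos) - PMI(w, neg) simplifies to log2(P(w|pos) / P(w|neg)).
        p_w_pos = (pos + 1) / (n_docs["pos"] + 2)
        p_w_neg = (neg + 1) / (n_docs["neg"] + 2)
        scores[word] = math.log2(p_w_pos / p_w_neg)
    return scores


def document_coverage(lexicon, docs):
    """Fraction of documents containing at least one lexicon word
    (one plausible reading of document-level coverage)."""
    hits = sum(1 for tokens in docs if any(t in lexicon for t in tokens))
    return hits / len(docs)
```

Words with a strongly positive or negative score would be candidates for the lexicon. A real system would tokenize Bengali text and combine these scores with the translation and classifier stages the abstract mentions.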
{"title":"Development of Sentiment Lexicon in Bengali utilizing Corpus and Cross-lingual Resources","authors":"Salim Sazzed","doi":"10.1109/IRI49571.2020.00041","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00041","url":null,"abstract":"Bengali, one of the most spoken languages, lacks tools and resources for sentiment analysis. To date, the Bengali language does not have any sentiment lexicon of its own; only the translated versions of English lexica are available. Therefore, in this work, we focus on developing a Bengali sentiment lexicon from a large Bengali review corpus utilizing a cross-lingual approach. To build the sentiment dictionary, we first created a Bengali corpus of around 42000 drama reviews; among them, we manually annotated around 12000 reviews. Utilizing a machine translation system, labeled and unlabeled Bengali review corpus, English sentiment lexica, pointwise mutual information (PMI), and supervised machine learning (ML) classifiers in different phases, we develop a Bengali sentiment lexicon of around 1000 sentiment words. We compare the coverage of our lexicon with the translated English lexica in two evaluation datasets. The proposed lexicon achieves 70%-74% coverage in document-level and around 65% coverage in word-level, which is approximately 30%-100% improvement over the translated lexica in word-level and 30%-50% in document-level. The results demonstrate that our developed lexicon is highly effective in recognizing sentiments in the Bengali text.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. 
IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"96 1","pages":"237-244"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73589858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}