2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...

A Study of Deep Learning for Factoid Question Answering System

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00070

Min-Yuh Day, Yu-Ling Kuo

End-to-end question answering systems have attracted considerable attention in the artificial intelligence research community in recent years. In this paper, we propose an integrated deep learning model for a factoid question answering system. This study uses the Delta Reading Comprehension Dataset (DRCD) to build a factoid question answering model. It combines that model with question and answer classification and evaluates the result with exact match (EM) and F1 score. The study examines whether this comparison can increase the EM ratio, and whether the expected answer type can effectively increase answer accuracy. To this end, a question answering system based on the BERT pre-trained model is applied to the DRCD dataset, together with expected answer type analysis and comparison. The contribution of this paper is a system architecture for a factoid question answering (QA) system that uses BERT with question expected answer type (Q-EAT) and answer type classification (AT) models. The findings confirm that question and answer classification can improve the EM ratio. When the question's expected answer type and the answer classification agree, the EM of the question answering system improves.
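
The abstract evaluates answers with exact match (EM) and F1. Below is a minimal Python sketch of SQuAD-style EM and a character-level F1. Character level is assumed because DRCD is a Chinese dataset. The normalization rule, the type labels such as "LOC" and "NUM", and the helper `em_by_type_agreement` are illustrative assumptions, not the authors' evaluation script. The last helper splits EM by whether the question's expected answer type (Q-EAT) agrees with the answer type (AT), which is the comparison the findings describe.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Characters dropped before comparison (an assumed normalization rule).
_PUNCT = set(" \t\n，。、！？：；「」『』（）,.!?:;\"'()")


def normalize(text: str) -> str:
    """Lower-case the text and drop whitespace/punctuation."""
    return "".join(ch for ch in text.lower() if ch not in _PUNCT)


def exact_match(prediction: str, gold: str) -> int:
    """1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(gold))


def char_f1(prediction: str, gold: str) -> float:
    """Character-overlap F1, a common choice for Chinese span extraction."""
    pred_chars = list(normalize(prediction))
    gold_chars = list(normalize(gold))
    overlap = sum((Counter(pred_chars) & Counter(gold_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


def em_by_type_agreement(
    records: Iterable[Tuple[str, str, str, str]]
) -> Dict[bool, Optional[float]]:
    """Mean EM split by whether Q-EAT == AT.

    Each record is (prediction, gold, question_expected_type, answer_type).
    """
    buckets: Dict[bool, list] = {True: [], False: []}
    for pred, gold, q_type, a_type in records:
        buckets[q_type == a_type].append(exact_match(pred, gold))
    return {agree: (sum(v) / len(v) if v else None) for agree, v in buckets.items()}
```

For example, `char_f1("台北", "台北市")` is about 0.8: precision is 1 and recall is 2/3. EM for the same pair is 0.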

Pages: 419-424

Citations: 6

The Democratization of Machine Learning Features

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00027

Jayesh Patel

In the Machine Age, machine learning (ML) has become a secret sauce for success in any business. Machine learning applications are not limited to autonomous cars or robotics. They are widely used in almost all sectors, including finance, healthcare, entertainment, government systems and telecommunications. For lack of an enterprise ML strategy, many enterprises still repeat tedious steps and spend most of their time massaging the required data. Big data lakes and data democratization have made a variety of data easier to access. Despite this, and despite decent advances in ML, engineers still spend significant time on data cleansing and feature engineering, and most of these steps are often repeated. As a result, teams generate near-identical features with small variations, which leads to inconsistent results when training and testing ML applications. This often stretches the time to go-live and increases the number of iterations needed to ship a final ML application. Sharing best practices and the best features not only saves time but also helps jumpstart ML application development. The democratization of ML features is a powerful way to share useful features, reduce the time to go-live and enable rapid ML application development. It is one of the emerging trends in enterprise ML application development, and this paper presents one way to achieve ML feature democratization.
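
The core idea is that a feature is defined once and then discovered and reused, rather than re-engineered with small variations. A shared feature registry captures this. The following is a minimal in-memory sketch under that assumption. `FeatureRegistry`, `FeatureDefinition` and their fields are hypothetical names for illustration, not the paper's design or any particular feature-store product's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List


@dataclass(frozen=True)
class FeatureDefinition:
    """One shared, documented feature (fields are illustrative)."""
    name: str
    owner: str
    description: str
    compute: Callable[[Dict[str, Any]], Any]


class FeatureRegistry:
    """In-memory catalog so teams reuse one definition per feature."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureDefinition] = {}

    def register(self, feature: FeatureDefinition) -> None:
        # Refuse silent redefinitions: near-duplicate features are the
        # source of inconsistent training/testing results.
        if feature.name in self._features:
            raise ValueError(f"feature {feature.name!r} is already registered")
        self._features[feature.name] = feature

    def search(self, keyword: str) -> List[str]:
        """Find existing features by name or description before building new ones."""
        kw = keyword.lower()
        return sorted(
            name
            for name, f in self._features.items()
            if kw in name.lower() or kw in f.description.lower()
        )

    def materialize(self, names: Iterable[str], record: Dict[str, Any]) -> Dict[str, Any]:
        """Compute the requested features for one raw record with the shared logic."""
        return {name: self._features[name].compute(record) for name in names}
```

In practice a registry like this would be backed by shared storage. The in-memory dictionary only keeps the sketch self-contained.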

Pages: 136-141

Citations: 3

Addressing Imbalanced Data Problem with Generative Adversarial Network For Intrusion Detection

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00012

Ibrahim Yilmaz, Rahat Masum, Ambareen Siraj

Machine learning techniques help uncover underlying patterns in datasets, which can be used to develop defense mechanisms against cyber attacks. The Multilayer Perceptron (MLP) is a machine learning technique used to distinguish attack data from benign data. However, it is difficult to construct an effective model when imbalances in the dataset prevent proper classification of attack samples. In this research, we first perform data wrangling on the UGR’16 dataset. This step helps prepare a test set from the original dataset to train the neural network model effectively. We experimented with inputs of varying sizes (i.e., 10,000, 50,000 and 1 million) to observe how the distribution of features affects the accuracy of the MLP neural network model. We then use a Generative Adversarial Network (GAN) model to produce samples of different attack labels (e.g., blacklist, anomaly spam, ssh scan) and balance the dataset. These samples are generated from data in the UGR’16 dataset. Further experiments with the MLP neural network model show that a balanced attack-sample dataset, made possible with the GAN, produces more accurate results than an imbalanced one.
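
Balancing with a generative model needs a quota: how many synthetic samples to request per attack label. The sketch below computes that quota and appends generated samples. The generator is passed in as a plain function `generate(label, k)`. In the paper's setting that role is played by a trained GAN, which is not shown here. The default target (the size of the largest class) and the label names are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Sample = Tuple[float, ...]
# generate(label, k) must return k synthetic samples for `label`; in the
# paper's setting this role is played by a trained GAN (not shown here).
Generator = Callable[[str, int], List[Sample]]


def synthetic_quota(labels: Sequence[str], target: Optional[int] = None) -> Dict[str, int]:
    """Number of synthetic samples each class needs to reach `target`.

    By default the target is the size of the largest class.
    """
    counts = Counter(labels)
    goal = target if target is not None else max(counts.values())
    return {label: max(goal - n, 0) for label, n in counts.items()}


def rebalance(
    samples: Sequence[Sample], labels: Sequence[str], generate: Generator
) -> Tuple[List[Sample], List[str]]:
    """Append generated samples for every under-represented class."""
    new_samples, new_labels = list(samples), list(labels)
    for label, k in synthetic_quota(labels).items():
        if k > 0:
            new_samples.extend(generate(label, k))
            new_labels.extend([label] * k)
    return new_samples, new_labels
```

With 5 "benign", 2 "blacklist" and 1 "anomaly-spam" samples, the quota is 0, 3 and 4 respectively, so every class ends with 5 samples.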

Pages: 25-30

Citations: 28

Automated Filtering of Eye Gaze Metrics from Dynamic Areas of Interest

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00018

Gavindya Jayawardena, S. Jayarathna

Eye-tracking experiments usually involve areas of interest (AOIs) for the analysis of eye gaze data. AOIs can reveal potential cognitive load and attentional patterns, yielding interesting results about participants. Tools exist to define AOIs and extract eye movement data for the analysis of gaze measurements. However, they may require users to draw AOI boundaries on eye-tracking stimuli manually, or to use markers that define AOIs in space to generate AOI-mapped gaze locations. In this paper, we introduce a novel method to dynamically filter eye movement data from AOIs for the analysis of advanced eye gaze metrics. We incorporate pre-trained object detectors for offline detection of dynamic AOIs in dynamic eye-tracking stimuli such as video streams. We present our implementation and evaluation of object detectors to find the best detector to integrate into a real-time eye movement analysis pipeline. That pipeline filters eye movement data falling within the polygonal boundaries of detected dynamic AOIs. Applying the method to a publicly available dataset, our results indicate its utility.
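
The filtering step keeps only gaze samples that fall inside the polygonal boundary of an AOI detected in the same video frame. A standard way to test this is even-odd ray casting. The sketch below assumes several things: gaze samples arrive as (frame, x, y) in stimulus pixel coordinates, detections are already available as a per-frame list of polygons, and the names `filter_gaze` and `aois_by_frame` are hypothetical. The object detector itself is out of scope.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]
Gaze = Tuple[int, float, float]  # (frame index, x, y) in stimulus pixels


def point_in_polygon(p: Point, polygon: Polygon) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed;
        # this test also guarantees y1 != y2, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_gaze(gaze: Sequence[Gaze], aois_by_frame: Dict[int, List[Polygon]]) -> List[Gaze]:
    """Keep gaze samples that fall inside any AOI detected in the same frame."""
    kept = []
    for frame, x, y in gaze:
        polygons = aois_by_frame.get(frame, [])
        if any(point_in_polygon((x, y), poly) for poly in polygons):
            kept.append((frame, x, y))
    return kept
```

Looking polygons up per frame is what makes the AOIs dynamic. A gaze sample in a frame with no detections is simply dropped.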

Pages: 67-74

Citations: 4

Using a Deep Learning Model, Content Features, and Author Metadata to Recommend Research Papers

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00045

Si-Hong Lam, Eric Brewer, Yiu-Kai Ng

According to Canadian Science Publishing, approximately 2.5 million scientific papers are published each year. This huge volume can be attributed to a substantial increase in the total number of academic journals. That includes a growing number of predatory or fake scientific journals, which yield high volumes of poor-quality research. As a result, researchers must flip through a jungle of journals to find high-quality, relevant references. This applies equally to those simply looking for citations and to those seeking the latest developments and knowledge in a specific area of study. Querying existing web search engines and research paper archive websites does not solve the problem, since they are ill-equipped to suggest high-quality publications that meet users’ information needs. To solve this problem, we propose an elegant research paper recommender that is unique among existing ones. Besides considering the topics and contents of related publications, it also examines the authority and popularity of each publication to ensure its quality. An empirical study shows that our recommender outperforms existing research paper recommenders and contributes to the design of systems for searching relevant publications.
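
The distinguishing idea is to rank candidates by content relevance and also by authority and popularity. A minimal sketch of such a combined score follows. Several parts are substitutions or assumptions, not the authors' method. Content similarity uses plain bag-of-words cosine in place of the paper's deep learning model. Authority is proxied by a hypothetical `author_h_index` field, popularity by log-scaled citations, and the 0.6/0.2/0.2 weights are arbitrary.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score(query_terms: Sequence[str], paper: Dict, max_citations: int, max_h: int,
          w_content: float = 0.6, w_authority: float = 0.2,
          w_popularity: float = 0.2) -> float:
    """Weighted sum of content relevance, authority and popularity."""
    content = cosine(Counter(query_terms), Counter(paper["terms"]))
    # Authority: author h-index scaled to [0, 1] (hypothetical proxy).
    authority = paper["author_h_index"] / max_h if max_h else 0.0
    # Popularity: log-scaled citation count so a few outliers do not dominate.
    popularity = (math.log1p(paper["citations"]) / math.log1p(max_citations)
                  if max_citations else 0.0)
    return w_content * content + w_authority * authority + w_popularity * popularity


def recommend(query_terms: Sequence[str], papers: Sequence[Dict], k: int = 3) -> List[str]:
    """Return the ids of the k highest-scoring papers."""
    max_c = max(p["citations"] for p in papers)
    max_h = max(p["author_h_index"] for p in papers)
    ranked = sorted(papers, key=lambda p: score(query_terms, p, max_c, max_h), reverse=True)
    return [p["id"] for p in ranked[:k]]
```

Consider two papers with identical content. The sketch ranks the one with the higher authority and popularity first, which is the quality signal the abstract describes.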

Pages: 265-270

Citations: 1

DataOps for Societal Intelligence: a Data Pipeline for Labor Market Skills Extraction and Matching

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00063

D. Tamburri, W. Heuvel, Martin Garriga

Big Data analytics supported by AI algorithms enables skills localization and retrieval in the context of a labor market intelligence problem. We formulate and solve this problem through specific DataOps models. These models blend data sources from administrative and technical partners in several countries into a cooperative effort, creating shared knowledge to support policy and decision-making. We then focus on the critical task of extracting skills from resumes and vacancies using state-of-the-art machine learning models. We showcase preliminary results of applying machine learning to real data from the employment agencies of the Netherlands and the Flemish region of Belgium. The final goal is to match these skills to standard ontologies of skills, jobs and occupations.
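
The final matching step maps free-text skills onto ontology entries so that resumes and vacancies can be compared. The sketch below uses a simple lexicon lookup to illustrate that step. It is a baseline swapped in for clarity, not the machine learning models the paper uses. The ontology IDs and surface forms are hypothetical and do not come from any real skills ontology.

```python
import re
from typing import Dict, List, Set

# Hypothetical mini-ontology: canonical skill ID -> known surface forms.
SKILL_ONTOLOGY: Dict[str, List[str]] = {
    "skill:python": ["python"],
    "skill:project_management": ["project management", "managing projects"],
    "skill:forklift": ["forklift", "forklift operation"],
}


def extract_skills(text: str, ontology: Dict[str, List[str]] = SKILL_ONTOLOGY) -> Set[str]:
    """Return ontology IDs whose surface forms occur as whole words in the text."""
    lowered = text.lower()
    found: Set[str] = set()
    for skill_id, forms in ontology.items():
        for form in forms:
            if re.search(r"\b" + re.escape(form) + r"\b", lowered):
                found.add(skill_id)
                break
    return found


def match_score(resume: str, vacancy: str, ontology: Dict[str, List[str]] = SKILL_ONTOLOGY) -> float:
    """Fraction of the vacancy's required skills that the resume covers."""
    required = extract_skills(vacancy, ontology)
    if not required:
        return 0.0
    return len(extract_skills(resume, ontology) & required) / len(required)
```

Because both documents are reduced to the same canonical IDs, "managing projects" in a resume and "project management" in a vacancy count as the same skill.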

Citations: 16

Studying the impact of streetlights on street crime rate using geo-statistics

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00040

Srikanth Vadlamani, M. Hashemi

A lack of adequate streetlights likely affects public safety, particularly in neighborhoods with higher crime rates. Several researchers have studied the influence of streetlights on crime. However, those studies either compare daytime rather than nighttime crime rates or explore crime patterns in socially disorganized communities. This study focuses on detecting the pattern of nighttime street crime near broken or due-for-repair streetlights. The project studies historical crime data and data on city streetlight service requests. A least squares linear regression model is applied to determine the relationship between the streetlight and crime data. Ripley’s K function is used to detect crime clusters near broken streetlights. Moran’s I index is used to measure the spatial correlation between broken streetlights and crime rates. Optimized hotspot analysis is used to predict crime locations. This study found that broken streetlights cause increasing crime trends near them. The large positive value of Moran’s I underscored the statistically significant clustering of street crimes around broken streetlights.
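
Moran's I is the statistic behind the clustering claim, so a small worked implementation helps show what "large positive value" means. The sketch below computes the global Moran's I from a spatial weight matrix. The abstract does not state the weighting scheme; the simple contiguity weights in the example are an assumption.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I.

    I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights. Positive values indicate that similar
    values cluster in space; negative values indicate dispersion.
    """
    n = len(values)
    if n == 0 or len(weights) != n:
        raise ValueError("need one weight row per observation")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if w_total == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or zero variance")
    numer = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_total) * (numer / denom)
```

For example, take four grid cells in a row with neighbor weights of 1. Crime counts `[1, 1, 0, 0]`, where high values sit together, give I = 1/3. The alternating pattern `[1, 0, 1, 0]` gives I = -1.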

Pages: 231-236

Citations: 1

Approximate Matching of Spatiotemporal RDF Data by Path

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00032

Jiajia Lu, Xiaofeng Di, Luyi Bai

Due to an ever-increasing amount of RDF data with time and space features, efficiently querying spatiotemporal RDF data over RDF datasets is an important task. In this paper, the spatiotemporal RDF data contains time features, space features and text features, which are processed separately to facilitate querying. We also design a decomposition graph algorithm and a combination query paths algorithm. The query graph with spatiotemporal features is split into multiple paths. Each path in the query graph is then used to search for the best-matching path among the path sets contained in the data graph. Because inexact matches exist, approximate matching is performed according to an evaluation function to find the best-matching path. Finally, all the best paths are combined to generate a matching result graph. We evaluate our approach in terms of approximation performance and query performance. The experimental results demonstrate the effectiveness and efficiency of our method.
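
The two central steps are splitting a query graph into paths and picking the best approximately matching data path. A minimal sketch of both follows. The scoring function is a hypothetical stand-in for the paper's evaluation function: predicates must agree, and a query object matches when it is a variable (written `?x`) or equal to the data object. The sketch also assumes the query graph has at least one node without incoming edges, and it omits the separate handling of time, space and text features and the final path-combination step.

```python
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object); "?x" marks a variable
Path = List[Edge]


def decompose_into_paths(edges: Sequence[Edge]) -> List[Path]:
    """Split a query graph into root-to-leaf edge paths (roots have no incoming edge)."""
    out: Dict[str, List[Tuple[str, str]]] = {}
    has_incoming: Set[str] = set()
    nodes: Set[str] = set()
    for s, p, o in edges:
        out.setdefault(s, []).append((p, o))
        has_incoming.add(o)
        nodes.update((s, o))
    paths: List[Path] = []

    def walk(node: str, path: Path, visited: Set[str]) -> None:
        nexts = [(p, o) for p, o in out.get(node, []) if o not in visited]
        if not nexts:
            if path:
                paths.append(path)
            return
        for p, o in nexts:
            walk(o, path + [(node, p, o)], visited | {o})

    for root in sorted(nodes - has_incoming):
        walk(root, [], {root})
    return paths


def edge_matches(q: Edge, d: Edge) -> bool:
    """Predicates must agree; a query object matches if it is a variable or equal."""
    return q[1] == d[1] and (q[2].startswith("?") or q[2] == d[2])


def path_similarity(q_path: Path, d_path: Path) -> float:
    """Fraction of aligned edges that match, penalizing length differences."""
    if not q_path or not d_path:
        return 0.0
    hits = sum(1 for q, d in zip(q_path, d_path) if edge_matches(q, d))
    return hits / max(len(q_path), len(d_path))


def best_match(q_path: Path, data_paths: Sequence[Path]) -> Path:
    """Approximate match: the data path with the highest similarity score."""
    return max(data_paths, key=lambda d: path_similarity(q_path, d))
```

Consider the query path `?event occurredAt ?place locatedIn Beijing`. A data path ending in Beijing scores 1.0, while one ending in Shanghai still scores 0.5 instead of being rejected outright. That graded scoring is what makes the matching approximate.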

Pages: 172-179

Citations: 0

IRI 2020 Committees

Pub Date : 2020-08-01 DOI: 10.1109/iri49571.2020.00008

Abdulhamid A. Adebayo

Abdulhamid Adebayo, IBM T.J. Watson Research Center, USA
Abdulrhman M Alshareef, King AbdulAziz University, Saudi Arabia
Anna Squicciarini, Pennsylvania State University, USA
Arun Thapa, Tuskegee University, USA
Balaji Palanisamy, University of Pittsburgh, USA
Bharat Rawal, Pennsylvania State University, USA
Caojin Zhang, Wayne State University, USA
Chin-Wan Chung, Korea Advanced Institute of Science and Technology, South Korea
Chongyang Shi, Beijing Institute of Technology, China
Da Yan, University of Alabama at Birmingham, USA
Dalei Wu, University of Tennessee at Chattanooga, USA
Du Zhang, California State University, USA
Elisa Bertino, Purdue University, USA
Fei Zhao, University of Alabama at Birmingham, USA
Feifei Zhang, Institute of Automation, Chinese Academy of Sciences, China
Haiman Tian, Florida International University, USA
Hao Wang, Louisiana State University, USA
Hemanth Gudaparthi, University of Cincinnati, USA
Hung T Nguyen, Carnegie Mellon University, USA
Kayhan Ghafoor, Salahaddin University-Erbil, Iraq
Kouichi Sakurai, Kyushu University, Japan
Lidan Shou, Zhejiang University, China
Ling Zhou, Jiangsu University, China
Lixiao Huang, Arizona State University, USA
Maria Presa-Reyes, Florida International University, USA
Mei-Ling Shyu, University of Miami, USA
Mengjun Xie, University of Tennessee at Chattanooga, USA
Mohan Baruwal, Swinburne University of Technology, Australia
Mortada Al-Banna, University of New South Wales, Australia
Mounifah Alenazi, University of Cincinnati, USA
Mukesh Saini, Indian Institute of Technology Ropar, India
Nathalie Baracaldo, IBM Almaden Research Center, USA
Nuray Baltaci, University of Pittsburgh, USA
Omair Shafiq, Carleton University, Canada
Orhun Vural, University of Alabama at Birmingham, USA
Raj Gaire, CSIRO, Australia
Ronald Doku, Howard University, USA
Saad Sadiq, University of Miami, USA
Sachin S Shetty, Old Dominion University, USA
Samira Pouyanfar, Microsoft, USA
Sandeep Reddivari, University of North Florida, USA
Shihong Huang, Florida Atlantic University, USA
Soumyanil Banerjee, Wayne State University, USA
Taghi M. Khoshgoftaar, Florida Atlantic University, USA
Tanmay Bhowmik, Mississippi State University, USA
Tanvir Ahmed, Oracle, USA

Citation count: 0

Development of Sentiment Lexicon in Bengali utilizing Corpus and Cross-lingual Resources

2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00041

Salim Sazzed

Bengali, one of the most widely spoken languages, lacks tools and resources for sentiment analysis. To date, the Bengali language has no sentiment lexicon of its own; only translated versions of English lexica are available. Therefore, in this work, we focus on developing a Bengali sentiment lexicon from a large Bengali review corpus using a cross-lingual approach. To build the sentiment dictionary, we first created a Bengali corpus of around 42,000 drama reviews, of which we manually annotated around 12,000. Using, in different phases, a machine translation system, labeled and unlabeled Bengali review corpora, English sentiment lexica, pointwise mutual information (PMI), and supervised machine learning (ML) classifiers, we developed a Bengali sentiment lexicon of around 1,000 sentiment words. We compare the coverage of our lexicon with that of the translated English lexica on two evaluation datasets. The proposed lexicon achieves 70%-74% coverage at the document level and around 65% coverage at the word level, an improvement of approximately 30%-100% over the translated lexica at the word level and 30%-50% at the document level. The results demonstrate that our lexicon is highly effective in recognizing sentiment in Bengali text.
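
A minimal sketch of one ingredient the abstract lists, PMI-based scoring of candidate sentiment words from labeled reviews, together with a document-level coverage check. This is not the author's pipeline. It assumes the common formulation of a word's orientation as PMI(w, pos) minus PMI(w, neg), with co-occurrence counted once per document and add-one smoothing on the joint counts, and it assumes "document-level coverage" means the share of reviews containing at least one lexicon word. The function names (`pmi_polarity`, `build_lexicon`, `document_coverage`), the `min_count` and `threshold` values, and the toy English tokens standing in for Bengali tokens are all hypothetical.

```python
import math
from collections import Counter

def pmi_polarity(docs, labels, min_count=2):
    """Score words by PMI(w, "pos") - PMI(w, "neg") over labeled documents.

    docs: list of token lists; labels: parallel list of "pos"/"neg"
    (both classes must be present). Co-occurrence is counted once per
    document. Add-one smoothing on joint counts (an assumption) keeps
    words seen in only one class finite.
    """
    n_docs = len(docs)
    class_counts = Counter(labels)
    word_counts = Counter()
    joint = {"pos": Counter(), "neg": Counter()}
    for tokens, label in zip(docs, labels):
        for w in set(tokens):  # document-level presence, not frequency
            word_counts[w] += 1
            joint[label][w] += 1
    scores = {}
    for w, cw in word_counts.items():
        if cw < min_count:  # skip rare words with unreliable estimates
            continue
        pmi = {}
        for c in ("pos", "neg"):
            p_wc = (joint[c][w] + 1) / (n_docs + 2)  # smoothed P(w, c)
            p_w = cw / n_docs
            p_c = class_counts[c] / n_docs
            pmi[c] = math.log2(p_wc / (p_w * p_c))
        scores[w] = pmi["pos"] - pmi["neg"]
    return scores

def build_lexicon(scores, threshold=1.0):
    """Keep words whose orientation magnitude reaches the threshold."""
    return {w: ("pos" if s > 0 else "neg")
            for w, s in scores.items() if abs(s) >= threshold}

def document_coverage(docs, lexicon):
    """Fraction of documents containing at least one lexicon word."""
    if not docs:
        return 0.0
    hits = sum(1 for tokens in docs if any(t in lexicon for t in tokens))
    return hits / len(docs)

# Toy example: English tokens stand in for Bengali review tokens.
train_docs = [
    ["great", "acting", "story"],
    ["great", "music"],
    ["boring", "story"],
    ["boring", "acting", "bad"],
    ["great", "ending"],
    ["bad", "ending"],
]
train_labels = ["pos", "pos", "neg", "neg", "pos", "neg"]
scores = pmi_polarity(train_docs, train_labels)
lexicon = build_lexicon(scores)
eval_docs = [["great", "film"], ["dull", "plot"], ["bad", "music"]]
print(lexicon)
print(document_coverage(eval_docs, lexicon))
```

In this toy run "great" (seen only in positive reviews) scores +2.0, "boring" and "bad" score about -1.58, and words such as "acting" that occur equally in both classes score 0 and fall below the threshold, so the resulting three-word lexicon covers two of the three held-out reviews.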

Pages: 237-244

Citation count: 12