2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...

An Adaptive and Dynamic Biosensor Epidemic Model for COVID-19

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00051

Salvador V. Balkus, Joshua Rumbut, Honggang Wang, Hua Fang

The impact of the COVID-19 global pandemic has required governments across the world to develop effective public health policies using epidemiological models. Unfortunately, as a result of limited testing ability, these models often rely on lagged rather than real-time data, and cannot be adapted to small geographies to provide localized forecasts. This study proposes ADBio, a multi-level adaptive and dynamic biosensor-based model that can be used to predict the risk of infection with COVID-19 from the individual level to the county level, providing more timely and accurate estimates of virus exposure at all levels. The model is evaluated using diagnosis simulation based on current COVID-19 cases as well as GPS movement data for Massachusetts and New York, where COVID-19 hotspots had previously been observed. Results demonstrate that lagged testing data is indeed a major detriment to current modeling efforts, and that unlike the standard SEIR model, ADBio is able to adapt to arbitrarily small geographic regions and provide reasonable forecasts of COVID-19 cases. The features of this model enable greater national pandemic preparedness and provide local town and county governments a valuable tool for decision-making during a pandemic.
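
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of the standard SEIR compartmental model, integrated with forward Euler steps. It is not ADBio and not the authors' code. The function name `simulate_seir` and all parameter values (`beta`, `sigma`, `gamma`, the population size) are illustrative assumptions, not figures from the paper.

```python
def simulate_seir(population, initial_infected, beta, sigma, gamma, days, dt=0.1):
    """Integrate the standard SEIR equations with forward Euler steps.

    beta:  transmission rate per day
    sigma: rate at which exposed people become infectious (1 / incubation period)
    gamma: recovery rate (1 / infectious period)
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    r = 0.0
    history = [(s, e, i, r)]
    steps_per_day = int(round(1 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative values only; they are not fitted to any data set.
trajectory = simulate_seir(population=100_000, initial_infected=10,
                           beta=0.5, sigma=1 / 5, gamma=1 / 10, days=120)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
print(f"peak infectious on day {peak_day}: {trajectory[peak_day][2]:.0f}")
```

Because every compartment is divided by the whole population, a model of this form treats the region as one well-mixed group, which is one reason it is hard to apply to very small geographies.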

Pages: 306-313

Citations: 0

Background Subtraction with a Hierarchical Pitman-Yor Process Mixture Model of Generalized Gaussian Distributions

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00024

Srikanth Amudala, Samr Ali, N. Bouguila

This paper presents a hierarchical Pitman-Yor process mixture of generalized Gaussian distributions for background subtraction. We choose the generalized Gaussian distribution because it is more flexible than the widely used Gaussian. We also integrate the Pitman-Yor process into the proposed model to obtain an infinite extension that leads to better performance in background subtraction. The model is learned via a variational Bayes approach and applied to the challenging Change Detection dataset. Experimental results on background subtraction show the effectiveness of the proposed algorithm.
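
The flexibility claim can be illustrated with the univariate generalized Gaussian density under its common parameterization, f(x) = beta / (2 alpha Gamma(1/beta)) * exp(-(|x - mu| / alpha)^beta). This is a sketch of the distribution only, not of the paper's mixture model or its variational learning. The parameterization and the function names are assumptions, since the abstract does not state which form the authors use.

```python
import math


def generalized_gaussian_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Density of the generalized Gaussian with location mu, scale alpha, shape beta."""
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Ordinary Gaussian density, used here only as a reference."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# beta = 2 with alpha = sigma * sqrt(2) recovers the Gaussian;
# beta = 1 gives the heavier-tailed Laplace density;
# a large beta approaches a flat-topped, nearly uniform shape.
for shape in (1.0, 2.0, 8.0):
    print(shape, generalized_gaussian_pdf(0.7, alpha=math.sqrt(2.0), beta=shape))
```

The shape parameter lets one family cover both peaked, heavy-tailed pixel distributions and flatter ones, which is the extra freedom a plain Gaussian mixture component lacks.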

Citations: 0

AD4ML: Axiomatic Design to Specify Machine Learning Solutions for Manufacturing

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00029

Alejandro Gabriel Villanueva Zacarias, Rachaa Ghabri, P. Reimann

Machine learning is increasingly adopted in manufacturing use cases, e.g., for fault detection in a production line. Each new use case requires developing its own machine learning (ML) solution. An ML solution integrates different software components to read, process, and analyze all use case data, and finally to generate the output that domain experts need for their decision-making. The process of designing a system specification for an ML solution is not straightforward. It entails two types of complexity: (1) the technical complexity of selecting combinations of ML algorithms and software components that suit a use case; (2) the organizational complexity of integrating different requirements from a multidisciplinary team of, e.g., domain experts, data scientists, and IT specialists. In this paper, we propose several adaptations to Axiomatic Design in order to design ML solution specifications that handle these complexities. We call this Axiomatic Design for Machine Learning (AD4ML). We apply AD4ML to specify an ML solution for a fault detection use case and discuss to what extent our approach overcomes the above-mentioned complexities. We also discuss how AD4ML facilitates the agile design of ML solutions.

Pages: 148-155

Citations: 2

A Distribution-based Regression for Real-time COVID-19 Cases Detection from Chest X-ray and CT Images

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00023

Nuha Zamzami, Pantea Koochemeshkian, N. Bouguila

The novel coronavirus disease (COVID-19), which emerged in December 2019 in Wuhan, Hubei Province, China, has become a serious healthcare threat, with over five million confirmed cases in 215 countries around the world as of May 20, 2020. The World Health Organization recommends rapid diagnosis and immediate isolation of suspected cases. Thus, there is an urgent need to develop an automatic real-time detection system as a quick alternative diagnosis option to control the spread of the virus. In this work, we propose a regression model based on a flexible distribution called the shifted-scaled Dirichlet for real-time detection of patients infected with coronavirus pneumonia using chest X-ray radiographs. To estimate the parameters of the proposed model, we adopt the maximum likelihood method and update the parameters using stochastic gradient descent. The experimental results demonstrate that our approach is highly effective for detecting COVID-19 cases and understanding the infection on a real-time basis, with accuracy of up to 97%.

Pages: 104-111

Citations: 3

Using Deep Learning To Assign Rheumatoid Arthritis Scores

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00065

S. Dang, L. Allison

In this work, we report the performance of a deep learning model in automatically assigning joint scores and overall patient scores to X-ray images of rheumatoid arthritis patients. The dataset is from the RA2 DREAM Challenge (https://www.synapse.org/#!Synapse:syn20545111/wiki/594083). Overall, we achieve good predictive performance, with an average accuracy of 0.908.

Citations: 3

Fully Bayesian Learning of Multivariate Beta Mixture Models

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00025

Mahsa Amirkhani, Narges Manouchehri, N. Bouguila

Mixture models have been widely used as statistical learning paradigms in various unsupervised machine learning applications, where labeling a vast amount of data is impractical and costly. They have shown significant success and convincing performance in many real-world problems such as medical applications, image clustering, and anomaly detection. In this paper, we explore a fully Bayesian analysis of the multivariate Beta mixture model and propose a solution to the parameter estimation problem using a Markov chain Monte Carlo technique. We exploit Gibbs sampling within Metropolis-Hastings for Monte Carlo simulation. We also derive a conjugate prior distribution for the multivariate Beta. The performance of the proposed method is evaluated and compared with the Bayesian Gaussian mixture model on challenging applications, including cell image categorization and network intrusion detection. Experimental results confirm that the proposed technique provides an effective solution compared to similar alternatives.
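
To make the sampling idea concrete, the sketch below combines a Gibbs-style sweep over parameters with a Metropolis-Hastings accept/reject step for each one. It is a deliberately simplified stand-in: it fits the two shape parameters of a single univariate Beta distribution, not the multivariate Beta mixture of the paper. The independent Exponential priors, the log-normal random-walk proposal, and all names are assumptions made for illustration, and the abstract's conjugate prior is not used.

```python
import math
import random


def log_posterior(a, b, log_x_sum, log_1mx_sum, n, prior_rate=0.1):
    """Unnormalized log posterior of Beta(a, b) shapes under Exponential priors."""
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_lik = (a - 1.0) * log_x_sum + (b - 1.0) * log_1mx_sum - n * log_beta_fn
    return log_lik - prior_rate * (a + b)


def metropolis_within_gibbs(data, iterations=3000, step=0.1, seed=0):
    """Update a and b one at a time, each with its own accept/reject step."""
    rng = random.Random(seed)
    n = len(data)
    log_x_sum = sum(math.log(x) for x in data)
    log_1mx_sum = sum(math.log(1.0 - x) for x in data)
    a, b = 1.0, 1.0
    current = log_posterior(a, b, log_x_sum, log_1mx_sum, n)
    samples = []
    for _ in range(iterations):
        # Multiplicative (log-normal) proposal keeps the parameter positive;
        # log(proposal / old) is the matching Hastings correction.
        proposal = a * math.exp(rng.gauss(0.0, step))
        candidate = log_posterior(proposal, b, log_x_sum, log_1mx_sum, n)
        if math.log(1.0 - rng.random()) < candidate - current + math.log(proposal / a):
            a, current = proposal, candidate
        proposal = b * math.exp(rng.gauss(0.0, step))
        candidate = log_posterior(a, proposal, log_x_sum, log_1mx_sum, n)
        if math.log(1.0 - rng.random()) < candidate - current + math.log(proposal / b):
            b, current = proposal, candidate
        samples.append((a, b))
    return samples


# Synthetic data drawn from Beta(2, 5); the true shapes are known only
# because the example generates them.
data_rng = random.Random(1)
data = [data_rng.betavariate(2.0, 5.0) for _ in range(500)]
samples = metropolis_within_gibbs(data)
kept = samples[1000:]  # discard burn-in
posterior_mean_a = sum(a for a, _ in kept) / len(kept)
posterior_mean_b = sum(b for _, b in kept) / len(kept)
print(posterior_mean_a, posterior_mean_b)
```

In a mixture setting, a sweep of this kind would also resample the component assignments and mixing weights. Those steps are omitted here.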

Pages: 120-127

Citations: 7

Attention-Guided Generative Adversarial Network to Address Atypical Anatomy in Synthetic CT Generation.

Pub Date : 2020-08-01 Epub Date: 2020-09-10 DOI: 10.1109/iri49571.2020.00034

Hajar Emami, Ming Dong, Carri K Glide-Hurst

Recently, interest in MR-only treatment planning using synthetic CTs (synCTs) has grown rapidly in radiation therapy. However, developing class solutions for medical images that contain atypical anatomy remains a major limitation. In this paper, we propose a novel spatial attention-guided generative adversarial network (attention-GAN) model to generate accurate synCTs using T1-weighted MRI images as the input to address atypical anatomy. Experimental results on fifteen brain cancer patients show that attention-GAN outperformed existing synCT models and achieved an average MAE of 85.223±12.08, 232.41±60.86, and 246.38±42.67 Hounsfield units between synCT and CT-SIM across the entire head, bone, and air regions, respectively. Qualitative analysis shows that attention-GAN can use spatially focused areas to better handle outliers, areas with complex anatomy, or post-surgical regions, and thus offers strong potential for supporting near real-time MR-only treatment planning.
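
The region-wise figures above are mean absolute errors restricted to a subset of voxels. A minimal sketch of that computation follows. The voxel values are made up, and the threshold-based air and bone masks are a hypothetical stand-in, since the abstract does not say how the regions were delineated.

```python
def masked_mae(predicted, reference, mask):
    """Mean absolute error over the voxels where mask is True."""
    errors = [abs(p - r) for p, r, m in zip(predicted, reference, mask) if m]
    if not errors:
        raise ValueError("mask selects no voxels")
    return sum(errors) / len(errors)


# Toy voxel intensities in Hounsfield units (made up for illustration).
reference_ct = [-1000, -950, 40, 30, 700, 1200]
synthetic_ct = [-900, -1000, 20, 45, 500, 1000]

# Hypothetical region masks: simple thresholds on the reference CT.
head_mask = [True] * len(reference_ct)
air_mask = [hu < -500 for hu in reference_ct]
bone_mask = [hu > 300 for hu in reference_ct]

for name, mask in [("head", head_mask), ("bone", bone_mask), ("air", air_mask)]:
    print(name, masked_mae(synthetic_ct, reference_ct, mask))
```

Reporting bone and air separately matters because these regions contain few voxels but large intensity errors, which a whole-head average would dilute.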

Pages: 188-193

Citations: 10

Cross-Domain Helpfulness Prediction of Online Consumer Reviews by Deep Learning Model

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00069

Shih-Hung Wu, Yi-Kun Chen

Customer reviews provide helpful information such as usage experiences or critiques; they are a critical information resource for future customers. As the volume of online reviews grows, people need a way to find the most helpful ones automatically. Previous studies either predicted the percentage of helpful votes with a regression model or classified reviews as helpful or unhelpful. However, the voting result of an online review is not constant over time, and we also find that many reviews receive zero votes. Therefore, we collect the voting results of the same online customer reviews over time and observe how the votes change in order to find a better learning target. We collected a dataset of online reviews in five product categories (“Apple”, “Video Game”, “Clothing, Shoes & Jewelry”, “Sports & Outdoors”, and “Prime Video”) from Amazon.com, together with the helpfulness votes for each review, and monitored the helpfulness voting for six weeks. Experiments on the dataset yield a reasonable classification of zero-vote and non-zero-vote reviews. We construct a classification system that classifies online reviews using the deep learning model BERT. The results show that the classifier performs well on helpfulness prediction. We also test the classifier on cross-domain prediction and obtain promising results.

Pages: 412-418

Citations: 5

Relevance of Grapheme’s Shape Complexity in Writer Verification Task

Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00016

A. Bensefia, Chawki Djeddi

Recognizing and identifying people based on their physical and behavioral characteristics has always had a wide range of applications, prompting researchers to propose dedicated recognition systems for each human characteristic. These systems operate in two modes. In identification mode, the task is to assign one of the identities preregistered in the system to the sample read as input. In verification (authentication) mode, the task is to decide whether the sample read as input really belongs to the claimed identity. Handwriting has emerged as one of the behavioral features that attracted considerable interest during the last decade. Many more writer identification systems have been developed than writer verification (authentication) systems. In this paper, we propose an original approach that uses shape complexity to authenticate writers’ identities. To this end, a local feature (the grapheme) is considered, and graphemes are generated automatically by a dedicated segmentation module. The Fourier Elliptic Transform is used to measure the shape complexity of the resulting graphemes. Only the most complex graphemes (K-Graphemes) are used to measure the similarity between a pair of handwritten samples. The approach was evaluated on 3 sets of 50 different writers from the BFL dataset, where we obtained almost 80% correct acceptance at an 8% error rate. These results validate the relevance of shape complexity in writer recognition tasks.

Pages: 53-58

Citations: 2

An I/O Request Packet (IRP) Driven Effective Ransomware Detection Scheme using Artificial Neural Network


Pub Date : 2020-08-01 DOI: 10.1109/IRI49571.2020.00053

Md. Ahsan Ayub, Andrea Continella, Ambareen Siraj

Ransomware attacks have recently surged worldwide, targeting industries of all types and sizes, from retail to critical infrastructure. Researchers encounter new ransomware samples every day and continue to discover novel ransomware families in the wild. To mitigate this growing threat, security researchers in academia and industry have developed a variety of defenses against this type of cyber-attack. The I/O Request Packet (IRP), a low-level file system I/O log, is a recently adopted and increasingly explored basis for ransomware defense. In this study, to gain granular, actionable insights into ransomware behavior, we analyze the IRP logs of 272 ransomware samples belonging to 18 different ransomware families, captured during individual execution. We extend this analysis by building an Artificial Neural Network (ANN) that detects ransomware by learning the underlying patterns of the IRP logs. We evaluate the ANN model under three different experimental settings to demonstrate the effectiveness of our approach. The model achieves accuracy, precision, recall, and F1 scores in the range of 99.7%±0.2%.
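The four scores named in the abstract are standard binary-classification metrics. As a minimal sketch of how they are conventionally computed from predicted and true labels (the function name `binary_metrics`, the `Metrics` container, and the label encoding 1 = ransomware, 0 = benign are illustrative assumptions, not taken from the paper):

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumed encoding (hypothetical): 1 = ransomware, 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

Reporting all four scores together, as the abstract does, guards against a model that looks accurate only because one class dominates the evaluation set.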



Citations: 11