
2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI 2020): proceedings, virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2... — Latest Publications

Towards Agile Integration: Specification-based Data Alignment
C. Giossi, D. Maier, K. Tufte, Elliot Gall, M. Barnes
Utilizing data sets from multiple domains is a common procedure in scientific research. For example, research on the performance of buildings may require data from multiple sources that lack a singular standard for data reporting. The Building Management System might report data at regular 5-minute intervals, whereas an air-quality sensor might capture values only when there has been a significant change from the previous value. Many systems exist to help integrate multiple data sources into a single system or interface. However, such systems do not necessarily make it easy to modify an integration plan, for example, to accommodate data exploration, new and changing data sets, or shifts in the questions of interest. We propose an agile data-integration system to enable quick and adaptive analysis across many data sets, concentrating initially on the data-alignment step: combining data values from multiple time-series data sets whose time schedules differ. To this end, we adopt a Domain Specific Language approach in which we construct a domain model for alignment, provide a specification language for describing alignments in the model, and implement an interpreter for specifications in that language. Our implementation exploits a rank-based join in SQL that produces faster alignment times than the commonly suggested method of aligning data sets in a database. We present experiments to demonstrate the advantage of our method and exploit data properties for optimization.
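The abstract does not reproduce the rank-based SQL join itself. As an illustrative sketch of the alignment problem, the snippet below (Python with SQLite; table names, timestamps, and values are hypothetical) aligns a regular 5-minute BMS series with an irregular, change-driven air-quality series by carrying each sensor's most recent prior reading forward; a correlated subquery stands in for the paper's rank-based join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE bms (ts INTEGER, temp REAL)")  # regular 5-minute readings
cur.execute("CREATE TABLE air (ts INTEGER, pm25 REAL)")  # irregular, change-driven readings
cur.executemany("INSERT INTO bms VALUES (?, ?)",
                [(0, 20.0), (300, 20.5), (600, 21.0)])
cur.executemany("INSERT INTO air VALUES (?, ?)",
                [(120, 9.0), (580, 14.0)])

# Align each BMS reading with the most recent air-quality reading at or
# before its timestamp ("last observation carried forward" alignment).
rows = cur.execute("""
    SELECT b.ts, b.temp,
           (SELECT a.pm25 FROM air a
            WHERE a.ts <= b.ts
            ORDER BY a.ts DESC LIMIT 1) AS pm25
    FROM bms b ORDER BY b.ts
""").fetchall()
print(rows)  # [(0, 20.0, None), (300, 20.5, 9.0), (600, 21.0, 14.0)]
```

A rank-based formulation would instead number the candidate matches per BMS row and keep rank 1; the authors report that their rank-based join outperforms the commonly suggested database alignment method.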
DOI: 10.1109/IRI49571.2020.00055 · pp. 333-340 · Published: August 2020
Citations: 0
Detection Methods of Slow Read DoS Using Full Packet Capture Data
Clifford Kemp, Chad L. Calvert, T. Khoshgoftaar
Detecting Denial of Service (DoS) attacks on web servers has become extremely popular with cybercriminals and organized crime groups. A successful DoS attack on network resources reduces availability of service to a web site and backend resources, and could easily result in a loss of millions of dollars in revenue depending on company size. There are many DoS attack methods, each of which is critical to providing an understanding of the nature of the DoS attack class. There has been a rise in recent years of application-layer DoS attack methods that target web servers and are challenging to detect. An attack may be disguised to look like legitimate traffic, except it targets specific application packets or functions. A Slow Read DoS attack is one type of slow HTTP attack targeting the application layer. Slow Read attacks are often used to exploit weaknesses in the HTTP protocol, as it is the most widely used protocol on the Internet. In this paper, we use Full Packet Capture (FPC) datasets for detecting Slow Read DoS attacks with machine learning methods. All data collected originates in a live network environment. Our approach produces FPC features taken from network packets at the IP and TCP layers. Experimental results show that the machine learners were quite successful in identifying the Slow Read attacks with high detection and low false alarm rates using FPC data. Our experiment evaluates FPC datasets to determine the accuracy and efficiency of several detection models for Slow Read attacks. The experiment demonstrates that FPC features are discriminative enough to detect such attacks.
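The abstract evaluates learners by detection and false-alarm rates. As a quick reference, this is how those two metrics come out of a confusion matrix (the counts below are invented for illustration, not taken from the paper):

```python
def detection_metrics(tp, fp, tn, fn):
    """Detection (recall) and false-alarm rates for an attack classifier."""
    detection_rate = tp / (tp + fn)     # fraction of attacks correctly flagged
    false_alarm_rate = fp / (fp + tn)   # fraction of normal traffic wrongly flagged
    return detection_rate, false_alarm_rate

# Hypothetical confusion counts for a Slow Read detector built on FPC features.
dr, far = detection_metrics(tp=95, fp=2, tn=98, fn=5)
print(f"detection={dr:.2f} false_alarm={far:.2f}")  # detection=0.95 false_alarm=0.02
```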
DOI: 10.1109/IRI49571.2020.00010 · pp. 9-16 · Published: August 2020
Citations: 4
Multi-Class Cardiovascular Diseases Diagnosis from Electrocardiogram Signals using 1-D Convolution Neural Network
Mehdi Fasihi, M. Nadimi-Shahraki, A. Jannesari
The electrocardiogram (ECG) is an important signal in health informatics for the detection of cardiac abnormalities. There have been several studies on using machine learning techniques for analyzing ECG. However, they require additional computation owing to the challenges of ECG signals. We introduce a new architecture of 1-D convolution neural network (CNN) to diagnose arrhythmia diseases automatically. The proposed architecture consists of four convolution layers, three pooling layers, and three fully connected layers, evaluated on the arrhythmia dataset. Previous research has focused on separating healthy people from people with arrhythmia disease. In this paper, we go further, proposing multiclass classification with two classes of cardiac diseases and one class of healthy people. The results are compared with a common 1-D CNN and seven different classifiers. The experimental results demonstrate that the proposed architecture is superior to existing classifiers and also competitive with the state of the art in terms of accuracy.
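The abstract does not list kernel sizes or the input window length, but the effect of stacking four convolution layers and three pooling layers on a 1-D signal can be sketched with the standard output-length formula (all layer parameters below are assumed for illustration only):

```python
def conv1d_out(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution layer."""
    return (length + 2 * padding - kernel) // stride + 1

def pool1d_out(length, kernel):
    """Output length of a non-overlapping 1-D pooling layer."""
    return (length - kernel) // kernel + 1

L = 256                               # assumed ECG window length
L = pool1d_out(conv1d_out(L, 5), 2)   # conv1 + pool1: 256 -> 252 -> 126
L = pool1d_out(conv1d_out(L, 5), 2)   # conv2 + pool2: 126 -> 122 -> 61
L = pool1d_out(conv1d_out(L, 3), 2)   # conv3 + pool3: 61 -> 59 -> 29
L = conv1d_out(L, 3)                  # conv4, no pooling: 29 -> 27
print(L)  # 27 values per channel are flattened into the fully connected layers
```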
DOI: 10.1109/IRI49571.2020.00060 · pp. 372-378 · Published: August 2020
Citations: 7
Distributed Differentially Private Mutual Information Ranking and Its Applications
Ankit Srivastava, Samira Pouyanfar, Joshua Allen, Ken Johnston, Qida Ma
Computation of Mutual Information (MI) helps quantify the amount of information shared between a pair of random variables. Automated feature selection techniques based on MI ranking are regularly used to extract information from sensitive datasets exceeding petabytes in size, over millions of features and classes. A series of one-vs-all MI computations can be cascaded to produce n-fold MI results, rapidly pinpointing informative relationships. This ability to quickly pinpoint the most informative relationships in datasets of billions of users creates privacy concerns. In this paper, we present Distributed Differentially Private Mutual Information (DDP-MI), a privacy-safe fast batch MI, across various scenarios such as feature selection, segmentation, ranking, and query expansion. This distributed implementation is protected with global model differential privacy to provide strong assurances against a wide range of privacy attacks. We also show that our DDP-MI can substantially improve the efficiency of MI calculations compared to standard implementations on a large-scale public dataset.
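The core quantity is plain empirical mutual information; a minimal stdlib sketch for two discrete sequences is below. (DDP-MI would additionally perturb the underlying counts with differential-privacy noise, e.g. calibrated Laplace noise, which is omitted here.)

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

x = [0, 0, 1, 1]
print(mutual_information(x, x))             # 1.0 bit: x fully determines itself
print(mutual_information(x, [0, 1, 0, 1]))  # 0.0 bits: the sequences are independent
```

A one-vs-all ranking, as described in the abstract, would compute this between each feature and a binarized class label, then sort features by the result.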
DOI: 10.1109/IRI49571.2020.00021 · pp. 90-96 · Published: August 2020
Citations: 0
IRI 2020 TOC
Rashmi Jha, David Kapp, Thuong Khanh Tran
DOI: 10.1109/iri49571.2020.00004 · Published: August 2020
Citations: 0
KGdiff: Tracking the Evolution of Knowledge Graphs
Abbas Keshavarzi, K. Kochut
A Knowledge Graph (KG) is a machine-readable, labeled, graph-like representation of human knowledge. As the main goal of a KG is to represent data by enriching it with computer-processable semantics, knowledge graph creation usually involves acquiring data from external resources and datasets. In many domains, especially in biomedicine, the data sources continuously evolve, and KG engineers and domain experts must not only track the changes in KG entities and their interconnections but also introduce changes to the KG schema and the graph-population software. We present a framework to track KG evolution in terms of both the schema and individuals. KGdiff is a software tool that incrementally collects the relevant metadata from a KG and compares it to a prior version of the KG. The KG is represented in OWL/RDF/RDFS, and the metadata is collected using domain-independent queries. We evaluate our method on different RDF/OWL data sets (ontologies).
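KGdiff operates on OWL/RDF/RDFS metadata gathered with domain-independent queries; at its simplest, a diff between two KG snapshots reduces to set differences over triples. A toy sketch (the triples are invented for illustration):

```python
def kg_diff(old_triples, new_triples):
    """Return (added, removed) triples between two knowledge-graph snapshots."""
    old, new = set(old_triples), set(new_triples)
    return new - old, old - new

# Two hypothetical snapshots of a biomedical KG.
v1 = {("aspirin", "treats", "pain"), ("aspirin", "type", "Drug")}
v2 = {("aspirin", "treats", "pain"), ("aspirin", "treats", "fever")}

added, removed = kg_diff(v1, v2)
print(added)    # {('aspirin', 'treats', 'fever')}
print(removed)  # {('aspirin', 'type', 'Drug')}
```

An incremental tool like KGdiff would additionally separate schema-level changes (classes, properties) from individual-level ones, which this sketch does not attempt.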
DOI: 10.1109/IRI49571.2020.00047 · pp. 279-286 · Published: August 2020
Citations: 4
Data Driven Relational Constraint Programming
Michael Valdron, K. Pu
We propose a data-driven constraint programming environment that merges the power of two separate domains: databases and SAT solvers. While a database system offers flexible data models and query languages, SAT solvers offer the ability to satisfy logical constraints and optimization objectives. In this paper, we describe a goal-oriented declarative algebra that seamlessly integrates both worlds. Drawing on proven practices in functional programming, we express constants, variables, and constraints in a unified relational query language. The language is implemented on top of industrial-strength database engines and SAT solvers. To support iterative constraint programming with debugging, we propose several debugging operators to assist with interactive constraint solving.
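A toy illustration of mixing relational data with constraint solving: enumerate candidate tuples drawn from relations and keep those satisfying a cross-relation constraint. Brute-force enumeration stands in for the SAT solver the paper's system actually delegates to, and the relations and constraint are invented:

```python
from itertools import product

# Two tiny "relations": tasks with durations, slots with capacities.
tasks = [("t1", 3), ("t2", 5)]
slots = [("s1", 4), ("s2", 6)]

# Declarative constraint: a task may only be placed in a slot it fits in.
solutions = [(task, slot) for task, slot in product(tasks, slots)
             if task[1] <= slot[1]]
print(solutions)  # t1 fits both slots; t2 fits only s2
```

In the paper's setting, the relational engine supplies the tuples and the SAT solver replaces the filtering loop, scaling to constraints a Cartesian product could not.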
DOI: 10.1109/IRI49571.2020.00030 · pp. 156-163 · Published: August 2020
Citations: 1
Combating Hard or Soft Disasters with Privacy-Preserving Federated Mobile Buses-and-Drones based Networks
Bo Ma, Jinsong Wu, William Liu, L. Chiaraviglio, Xing Ming
The popularity of mobile-edge-computing-enabled infrastructure in the coming fifth generation (5G) and future sixth generation (6G) wireless networks is foreseeable. Especially after a 'hard' disaster such as an earthquake, or a 'soft' disaster such as the COVID-19 pandemic, the existing telecommunication infrastructure, including wired and wireless networks, is often seriously compromised, or carries infectious-disease risks that rule out close contact, and thus cannot guarantee regular coverage and reliable communications services. These temporarily missing communications capabilities are crucial to rescuers, health carers, and affected or infected citizens, as responders need to coordinate and communicate effectively to minimize the loss of lives and property; this is where the 5G/6G mobile edge network helps. On the other hand, federated machine learning (FML) methods have recently been developed to address the privacy-leakage problems of traditional machine learning, which is normally held by one centralized organization and carries the high risk of a single point of hacking. After detailing the current state of the art in privacy preservation, federated learning, and mobile edge communications networks for 'hard' and 'soft' disasters, we consider the main challenges that need to be faced. We envision a privacy-preserving, federated-learning-enabled, buses-and-drones-based mobile edge infrastructure (ppFL-AidLife) for disaster or pandemic emergency communications.
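Federated learning, as used in FML, keeps raw data on each node and shares only model updates. A minimal FedAvg-style aggregation step (node parameters and dataset sizes are invented for illustration; the paper does not specify its aggregation rule) looks like:

```python
def federated_average(client_weights, client_sizes):
    """Size-weighted average of client model parameters (FedAvg-style)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

# Two edge nodes (e.g., a bus and a drone) contribute updates, never raw data;
# the node with more local samples gets proportionally more influence.
avg = federated_average([[1.0, 2.0], [3.0, 4.0]], [10, 30])
print(avg)  # [2.5, 3.5]
```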
The ppFL-AidLife system aims at a rapidly deployable, resilient network capable of supporting flexible, privacy-preserving, low-latency communications to serve large-scale disaster situations, utilizing existing public transport networks together with drones to maximally extend radio coverage to hard-to-reach disaster areas or should-not-close-contact pandemic zones.
DOI: 10.1109/IRI49571.2020.00013 · pp. 31-36 · Published: August 2020
Citations: 3
IRI 2020 Breaker Page
DOI: 10.1109/iri49571.2020.00003 · Published: August 2020
Citations: 0
Semantic Embeddings for Medical Providers and Fraud Detection
Justin M. Johnson, T. Khoshgoftaar
A medical provider’s specialty is a significant predictor for detecting fraudulent providers with machine learning algorithms. When the specialty variable is encoded using a one-hot representation, however, models are subjected to sparse and uninformative feature vectors. We explore three techniques for representing medical provider types with dense, semantic embeddings that capture specialty similarities. The first two methods (GloVe and Med-Word2Vec) use pre-trained word embeddings to convert provider specialty descriptions to short phrase embeddings. Next, we propose a method for constructing semantic provider type embeddings from the procedure-level activity within each specialty group. For each embedding technique, we use Principal Component Analysis to compare the performance of embedding sizes between 32 and 128. Each embedding technique is evaluated on a highly imbalanced Medicare fraud prediction task using Logistic Regression (LR), Random Forest (RF), Gradient Boosted Tree (GBT), and Multilayer Perceptron (MLP) learners. Experiments are repeated 30 times, and confidence intervals show that all three semantic embeddings significantly outperform one-hot representations when using RF and GBT learners. Our contributions include a novel method for embedding medical specialties from procedure codes and a comparison of three semantic embedding techniques for Medicare fraud detection.
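The GloVe/Med-Word2Vec route amounts to averaging pre-trained word vectors into a short phrase embedding and comparing specialties by similarity. A stdlib sketch with invented 2-d vectors (real pre-trained embeddings have 50-300 dimensions, and the paper does not specify its similarity measure; cosine is assumed here):

```python
from math import sqrt

def phrase_embedding(words, word_vecs):
    """Average pre-trained word vectors into one phrase embedding."""
    vecs = [word_vecs[w] for w in words if w in word_vecs]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy 2-d "pre-trained" vectors for specialty description words.
word_vecs = {"cardiac": [1.0, 0.0], "surgery": [0.8, 0.2],
             "family": [0.0, 1.0], "practice": [0.1, 0.9]}
a = phrase_embedding(["cardiac", "surgery"], word_vecs)
b = phrase_embedding(["family", "practice"], word_vecs)
# A specialty is maximally similar to itself and less similar to an unrelated one.
print(cosine(a, a), cosine(a, b))
```

These dense vectors replace the sparse one-hot column before being fed to the LR/RF/GBT/MLP learners.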
{"title":"Semantic Embeddings for Medical Providers and Fraud Detection","authors":"Justin M. Johnson, T. Khoshgoftaar","doi":"10.1109/IRI49571.2020.00039","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00039","url":null,"abstract":"A medical provider’s specialty is a significant predictor for detecting fraudulent providers with machine learning algorithms. When the specialty variable is encoded using a one-hot representation, however, models are subjected to sparse and uninformative feature vectors. We explore three techniques for representing medical provider types with dense, semantic embeddings that capture specialty similarities. The first two methods (GloVe and Med-Word2Vec) use pre-trained word embeddings to convert provider specialty descriptions to short phrase embeddings. Next, we propose a method for constructing semantic provider type embeddings from the procedure-level activity within each specialty group. For each embedding technique, we use Principal Component Analysis to compare the performance of embedding sizes between 32-128. Each embedding technique is evaluated on a highly imbalanced Medicare fraud prediction task using Logistic Regression (LR), Random Forest (RF), Gradient Boosted Tree (GBT), and Multilayer Perceptron (MLP) learners. Experiments are repeated 30 times and confidence intervals show that all three semantic embeddings significantly outperform one-hot representations when using RF and GBT learners. Our contributions include a novel method for embedding medical specialties from procedure codes and a comparison of three semantic embedding techniques for Medicare fraud detection.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. 
IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"42 1","pages":"224-230"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84713156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5