
2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI 2020): proceedings, virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2... — Latest Publications

Towards Agile Integration: Specification-based Data Alignment
C. Giossi, D. Maier, K. Tufte, Elliot Gall, M. Barnes
Utilizing data sets from multiple domains is a common procedure in scientific research. For example, research on the performance of buildings may require data from multiple sources that lack a singular standard for data reporting. The Building Management System might report data at regular 5-minute intervals, whereas an air-quality sensor might capture values only when there has been a significant change from the previous value. Many systems exist to help integrate multiple data sources into a single system or interface. However, such systems do not necessarily make it easy to modify an integration plan, for example, to accommodate data exploration, new and changing data sets, or shifts in the questions of interest. We propose an agile data-integration system to enable quick and adaptive analysis across many data sets, concentrating initially on the data-alignment step: combining data values from multiple time-series data sets whose time schedules differ. To this end, we adopt a Domain Specific Language approach in which we construct a domain model for alignment, provide a specification language for describing alignments in the model, and implement an interpreter for specifications in that language. Our implementation exploits a rank-based join in SQL that produces faster alignment times than the commonly suggested method of aligning data sets in a database. We present experiments to demonstrate the advantage of our method and exploit data properties for optimization.
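The abstract does not reproduce the rank-based SQL join itself. As an illustrative sketch of the alignment problem, the snippet below (Python with SQLite; table names, timestamps, and values are hypothetical) aligns a regular 5-minute BMS series with an irregular, change-driven air-quality series by carrying each sensor's most recent prior reading forward; a correlated subquery stands in for the paper's rank-based join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE bms (ts INTEGER, temp REAL)")  # regular 5-minute readings
cur.execute("CREATE TABLE air (ts INTEGER, pm25 REAL)")  # irregular, change-driven readings
cur.executemany("INSERT INTO bms VALUES (?, ?)",
                [(0, 20.0), (300, 20.5), (600, 21.0)])
cur.executemany("INSERT INTO air VALUES (?, ?)",
                [(120, 9.0), (580, 14.0)])

# Align each BMS reading with the most recent air-quality reading at or
# before its timestamp ("last observation carried forward" alignment).
rows = cur.execute("""
    SELECT b.ts, b.temp,
           (SELECT a.pm25 FROM air a
            WHERE a.ts <= b.ts
            ORDER BY a.ts DESC LIMIT 1) AS pm25
    FROM bms b ORDER BY b.ts
""").fetchall()
print(rows)  # [(0, 20.0, None), (300, 20.5, 9.0), (600, 21.0, 14.0)]
```

A rank-based formulation would instead number the candidate matches per BMS row and keep rank 1; the authors report that their rank-based join outperforms the commonly suggested database alignment method.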
DOI: 10.1109/IRI49571.2020.00055 · pp. 333-340 · Published: August 2020
Citations: 0
Detection Methods of Slow Read DoS Using Full Packet Capture Data
Clifford Kemp, Chad L. Calvert, T. Khoshgoftaar
Detecting Denial of Service (DoS) attacks on web servers has become extremely popular with cybercriminals and organized crime groups. A successful DoS attack on network resources reduces availability of service to a web site and backend resources, and could easily result in a loss of millions of dollars in revenue depending on company size. There are many DoS attack methods, each of which is critical to providing an understanding of the nature of the DoS attack class. There has been a rise in recent years of application-layer DoS attack methods that target web servers and are challenging to detect. An attack may be disguised to look like legitimate traffic, except it targets specific application packets or functions. A Slow Read DoS attack is one type of slow HTTP attack targeting the application layer. Slow Read attacks are often used to exploit weaknesses in the HTTP protocol, as it is the most widely used protocol on the Internet. In this paper, we use Full Packet Capture (FPC) datasets for detecting Slow Read DoS attacks with machine learning methods. All data collected originates in a live network environment. Our approach produces FPC features taken from network packets at the IP and TCP layers. Experimental results show that the machine learners were quite successful in identifying the Slow Read attacks with high detection and low false alarm rates using FPC data. Our experiment evaluates FPC datasets to determine the accuracy and efficiency of several detection models for Slow Read attacks. The experiment demonstrates that FPC features are discriminative enough to detect such attacks.
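The abstract evaluates learners by detection and false-alarm rates. As a quick reference, this is how those two metrics come out of a confusion matrix (the counts below are invented for illustration, not taken from the paper):

```python
def detection_metrics(tp, fp, tn, fn):
    """Detection (recall) and false-alarm rates for an attack classifier."""
    detection_rate = tp / (tp + fn)     # fraction of attacks correctly flagged
    false_alarm_rate = fp / (fp + tn)   # fraction of normal traffic wrongly flagged
    return detection_rate, false_alarm_rate

# Hypothetical confusion counts for a Slow Read detector built on FPC features.
dr, far = detection_metrics(tp=95, fp=2, tn=98, fn=5)
print(f"detection={dr:.2f} false_alarm={far:.2f}")  # detection=0.95 false_alarm=0.02
```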
DOI: 10.1109/IRI49571.2020.00010 · pp. 9-16 · Published: August 2020
Citations: 4
Multi-Class Cardiovascular Diseases Diagnosis from Electrocardiogram Signals using 1-D Convolution Neural Network
Mehdi Fasihi, M. Nadimi-Shahraki, A. Jannesari
The electrocardiogram (ECG) is an important signal in health informatics for the detection of cardiac abnormalities. There have been several studies on using machine learning techniques for analyzing ECG. However, they require additional computation owing to the challenges of ECG signals. We introduce a new architecture of 1-D convolution neural network (CNN) to diagnose arrhythmia diseases automatically. The proposed architecture consists of four convolution layers, three pooling layers, and three fully connected layers, evaluated on the arrhythmia dataset. Previous research has focused on separating healthy people from people with arrhythmia disease. In this paper, we go further, proposing multiclass classification with two classes of cardiac diseases and one class of healthy people. The results are compared with a common 1-D CNN and seven different classifiers. The experimental results demonstrate that the proposed architecture is superior to existing classifiers and also competitive with the state of the art in terms of accuracy.
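The abstract does not list kernel sizes or the input window length, but the effect of stacking four convolution layers and three pooling layers on a 1-D signal can be sketched with the standard output-length formula (all layer parameters below are assumed for illustration only):

```python
def conv1d_out(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution layer."""
    return (length + 2 * padding - kernel) // stride + 1

def pool1d_out(length, kernel):
    """Output length of a non-overlapping 1-D pooling layer."""
    return (length - kernel) // kernel + 1

L = 256                               # assumed ECG window length
L = pool1d_out(conv1d_out(L, 5), 2)   # conv1 + pool1: 256 -> 252 -> 126
L = pool1d_out(conv1d_out(L, 5), 2)   # conv2 + pool2: 126 -> 122 -> 61
L = pool1d_out(conv1d_out(L, 3), 2)   # conv3 + pool3: 61 -> 59 -> 29
L = conv1d_out(L, 3)                  # conv4, no pooling: 29 -> 27
print(L)  # 27 values per channel are flattened into the fully connected layers
```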
DOI: 10.1109/IRI49571.2020.00060 · pp. 372-378 · Published: August 2020
Citations: 7
Distributed Differentially Private Mutual Information Ranking and Its Applications
Ankit Srivastava, Samira Pouyanfar, Joshua Allen, Ken Johnston, Qida Ma
Computation of Mutual Information (MI) helps quantify the amount of information shared between a pair of random variables. Automated feature selection techniques based on MI ranking are regularly used to extract information from sensitive datasets exceeding petabytes in size, over millions of features and classes. A series of one-vs-all MI computations can be cascaded to produce n-fold MI results, rapidly pinpointing informative relationships. This ability to quickly pinpoint the most informative relationships in datasets of billions of users creates privacy concerns. In this paper, we present Distributed Differentially Private Mutual Information (DDP-MI), a privacy-safe fast batch MI, across various scenarios such as feature selection, segmentation, ranking, and query expansion. This distributed implementation is protected with global model differential privacy to provide strong assurances against a wide range of privacy attacks. We also show that our DDP-MI can substantially improve the efficiency of MI calculations compared to standard implementations on a large-scale public dataset.
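The core quantity is plain empirical mutual information; a minimal stdlib sketch for two discrete sequences is below. (DDP-MI would additionally perturb the underlying counts with differential-privacy noise, e.g. calibrated Laplace noise, which is omitted here.)

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

x = [0, 0, 1, 1]
print(mutual_information(x, x))             # 1.0 bit: x fully determines itself
print(mutual_information(x, [0, 1, 0, 1]))  # 0.0 bits: the sequences are independent
```

A one-vs-all ranking, as described in the abstract, would compute this between each feature and a binarized class label, then sort features by the result.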
DOI: 10.1109/IRI49571.2020.00021 · pp. 90-96 · Published: August 2020
Citations: 0
IRI 2020 TOC
Rashmi Jha, David Kapp, Thuong Khanh Tran
DOI: 10.1109/iri49571.2020.00004 · Published: August 2020
Citations: 0
KGdiff: Tracking the Evolution of Knowledge Graphs
Abbas Keshavarzi, K. Kochut
A Knowledge Graph (KG) is a machine-readable, labeled, graph-like representation of human knowledge. As the main goal of a KG is to represent data by enriching it with computer-processable semantics, knowledge graph creation usually involves acquiring data from external resources and datasets. In many domains, especially in biomedicine, the data sources continuously evolve, and KG engineers and domain experts must not only track the changes in KG entities and their interconnections but also introduce changes to the KG schema and the graph-population software. We present a framework to track KG evolution in terms of both the schema and individuals. KGdiff is a software tool that incrementally collects the relevant metadata from a KG and compares it to a prior version of the KG. The KG is represented in OWL/RDF/RDFS, and the metadata is collected using domain-independent queries. We evaluate our method on different RDF/OWL data sets (ontologies).
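KGdiff operates on OWL/RDF/RDFS metadata gathered with domain-independent queries; at its simplest, a diff between two KG snapshots reduces to set differences over triples. A toy sketch (the triples are invented for illustration):

```python
def kg_diff(old_triples, new_triples):
    """Return (added, removed) triples between two knowledge-graph snapshots."""
    old, new = set(old_triples), set(new_triples)
    return new - old, old - new

# Two hypothetical snapshots of a biomedical KG.
v1 = {("aspirin", "treats", "pain"), ("aspirin", "type", "Drug")}
v2 = {("aspirin", "treats", "pain"), ("aspirin", "treats", "fever")}

added, removed = kg_diff(v1, v2)
print(added)    # {('aspirin', 'treats', 'fever')}
print(removed)  # {('aspirin', 'type', 'Drug')}
```

An incremental tool like KGdiff would additionally separate schema-level changes (classes, properties) from individual-level ones, which this sketch does not attempt.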
DOI: 10.1109/IRI49571.2020.00047 · pp. 279-286 · Published: August 2020
Citations: 4
Data Driven Relational Constraint Programming
Michael Valdron, K. Pu
We propose a data-driven constraint programming environment that merges the power of two separate domains: databases and SAT solvers. While a database system offers flexible data models and query languages, SAT solvers offer the ability to satisfy logical constraints and optimization objectives. In this paper, we describe a goal-oriented declarative algebra that seamlessly integrates both worlds. Drawing on proven practices in functional programming, we express constants, variables, and constraints in a unified relational query language. The language is implemented on top of industrial-strength database engines and SAT solvers. To support iterative constraint programming with debugging, we propose several debugging operators to assist with interactive constraint solving.
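A toy illustration of mixing relational data with constraint solving: enumerate candidate tuples drawn from relations and keep those satisfying a cross-relation constraint. Brute-force enumeration stands in for the SAT solver the paper's system actually delegates to, and the relations and constraint are invented:

```python
from itertools import product

# Two tiny "relations": tasks with durations, slots with capacities.
tasks = [("t1", 3), ("t2", 5)]
slots = [("s1", 4), ("s2", 6)]

# Declarative constraint: a task may only be placed in a slot it fits in.
solutions = [(task, slot) for task, slot in product(tasks, slots)
             if task[1] <= slot[1]]
print(solutions)  # t1 fits both slots; t2 fits only s2
```

In the paper's setting, the relational engine supplies the tuples and the SAT solver replaces the filtering loop, scaling to constraints a Cartesian product could not.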
DOI: 10.1109/IRI49571.2020.00030 · pp. 156-163 · Published: August 2020
Citations: 1
Combating Hard or Soft Disasters with Privacy-Preserving Federated Mobile Buses-and-Drones based Networks
Bo Ma, Jinsong Wu, William Liu, L. Chiaraviglio, Xing Ming
The popularity of mobile-edge-computing-enabled infrastructure in the coming fifth generation (5G) and future sixth generation (6G) wireless networks is foreseeable. Especially after a 'hard' disaster such as an earthquake, or a 'soft' disaster such as the COVID-19 pandemic, the existing telecommunication infrastructure, including wired and wireless networks, is often seriously compromised, or carries infectious-disease risks that rule out close contact, and thus cannot guarantee regular coverage and reliable communications services. These temporarily missing communications capabilities are crucial to rescuers, health carers, and affected or infected citizens, as responders need to coordinate and communicate effectively to minimize the loss of lives and property; this is where the 5G/6G mobile edge network helps. On the other hand, federated machine learning (FML) methods have recently been developed to address the privacy-leakage problems of traditional machine learning, which is normally held by one centralized organization and carries the high risk of a single point of hacking. After detailing the current state of the art in privacy preservation, federated learning, and mobile edge communications networks for 'hard' and 'soft' disasters, we consider the main challenges that need to be faced. We envision a privacy-preserving, federated-learning-enabled, buses-and-drones-based mobile edge infrastructure (ppFL-AidLife) for disaster or pandemic emergency communications.
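Federated learning, as used in FML, keeps raw data on each node and shares only model updates. A minimal FedAvg-style aggregation step (node parameters and dataset sizes are invented for illustration; the paper does not specify its aggregation rule) looks like:

```python
def federated_average(client_weights, client_sizes):
    """Size-weighted average of client model parameters (FedAvg-style)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

# Two edge nodes (e.g., a bus and a drone) contribute updates, never raw data;
# the node with more local samples gets proportionally more influence.
avg = federated_average([[1.0, 2.0], [3.0, 4.0]], [10, 30])
print(avg)  # [2.5, 3.5]
```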
The ppFL-AidLife system aims at a rapidly deployable, resilient network capable of supporting flexible, privacy-preserving, low-latency communications to serve large-scale disaster situations, utilizing existing public transport networks together with drones to maximally extend radio coverage to hard-to-reach disaster areas or should-not-close-contact pandemic zones.
DOI: 10.1109/IRI49571.2020.00013 · pp. 31-36 · Published: August 2020
Citations: 3
IRI 2020 Breaker Page
DOI: 10.1109/iri49571.2020.00003 · Published: August 2020
Citations: 0
Semantic Embeddings for Medical Providers and Fraud Detection
Justin M. Johnson, T. Khoshgoftaar
A medical provider’s specialty is a significant predictor for detecting fraudulent providers with machine learning algorithms. When the specialty variable is encoded using a one-hot representation, however, models are subjected to sparse and uninformative feature vectors. We explore three techniques for representing medical provider types with dense, semantic embeddings that capture specialty similarities. The first two methods (GloVe and Med-Word2Vec) use pre-trained word embeddings to convert provider specialty descriptions to short phrase embeddings. Next, we propose a method for constructing semantic provider type embeddings from the procedure-level activity within each specialty group. For each embedding technique, we use Principal Component Analysis to compare the performance of embedding sizes between 32 and 128. Each embedding technique is evaluated on a highly imbalanced Medicare fraud prediction task using Logistic Regression (LR), Random Forest (RF), Gradient Boosted Tree (GBT), and Multilayer Perceptron (MLP) learners. Experiments are repeated 30 times, and confidence intervals show that all three semantic embeddings significantly outperform one-hot representations when using RF and GBT learners. Our contributions include a novel method for embedding medical specialties from procedure codes and a comparison of three semantic embedding techniques for Medicare fraud detection.
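The GloVe/Med-Word2Vec route amounts to averaging pre-trained word vectors into a short phrase embedding and comparing specialties by similarity. A stdlib sketch with invented 2-d vectors (real pre-trained embeddings have 50-300 dimensions, and the paper does not specify its similarity measure; cosine is assumed here):

```python
from math import sqrt

def phrase_embedding(words, word_vecs):
    """Average pre-trained word vectors into one phrase embedding."""
    vecs = [word_vecs[w] for w in words if w in word_vecs]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy 2-d "pre-trained" vectors for specialty description words.
word_vecs = {"cardiac": [1.0, 0.0], "surgery": [0.8, 0.2],
             "family": [0.0, 1.0], "practice": [0.1, 0.9]}
a = phrase_embedding(["cardiac", "surgery"], word_vecs)
b = phrase_embedding(["family", "practice"], word_vecs)
# A specialty is maximally similar to itself and less similar to an unrelated one.
print(cosine(a, a), cosine(a, b))
```

These dense vectors replace the sparse one-hot column before being fed to the LR/RF/GBT/MLP learners.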
{"title":"Semantic Embeddings for Medical Providers and Fraud Detection","authors":"Justin M. Johnson, T. Khoshgoftaar","doi":"10.1109/IRI49571.2020.00039","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00039","url":null,"abstract":"A medical provider’s specialty is a significant predictor for detecting fraudulent providers with machine learning algorithms. When the specialty variable is encoded using a one-hot representation, however, models are subjected to sparse and uninformative feature vectors. We explore three techniques for representing medical provider types with dense, semantic embeddings that capture specialty similarities. The first two methods (GloVe and Med-Word2Vec) use pre-trained word embeddings to convert provider specialty descriptions to short phrase embeddings. Next, we propose a method for constructing semantic provider type embeddings from the procedure-level activity within each specialty group. For each embedding technique, we use Principal Component Analysis to compare the performance of embedding sizes between 32-128. Each embedding technique is evaluated on a highly imbalanced Medicare fraud prediction task using Logistic Regression (LR), Random Forest (RF), Gradient Boosted Tree (GBT), and Multilayer Perceptron (MLP) learners. Experiments are repeated 30 times and confidence intervals show that all three semantic embeddings significantly outperform one-hot representations when using RF and GBT learners. Our contributions include a novel method for embedding medical specialties from procedure codes and a comparison of three semantic embedding techniques for Medicare fraud detection.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. 
IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"42 1","pages":"224-230"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84713156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5