Xudie Ren, Haonan Guo, Guan-Chen He, Xu Xu, C. Di, Sheng-Hong Li
The convolutional neural network, a class of deep learning model, reduces the complexity of the network structure and the number of parameters to be learned through local receptive fields, weight sharing, and pooling operations, and has achieved state-of-the-art results on image classification problems. However, this model suffers from the gradient diffusion problem, which slows the updating of the lower-layer parameters during training. To address this problem, this paper presents a convolutional neural network model based on principal component analysis (PCA) initialization for image classification. PCA is usually used to reduce the dimensionality of the raw input images and the cost of computation. This paper instead uses PCA to extract eigenvectors without supervision and to initialize the convolutional kernels, combining this step with the training process of the network. Such initial values carry image information and mitigate the gradient diffusion caused by poor initial parameters. In image classification experiments on the MNIST and CIFAR-10 datasets, the proposed model requires fewer iteration and optimization steps. It also has a simpler structure and shorter training time than a traditional convolutional neural network and one initialized with auto-encoders.
{"title":"Convolutional Neural Network Based on Principal Component Analysis Initialization for Image Classification","authors":"Xudie Ren, Haonan Guo, Guan-Chen He, Xu Xu, C. Di, Sheng-Hong Li","doi":"10.1109/DSC.2016.18","DOIUrl":"https://doi.org/10.1109/DSC.2016.18","url":null,"abstract":"One kind of Deep Learning models-convolutional neural network, which can reduce the complexity of network structure and the number of parameters to be determined through local receptive fields, weight sharing and pooling operation has achieved state of art results in image classification problems. But this model has gradient diffusion problem, which can cause slow updating of the underlying parameters during the process of training. To solve the problem above and make improvements, this paper presents a model of convolutional neural network based on principal component analysis initialization for image classification. Principal component analysis is usually used to reduce the dimension of the raw input images and the complexity of calculating. This paper proposes a use of principal component analysis to extract eigenvectors without supervision and initialize the convolutional kernels, which is combined with the training process of the convolutional neural network. Such kind of initialization values contains image information and reduces the effect of gradient diffusion problem due to the bad initial parameters. According to the image classification experiments on Mnist and Cifar-10 datasets, the model proposed in this paper reduces the processes of iteration and optimization. 
It also has simple structure as well as less training time compared with the models of traditional convolutional neural network and using Auto-Encoders to initialize.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125133428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
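The initialization idea above can be sketched as follows: sample patches the size of the convolutional kernel from training images, run PCA on them, and use the leading eigenvectors as the initial kernels. This is a minimal illustration of the technique, not the authors' code; the patch count, kernel size, and function names are assumptions.

```python
import numpy as np

def pca_init_kernels(images, kernel_size=5, n_kernels=8, n_patches=2000, seed=0):
    """Initialize conv kernels from the top PCA eigenvectors of image patches."""
    rng = np.random.default_rng(seed)
    h, w = images.shape[1], images.shape[2]
    patches = np.empty((n_patches, kernel_size * kernel_size))
    for i in range(n_patches):
        img = images[rng.integers(len(images))]
        r = rng.integers(h - kernel_size + 1)
        c = rng.integers(w - kernel_size + 1)
        patches[i] = img[r:r + kernel_size, c:c + kernel_size].ravel()
    patches -= patches.mean(axis=0)            # center before PCA
    cov = patches.T @ patches / (n_patches - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    top = eigvecs[:, ::-1][:, :n_kernels]      # take the leading eigenvectors
    return top.T.reshape(n_kernels, kernel_size, kernel_size)

# Toy usage: 100 random 28x28 "images" (MNIST-sized)
imgs = np.random.default_rng(1).random((100, 28, 28))
kernels = pca_init_kernels(imgs)
```

The resulting kernels are orthonormal when flattened, so they provide decorrelated, data-dependent starting filters rather than random noise.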
With the increasing richness of vulnerability-related data and the wide application of machine learning methods, software vulnerability analysis based on machine learning is becoming an important research area in information security. This paper analyzes the up-to-date and well-known works in this area in depth, proposes a framework for machine-learning-based software vulnerability analysis, describes and compares the existing works, and discusses their limitations. Future research directions for machine-learning-based software vulnerability analysis are put forward at the end.
{"title":"Survey on Software Vulnerability Analysis Method Based on Machine Learning","authors":"Gong Jie, Kuang Xiao-hui, Liu Qiang","doi":"10.1109/DSC.2016.33","DOIUrl":"https://doi.org/10.1109/DSC.2016.33","url":null,"abstract":"With the increasingly rich of vulnerability related data and the extensive application of machine learning methods, software vulnerability analysis methods based on machine learning is becoming an important research area of information security. In this paper, the up-to-date and well-known works in this research area were analyzed deeply. A framework for software vulnerability analysis based on machine learning was proposed. And the existing works were described and compared, the limitations of these works were discussed. The future research directions on software vulnerability analysis based on machine learning were put forward in the end.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116024628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As web attackers hide themselves behind multi-step springboards (e.g., VPNs, encrypted proxies) or anonymity networks (e.g., the Tor network), traceability and forensics face a major obstacle. Furthermore, traditional forensic methods based on traffic and log analysis are useful for analyzing attack events but not for fingerprinting an attacker. Browser fingerprinting, which exploits slight differences among browsers, was therefore proposed. Although this technique is effective for tracing attackers, countermeasures such as blocking extensions, spoofing extensions, and Blink (a dynamic reconfiguration tool) have emerged, and they cause fingerprints to change. To cope with this instability of browser fingerprints, we present an enhanced solution that traces attackers continuously even if the fingerprint changes within a particular period of time. By introducing secondary attributes, employing browser storage mechanisms, and designing correlation algorithms, we implement a prototype system to examine the accuracy of our approach. Experimental results show that our solution can associate different fingerprints originating from a single platform, and its accuracy in tracing anonymous web attackers is 24.5% higher than that of traditional fingerprinting techniques.
{"title":"Fingerprinting Web Browser for Tracing Anonymous Web Attackers","authors":"Xiaofeng Liu, Qixu Liu, Xiaoxi Wang, Zhaopeng Jia","doi":"10.1109/DSC.2016.78","DOIUrl":"https://doi.org/10.1109/DSC.2016.78","url":null,"abstract":"As web attackers hide themselves by using multi-step springboard (e.g., VPN, encrypted proxy) or anonymous network (i.e. Tor network), it raises a big obstacle for traceability and forensics. Furthermore, traditional forensics methods based on traffic and log analysis are just useful for analyzing attack events but useless for fingerprinting an attacker. Because of this, the browser fingerprinting technique which makes use of slight differences among different browsers was come up with. However, although this technique is effective for tracing attackers, countermeasures have been proposed, such as blocking extensions, spoofing extensions and Blink (a dynamic reconfiguration tool). These countermeasures will lead to changes of fingerprints. To solve the instability of browser fingerprints, we present an enhanced solution aiming at tracing attackers continuously even if the fingerprint changes within a particular period of time. By introducing secondary attributes, employing browser storage mechanisms and designing correlation algorithms, we implement the prototype system to examine the accuracy of our approach. 
Experimental results show that our proposed solution has the ability to associate different fingerprints from a single platform and the accuracy of tracing anonymous web attackers increases by 24.5% than traditional fingerprinting techniques.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123944425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
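A correlation algorithm of the kind described can be sketched as a weighted attribute match between two fingerprints, associating them when the score clears a threshold. This is a hypothetical illustration: the attribute names, weights, and threshold are assumptions, not the paper's actual parameters.

```python
def fingerprint_similarity(fp_a, fp_b, weights):
    """Weighted fraction of matching attributes between two fingerprints."""
    total = sum(weights.values())
    matched = sum(w for attr, w in weights.items()
                  if fp_a.get(attr) == fp_b.get(attr))
    return matched / total

def same_platform(fp_a, fp_b, weights, threshold=0.7):
    """Associate two fingerprints if enough weighted attributes agree."""
    return fingerprint_similarity(fp_a, fp_b, weights) >= threshold

# Hypothetical attributes: two captures from one machine, canvas changed
WEIGHTS = {"user_agent": 1.0, "timezone": 1.0, "fonts": 1.0, "canvas": 1.0}
fp1 = {"user_agent": "UA1", "timezone": "UTC+8", "fonts": "f1", "canvas": "c1"}
fp2 = {"user_agent": "UA1", "timezone": "UTC+8", "fonts": "f1", "canvas": "c2"}
```

Stable secondary attributes can carry higher weights, so a spoofed or regenerated attribute does not break the association on its own.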
Peng Wang, Liang Chen, Peng Zou, Li Li, Junlei Bao
When unloading a module under VxWorks 5.5, a vulnerability in the operating system's module dependency management mechanism allows the operator to carry out an unloading operation that violates the module dependencies, often causing serious software errors and even system downtime. To repair this vulnerability, we first visualize the module dependency management using a directed graph and a cross-linked list, then design the memory map, and finally devise a safe module-uninstall process based on the dependency management mechanism, which we test and verify. When unloading a module, the process first checks the module dependencies and terminates any unloading operation that violates them, effectively repairing the vulnerability.
{"title":"The Visualization Analysis and Vulnerability Repair Research for the Module Dependency Managerial of VxWorks 5.5 Operating System","authors":"Peng Wang, Liang Chen, Peng Zou, Li Li, Junlei Bao","doi":"10.1109/DSC.2016.24","DOIUrl":"https://doi.org/10.1109/DSC.2016.24","url":null,"abstract":"In the process of unloading the module under VxWorks5.5, the vulnerability existing in the module dependency management mechanism of this operating system allows the operator to carry out the unloading operation which violates the module dependency. It often causes serious software errors and even system downtime. In order to repair the vulnerability, firstly we make a visualization analysis of the module dependency management used the oriented graph and cross link list, secondly design the memory map, then invent a safe module-uninstall process based on the dependency management mechanism and we carry out the trial and verification. When unloading the module, the process can check module dependency at first and terminate the unloading operation violating the dependence, which finally repair the vulnerability effectively.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121521753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xuelin Zeng, Bin Wu, Jinghan Shi, Chang Liu, Qian Guo
Recommendation systems were proposed to solve the problem of information overload, and group recommendation is in demand alongside individual recommendation; accuracy and efficiency are the main challenges in this field. Recently, a group recommendation algorithm based on a latent factor model was proposed, which assumes that users are influenced implicitly by some latent factors. The existing method detects groups by considering latent factors and builds user profiles in latent-factor form; the users' latent-factor profiles are then aggregated into a group profile, and matrix multiplication is used for group recommendation. A core part of this model is matrix factorization, whose high computational overhead makes it relatively weak for big data processing. In this paper, we propose a Parallel Latent Group Model (PLGM) to improve the ability to process large-scale data and to enhance reliability and scalability. We consider two matrix factorization methods, SGD and ALS: we implement parallel SGD-based matrix factorization on Spark and compare it with the ALS implementation in MLlib, analyzing the strengths and weaknesses of each based on the experimental results. In addition, different user-profile aggregation strategies are studied, and the best one replaces the previous strategy in the model. PLGM and LGM are compared in both accuracy and efficiency. Empirical studies on real datasets from MovieLens and Dianping.com demonstrate the effectiveness and efficiency of our improvement.
{"title":"Parallelization of Latent Group Model for Group Recommendation Algorithm","authors":"Xuelin Zeng, Bin Wu, Jinghan Shi, Chang Liu, Qian Guo","doi":"10.1109/DSC.2016.54","DOIUrl":"https://doi.org/10.1109/DSC.2016.54","url":null,"abstract":"Recommendation system was proposed to solve the problem of information overload. Group recommendation is demanded as well as individual recommendation. Accuracy and efficiency come as main challenges in this field. Recently, group recommendation algorithm based on latent factor model has been proposed, which assumes that users are influenced implicitly by some latent factors. Existing method detects groups by considering latent factors and makes up users' profile in the form of latent factor. Then users' latent factor profiles were aggregated into a group profile and matrix multiplication was used for group recommendation. One of the core parts of this model is matrix factorization. Due to the high computational overhead of matrix factorization, it is relatively weak in big data processing. In this paper, we propose a Parallel Latent Group Model (PLGM) to improve the ability of processing large-scale data and to enhance the reliability and scalability. There are two models of matrix factorization in our consideration -- SGD and ALS. We implement parallel matrix factorization based on SGD on spark and compare it with ALS in MLlib. The strength and weakness of each model are analyzed based on the experimental result. Besides, different user profile aggregation strategies are studied in this paper and the best one is adopted to the model instead of the previous one. PLGM and LGM are compared in both accuracy and efficiency. 
Empirical studies on real datasets from MovieLens and Dianping.com demonstrate the effectiveness and efficiency of our improvement.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132436502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
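The aggregate-then-multiply step described above can be sketched in a few lines: combine members' latent-factor profiles into one group profile, then score items by matrix multiplication. This is a minimal sketch of the general scheme, assuming mean aggregation (one of several possible strategies) and toy factor matrices.

```python
import numpy as np

def group_recommend(user_factors, item_factors, members, top_k=2):
    """Average members' latent profiles into a group profile, score all items
    by matrix multiplication, and return the top-k item indices."""
    group_profile = user_factors[members].mean(axis=0)   # mean aggregation
    scores = item_factors @ group_profile                # one matvec per group
    return np.argsort(scores)[::-1][:top_k]

# Toy latent factors: 2 users, 3 items, 2 latent dimensions
user_factors = np.array([[1.0, 0.0],
                         [1.0, 0.0]])
item_factors = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [2.0, 0.0]])
top = group_recommend(user_factors, item_factors, members=[0, 1])
```

Other aggregation strategies (e.g., least misery via `min`, or weighted means) slot in by replacing the `mean` call.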
Stochastic Gradient Descent (SGD) is the best-known method for optimizing the primal objective of linear support vector machines (SVMs) on large data. When equipped with kernel functions, however, SGD becomes vulnerable to unbounded linear growth in model size and update time with the data size. This paper describes a budgeted parallel pack gradient descent algorithm (BPPGD) that scales the SVM optimization problem with the Gaussian radial basis function (RBF) kernel to large-scale data and runs efficiently on Apache Spark with a high degree of parallelism. Apache Spark is a fast, general engine for large-scale data processing with advantages in big-data parallel computing and iterative algorithms. BPPGD has constant time complexity per update. It uses a new distributed hash table, IndexedRDD, to increase the degree of parallelism; a packing strategy that improves SGD performance by reducing communication; and a removal-based budget maintenance method that bounds the number of support vectors (SVs).
{"title":"BPPGD: Budgeted Parallel Primal Gradient Descent Kernel SVM on Spark","authors":"Jinchen Sai, Bai Wang, Bin Wu","doi":"10.1109/DSC.2016.36","DOIUrl":"https://doi.org/10.1109/DSC.2016.36","url":null,"abstract":"Stochastic Gradient Descent (SGD) is the best known method to optimize the primal objective for linear support vector machines (SVM) to dispose large data. However, when equipped with kernel functions, SGD performance is vulnerable that causes unbounded linear growth in model size and update time with data size. This paper describes a budgeted parallel pack gradient descent algorithm (BPPGD) that can improve SVM optimize problem with Gaussian Radial Basis Function (RBF) to large-scale data and run efficiently on Apache Spark with high degree of parallelization. Apache Spark is a fast and general engine for large-scale data processing which has advantage on big data parallel computing and dealing with iterative algorithms. BPPGD algorithm has constant time complexity per update. It uses a new distributed hash table -- IndexedRDD to increase the parallel degree, packing strategy to improve SGD performance with reducing the number of communication and removal budget maintenance method to keep the number of support vectors (SVs). 
The experiment results show that BPPGD achieves higher accuracy than P-packSVM (Zhu et al., 2009) and BSGD (Zhuang et al., 2012) algorithms on Spark environment, and it takes shorter time.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134509696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
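The budgeted kernel SGD idea can be sketched serially (without the Spark parallelization or packing): a Pegasos-style update with an RBF kernel, where exceeding the budget removes the support vector with the smallest coefficient magnitude. This is a simplified single-machine sketch of removal-based budget maintenance, not BPPGD itself; hyperparameters are assumptions.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def budgeted_kernel_sgd(X, y, lam=0.1, budget=20, epochs=5, seed=0):
    """Pegasos-style kernel SGD; drop the smallest-|alpha| SV over budget."""
    rng = np.random.default_rng(seed)
    sv_x, sv_a = [], []                      # support vectors, coefficients
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            f = sum(a * rbf(x, X[i]) for x, a in zip(sv_x, sv_a))
            sv_a = [(1 - eta * lam) * a for a in sv_a]   # shrink all coeffs
            if y[i] * f < 1:                             # hinge violation
                sv_x.append(X[i]); sv_a.append(eta * y[i])
            if len(sv_x) > budget:                       # budget maintenance
                j = int(np.argmin(np.abs(sv_a)))
                sv_x.pop(j); sv_a.pop(j)
    return sv_x, sv_a

def predict(sv_x, sv_a, x):
    return np.sign(sum(a * rbf(s, x) for s, a in zip(sv_x, sv_a)))

# Two well-separated Gaussian blobs as toy training data
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 0.3, (20, 2)),
               rng.normal([3, 3], 0.3, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
sv_x, sv_a = budgeted_kernel_sgd(X, y)
```

Because the budget caps the SV count, each update touches at most `budget` kernel evaluations, giving the constant per-update cost the abstract describes.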
This paper deals with the stochastic point location (SPL) problem, which can be described as follows: a learning mechanism (LM) must determine the optimal point on a line while receiving only stochastic information from the environment suggesting the direction in which it should move. Scholars have proposed various methods for this problem; the latest, hierarchical stochastic searching on the line (HSSL) proposed by Oommen, greatly improves the LM's performance. Our research builds on HSSL, which includes a decision table that determines the next search interval after the LM receives information from the environment. When the LM receives [R, R, R] or [L, L, L], the decision table treats these two cases as inconsistent, yet they are more likely to constitute effective information. Therefore, in this paper, the handling of the two cases is changed, and a new decision table is proposed so that the LM makes full use of the information from the environment. The new scheme has been simulated, and the results show that the new decision table outperforms the original.
{"title":"A New Scheme Based on HSSL for Solving the Stochastic Point Location Problem","authors":"Jinchao Huang, Yan Yan, Ying Guo, Shenghong Li","doi":"10.1109/DSC.2016.42","DOIUrl":"https://doi.org/10.1109/DSC.2016.42","url":null,"abstract":"This paper deals with the stochastic point location (SPL) problem which can be described as: a learning mechanism (LM) determines the optimal point on the line and it only receives the stochastic information from the environment, which guides LM the direction it should move. Scholars have proposed various methods to solve this problem, and the latest method hierarchical stochastic searching on the line (HSSL) proposed by Oommen has greatly improved the performance of LM. The research is based on the method HSSL. The method HSSL includes a decision table which determines the next search interval after LM receives the information from the environment. When LM receives [R, R, R] or [L, L, L], the decision table considers the two cases as inconsistent. However the two cases are more likely to be the effective information. Therefore, in this paper, some changes are made to the two cases, and a new decision table is proposed to let the LM make full use of the information from the environment. The new scheme has been simulated, and the results obtained prove the out-performance of the new decision table.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132632887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wen Zhou, Weidong Bao, Xiaomin Zhu, Ji Wang, Chao Chen
Many network relationships in complex social systems can be described effectively by multilayer networks. By the principle of homophily, however, individuals' attributes and their social relationships interact, which in turn affects information spread and social influence in such systems. To integrate individuals' relationships and attributes effectively, we extract the hidden information in individuals' attributes to build a relationships-attributes-based model of multilayer networks. We propose using an information entropy that accounts for degree distribution and community features to evaluate the information value of each network in the multilayer structure, and we construct a more reasonable integrated network to address the data-compression reduction of multilayer networks. In addition, we analyze the relationships-attributes-based multilayer networks from the perspectives of both the multilayer structure and the integrated network on two empirical datasets.
{"title":"Integrating Relationships and Attributes: A Model of Multilayer Networks","authors":"Wen Zhou, Weidong Bao, Xiaomin Zhu, Ji Wang, Chao Chen","doi":"10.1109/DSC.2016.51","DOIUrl":"https://doi.org/10.1109/DSC.2016.51","url":null,"abstract":"Various network relationships in many complex social systems can be described effectively by multilayer networks, but we find that there are interactions between individuals attributes and their social relationships by a principle of homophily, which hence impact the process of information spread and social influence in complex social systems. In order to integrate individuals relationships and attributes in complex social systems effectively, we extract the hidden information of individuals attributes to build a relationships-attributes-based model of multi-layer networks. Proposing that using information entropy which satisfies the conditions of degree distribution and community features to evaluate information values for each network in the multilayer networks, we construct a more reasonable integrated network to solve the problems of data compression reduction of multilayer networks. In addition, we analyze the relationships-attributes-based multilayer networks from the perspectives of the structure of the multilayer networks and the structure of the integrated network on two empirical data. 
The results verify the correlation between attribute networks and relationship networks, and give more insights into the importance of the proposed relationships-attributes-based model of multilayer networks and the positive role of the integrated network in synthesizing the relationships-attributes-based multilayer networks.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115164166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
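The entropy-based layer weighting described above can be sketched with the degree-distribution part of the criterion: compute the Shannon entropy of each layer's degree distribution and normalize the entropies into integration weights. This sketch covers only the degree-distribution term (the community-feature term is omitted), and the weighting scheme is an assumption.

```python
import numpy as np

def layer_entropy(adj):
    """Shannon entropy (bits) of a layer's degree distribution."""
    degrees = adj.sum(axis=1)
    values, counts = np.unique(degrees, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def layer_weights(adjs):
    """Normalize per-layer entropies into weights for the integrated network."""
    e = np.array([layer_entropy(a) for a in adjs])
    return e / e.sum() if e.sum() > 0 else np.full(len(adjs), 1 / len(adjs))

# Toy layers: a complete graph (uniform degrees) vs. a star (skewed degrees)
K4 = np.ones((4, 4)) - np.eye(4)
star = np.zeros((4, 4))
star[0, 1:] = 1
star[1:, 0] = 1
```

A layer where every node has the same degree carries zero entropy and thus contributes no weight, while heterogeneous layers dominate the integration.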
For the distributed, heterogeneous, relational, and complex data sources of petroleum engineering, we present an oil production engineering semantic data integration and service system (OPSDS). Based on domain ontology, OPSDS establishes a semantic data integration and service framework on the premise of building a global semantic model and realizing global semantic search. The global semantic data model, applicable to various oil fields, is set up through ontology extraction, ontology evolution, ontology combination, and semantic constraints. Domain-oriented data integration, providing data access and shared services, is realized through ontology mapping, query transformation, and data cleaning. Users and upper-layer applications can directly access the underlying complex data sources through the global semantic data model as needed, and the cleaned data are returned in a unified format. OPSDS has been implemented and is widely used on many platforms of the China National Petroleum Corporation (CNPC).
{"title":"OPSDS: A Semantic Data Integration and Service System Based on Domain Ontology","authors":"Xin Liu, Chungjin Hu, Jianyi Huang, Feng Liu","doi":"10.1109/DSC.2016.15","DOIUrl":"https://doi.org/10.1109/DSC.2016.15","url":null,"abstract":"For the distributed, heterogeneous, relational complex data sources of petroleum engineering, we present an oil production engineering semantic-based data integration system (OPSDS). OPSDS establishes a semantic data integration and service system based on domain ontology on the premise of building a global semantic model and realizing the global semantic search. The global semantic data model applied to various oil fields is set up by ontology extraction, ontology evolution, ontology combination and semantic constraints. The domain-oriented data integration to provide the data access and shared service is realized by ontology mapping, query transformation, and data cleaning. Users and upper applications can have a direct access to underlying complex data sources in times of need through the global semantic data model, and the cleaned data can be returned in a unified format. OPSDS has been realized and got extensive use in many platforms of China National Petroleum Corporation(CNPC). 
It has been found that the method can not only provide the comprehensive and real-time data support for oil and gas wells, but also improve the production and recovery efficiency with good application.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"267 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122982954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
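The ontology-mapping and query-transformation step can be sketched as a lookup from ontology concepts and properties to source tables and columns, from which per-source SQL is generated. This is a purely hypothetical illustration of the pattern; the concept names, tables, and columns are invented, not OPSDS's actual schema.

```python
# Hypothetical mapping: (concept, property) -> (database, table, column)
MAPPING = {
    ("Well", "dailyOilRate"): ("prod_db", "daily_prod", "oil_rate"),
    ("Well", "name"):         ("prod_db", "well_info", "well_name"),
}

def translate(concept, properties):
    """Translate a semantic query over one concept into per-table SQL."""
    by_table = {}
    for p in properties:
        db, table, column = MAPPING[(concept, p)]
        by_table.setdefault((db, table), []).append(column)
    return [f"SELECT {', '.join(cols)} FROM {db}.{table}"
            for (db, table), cols in by_table.items()]
```

A real system would also apply the semantic constraints and data cleaning mentioned above before returning results in the unified format.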
Chao An, Jiuming Huang, Shoufeng Chang, Zhijie Huang
Modeling sentence similarity has long been a challenging task in natural language processing (NLP) because of the ambiguity and variability of linguistic expression. In community question answering (CQA) in particular, a related hotspot is question retrieval. To find the question most similar to a user's query, we propose a question model built with Bidirectional Long Short-Term Memory (BLSTM) neural networks, which can also be used in other tasks such as sentence similarity computation, paraphrase detection, and question answering. We evaluated our model on labeled Yahoo! Answers data, and the results show that our method achieves significant improvement over existing methods without using external resources such as WordNet or parsers.
{"title":"Question Similarity Modeling with Bidirectional Long Short-Term Memory Neural Network","authors":"Chao An, Jiuming Huang, Shoufeng Chang, Zhijie Huang","doi":"10.1109/DSC.2016.13","DOIUrl":"https://doi.org/10.1109/DSC.2016.13","url":null,"abstract":"Modeling sentence similarity all along is a challengeable task in the field of natural language processing (NLP), since ambiguity and variability of linguistic expression. Specifically, in the field of community question answering (CQA), homologous hotspot is focusing on question retrieval. To get the most similar question compared with user's query, we proposed a question model building with Bidirectional Long Short-Term Memory (BLSTM) neural networks, which as well can be used in other fields, such as sentence similarity computation, paraphrase detection, question answering and so on. We evaluated our model in labeled Yahoo! Answers data, and results show that our method achieves significant improvement over existing methods without using external resources, such as WordNet or parsers.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125514862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}