Xudie Ren, Haonan Guo, Guan-Chen He, Xu Xu, C. Di, Sheng-Hong Li
The convolutional neural network, a class of deep learning model, reduces the complexity of the network structure and the number of parameters to be learned through local receptive fields, weight sharing, and pooling operations, and has achieved state-of-the-art results on image classification problems. However, this model suffers from the gradient diffusion problem, which slows the updating of the lower-layer parameters during training. To address this problem, this paper presents a convolutional neural network model based on principal component analysis (PCA) initialization for image classification. PCA is usually used to reduce the dimensionality of the raw input images and the cost of computation. This paper instead uses PCA to extract eigenvectors without supervision and to initialize the convolutional kernels, combining this step with the training process of the network. Such initial values carry image information and mitigate the gradient diffusion caused by poor initial parameters. In image classification experiments on the MNIST and CIFAR-10 datasets, the proposed model requires fewer iteration and optimization steps. It also has a simpler structure and shorter training time than a traditional convolutional neural network and one initialized with auto-encoders.
{"title":"Convolutional Neural Network Based on Principal Component Analysis Initialization for Image Classification","authors":"Xudie Ren, Haonan Guo, Guan-Chen He, Xu Xu, C. Di, Sheng-Hong Li","doi":"10.1109/DSC.2016.18","DOIUrl":"https://doi.org/10.1109/DSC.2016.18","url":null,"abstract":"One kind of Deep Learning models-convolutional neural network, which can reduce the complexity of network structure and the number of parameters to be determined through local receptive fields, weight sharing and pooling operation has achieved state of art results in image classification problems. But this model has gradient diffusion problem, which can cause slow updating of the underlying parameters during the process of training. To solve the problem above and make improvements, this paper presents a model of convolutional neural network based on principal component analysis initialization for image classification. Principal component analysis is usually used to reduce the dimension of the raw input images and the complexity of calculating. This paper proposes a use of principal component analysis to extract eigenvectors without supervision and initialize the convolutional kernels, which is combined with the training process of the convolutional neural network. Such kind of initialization values contains image information and reduces the effect of gradient diffusion problem due to the bad initial parameters. According to the image classification experiments on Mnist and Cifar-10 datasets, the model proposed in this paper reduces the processes of iteration and optimization. 
It also has simple structure as well as less training time compared with the models of traditional convolutional neural network and using Auto-Encoders to initialize.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125133428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
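The initialization idea above can be sketched as follows: sample patches the size of the convolutional kernel from training images, run PCA on them, and use the leading eigenvectors as the initial kernels. This is a minimal illustration of the technique, not the authors' code; the patch count, kernel size, and function names are assumptions.

```python
import numpy as np

def pca_init_kernels(images, kernel_size=5, n_kernels=8, n_patches=2000, seed=0):
    """Initialize conv kernels from the top PCA eigenvectors of image patches."""
    rng = np.random.default_rng(seed)
    h, w = images.shape[1], images.shape[2]
    patches = np.empty((n_patches, kernel_size * kernel_size))
    for i in range(n_patches):
        img = images[rng.integers(len(images))]
        r = rng.integers(h - kernel_size + 1)
        c = rng.integers(w - kernel_size + 1)
        patches[i] = img[r:r + kernel_size, c:c + kernel_size].ravel()
    patches -= patches.mean(axis=0)            # center before PCA
    cov = patches.T @ patches / (n_patches - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    top = eigvecs[:, ::-1][:, :n_kernels]      # take the leading eigenvectors
    return top.T.reshape(n_kernels, kernel_size, kernel_size)

# Toy usage: 100 random 28x28 "images" (MNIST-sized)
imgs = np.random.default_rng(1).random((100, 28, 28))
kernels = pca_init_kernels(imgs)
```

The resulting kernels are orthonormal when flattened, so they provide decorrelated, data-dependent starting filters rather than random noise.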
With the increasing richness of vulnerability-related data and the wide application of machine learning methods, software vulnerability analysis based on machine learning is becoming an important research area in information security. This paper analyzes the up-to-date and well-known works in this area in depth, proposes a framework for machine-learning-based software vulnerability analysis, describes and compares the existing works, and discusses their limitations. Future research directions for machine-learning-based software vulnerability analysis are put forward at the end.
{"title":"Survey on Software Vulnerability Analysis Method Based on Machine Learning","authors":"Gong Jie, Kuang Xiao-hui, Liu Qiang","doi":"10.1109/DSC.2016.33","DOIUrl":"https://doi.org/10.1109/DSC.2016.33","url":null,"abstract":"With the increasingly rich of vulnerability related data and the extensive application of machine learning methods, software vulnerability analysis methods based on machine learning is becoming an important research area of information security. In this paper, the up-to-date and well-known works in this research area were analyzed deeply. A framework for software vulnerability analysis based on machine learning was proposed. And the existing works were described and compared, the limitations of these works were discussed. The future research directions on software vulnerability analysis based on machine learning were put forward in the end.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116024628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As web attackers hide themselves behind multi-step springboards (e.g., VPNs, encrypted proxies) or anonymity networks (e.g., the Tor network), traceability and forensics face a major obstacle. Furthermore, traditional forensic methods based on traffic and log analysis are useful for analyzing attack events but not for fingerprinting an attacker. Browser fingerprinting, which exploits slight differences among browsers, was therefore proposed. Although this technique is effective for tracing attackers, countermeasures such as blocking extensions, spoofing extensions, and Blink (a dynamic reconfiguration tool) have emerged, and they cause fingerprints to change. To cope with this instability of browser fingerprints, we present an enhanced solution that traces attackers continuously even if the fingerprint changes within a particular period of time. By introducing secondary attributes, employing browser storage mechanisms, and designing correlation algorithms, we implement a prototype system to examine the accuracy of our approach. Experimental results show that our solution can associate different fingerprints originating from a single platform, and its accuracy in tracing anonymous web attackers is 24.5% higher than that of traditional fingerprinting techniques.
{"title":"Fingerprinting Web Browser for Tracing Anonymous Web Attackers","authors":"Xiaofeng Liu, Qixu Liu, Xiaoxi Wang, Zhaopeng Jia","doi":"10.1109/DSC.2016.78","DOIUrl":"https://doi.org/10.1109/DSC.2016.78","url":null,"abstract":"As web attackers hide themselves by using multi-step springboard (e.g., VPN, encrypted proxy) or anonymous network (i.e. Tor network), it raises a big obstacle for traceability and forensics. Furthermore, traditional forensics methods based on traffic and log analysis are just useful for analyzing attack events but useless for fingerprinting an attacker. Because of this, the browser fingerprinting technique which makes use of slight differences among different browsers was come up with. However, although this technique is effective for tracing attackers, countermeasures have been proposed, such as blocking extensions, spoofing extensions and Blink (a dynamic reconfiguration tool). These countermeasures will lead to changes of fingerprints. To solve the instability of browser fingerprints, we present an enhanced solution aiming at tracing attackers continuously even if the fingerprint changes within a particular period of time. By introducing secondary attributes, employing browser storage mechanisms and designing correlation algorithms, we implement the prototype system to examine the accuracy of our approach. 
Experimental results show that our proposed solution has the ability to associate different fingerprints from a single platform and the accuracy of tracing anonymous web attackers increases by 24.5% than traditional fingerprinting techniques.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123944425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
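A correlation algorithm of the kind described can be sketched as a weighted attribute match between two fingerprints, associating them when the score clears a threshold. This is a hypothetical illustration: the attribute names, weights, and threshold are assumptions, not the paper's actual parameters.

```python
def fingerprint_similarity(fp_a, fp_b, weights):
    """Weighted fraction of matching attributes between two fingerprints."""
    total = sum(weights.values())
    matched = sum(w for attr, w in weights.items()
                  if fp_a.get(attr) == fp_b.get(attr))
    return matched / total

def same_platform(fp_a, fp_b, weights, threshold=0.7):
    """Associate two fingerprints if enough weighted attributes agree."""
    return fingerprint_similarity(fp_a, fp_b, weights) >= threshold

# Hypothetical attributes: two captures from one machine, canvas changed
WEIGHTS = {"user_agent": 1.0, "timezone": 1.0, "fonts": 1.0, "canvas": 1.0}
fp1 = {"user_agent": "UA1", "timezone": "UTC+8", "fonts": "f1", "canvas": "c1"}
fp2 = {"user_agent": "UA1", "timezone": "UTC+8", "fonts": "f1", "canvas": "c2"}
```

Stable secondary attributes can carry higher weights, so a spoofed or regenerated attribute does not break the association on its own.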
Peng Wang, Liang Chen, Peng Zou, Li Li, Junlei Bao
When unloading a module under VxWorks 5.5, a vulnerability in the operating system's module dependency management mechanism allows the operator to carry out an unloading operation that violates the module dependencies, often causing serious software errors and even system downtime. To repair this vulnerability, we first visualize the module dependency management using a directed graph and a cross-linked list, then design the memory map, and finally devise a safe module-uninstall process based on the dependency management mechanism, which we test and verify. When unloading a module, the process first checks the module dependencies and terminates any unloading operation that violates them, effectively repairing the vulnerability.
{"title":"The Visualization Analysis and Vulnerability Repair Research for the Module Dependency Managerial of VxWorks 5.5 Operating System","authors":"Peng Wang, Liang Chen, Peng Zou, Li Li, Junlei Bao","doi":"10.1109/DSC.2016.24","DOIUrl":"https://doi.org/10.1109/DSC.2016.24","url":null,"abstract":"In the process of unloading the module under VxWorks5.5, the vulnerability existing in the module dependency management mechanism of this operating system allows the operator to carry out the unloading operation which violates the module dependency. It often causes serious software errors and even system downtime. In order to repair the vulnerability, firstly we make a visualization analysis of the module dependency management used the oriented graph and cross link list, secondly design the memory map, then invent a safe module-uninstall process based on the dependency management mechanism and we carry out the trial and verification. When unloading the module, the process can check module dependency at first and terminate the unloading operation violating the dependence, which finally repair the vulnerability effectively.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121521753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xuelin Zeng, Bin Wu, Jinghan Shi, Chang Liu, Qian Guo
Recommendation systems were proposed to solve the problem of information overload, and group recommendation is in demand alongside individual recommendation; accuracy and efficiency are the main challenges in this field. Recently, a group recommendation algorithm based on a latent factor model was proposed, which assumes that users are influenced implicitly by some latent factors. The existing method detects groups by considering latent factors and builds user profiles in latent-factor form; the users' latent-factor profiles are then aggregated into a group profile, and matrix multiplication is used for group recommendation. A core part of this model is matrix factorization, whose high computational overhead makes it relatively weak for big data processing. In this paper, we propose a Parallel Latent Group Model (PLGM) to improve the ability to process large-scale data and to enhance reliability and scalability. We consider two matrix factorization methods, SGD and ALS: we implement parallel SGD-based matrix factorization on Spark and compare it with the ALS implementation in MLlib, analyzing the strengths and weaknesses of each based on the experimental results. In addition, different user-profile aggregation strategies are studied, and the best one replaces the previous strategy in the model. PLGM and LGM are compared in both accuracy and efficiency. Empirical studies on real datasets from MovieLens and Dianping.com demonstrate the effectiveness and efficiency of our improvement.
{"title":"Parallelization of Latent Group Model for Group Recommendation Algorithm","authors":"Xuelin Zeng, Bin Wu, Jinghan Shi, Chang Liu, Qian Guo","doi":"10.1109/DSC.2016.54","DOIUrl":"https://doi.org/10.1109/DSC.2016.54","url":null,"abstract":"Recommendation system was proposed to solve the problem of information overload. Group recommendation is demanded as well as individual recommendation. Accuracy and efficiency come as main challenges in this field. Recently, group recommendation algorithm based on latent factor model has been proposed, which assumes that users are influenced implicitly by some latent factors. Existing method detects groups by considering latent factors and makes up users' profile in the form of latent factor. Then users' latent factor profiles were aggregated into a group profile and matrix multiplication was used for group recommendation. One of the core parts of this model is matrix factorization. Due to the high computational overhead of matrix factorization, it is relatively weak in big data processing. In this paper, we propose a Parallel Latent Group Model (PLGM) to improve the ability of processing large-scale data and to enhance the reliability and scalability. There are two models of matrix factorization in our consideration -- SGD and ALS. We implement parallel matrix factorization based on SGD on spark and compare it with ALS in MLlib. The strength and weakness of each model are analyzed based on the experimental result. Besides, different user profile aggregation strategies are studied in this paper and the best one is adopted to the model instead of the previous one. PLGM and LGM are compared in both accuracy and efficiency. 
Empirical studies on real datasets from MovieLens and Dianping.com demonstrate the effectiveness and efficiency of our improvement.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132436502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
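The aggregate-then-multiply step described above can be sketched in a few lines: combine members' latent-factor profiles into one group profile, then score items by matrix multiplication. This is a minimal sketch of the general scheme, assuming mean aggregation (one of several possible strategies) and toy factor matrices.

```python
import numpy as np

def group_recommend(user_factors, item_factors, members, top_k=2):
    """Average members' latent profiles into a group profile, score all items
    by matrix multiplication, and return the top-k item indices."""
    group_profile = user_factors[members].mean(axis=0)   # mean aggregation
    scores = item_factors @ group_profile                # one matvec per group
    return np.argsort(scores)[::-1][:top_k]

# Toy latent factors: 2 users, 3 items, 2 latent dimensions
user_factors = np.array([[1.0, 0.0],
                         [1.0, 0.0]])
item_factors = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [2.0, 0.0]])
top = group_recommend(user_factors, item_factors, members=[0, 1])
```

Other aggregation strategies (e.g., least misery via `min`, or weighted means) slot in by replacing the `mean` call.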
Stochastic Gradient Descent (SGD) is the best-known method for optimizing the primal objective of linear support vector machines (SVMs) on large data. When equipped with kernel functions, however, SGD becomes vulnerable to unbounded linear growth in model size and update time with the data size. This paper describes a budgeted parallel pack gradient descent algorithm (BPPGD) that scales the SVM optimization problem with the Gaussian radial basis function (RBF) kernel to large-scale data and runs efficiently on Apache Spark with a high degree of parallelism. Apache Spark is a fast, general engine for large-scale data processing with advantages in big-data parallel computing and iterative algorithms. BPPGD has constant time complexity per update. It uses a new distributed hash table, IndexedRDD, to increase the degree of parallelism; a packing strategy that improves SGD performance by reducing communication; and a removal-based budget maintenance method that bounds the number of support vectors (SVs).
{"title":"BPPGD: Budgeted Parallel Primal Gradient Descent Kernel SVM on Spark","authors":"Jinchen Sai, Bai Wang, Bin Wu","doi":"10.1109/DSC.2016.36","DOIUrl":"https://doi.org/10.1109/DSC.2016.36","url":null,"abstract":"Stochastic Gradient Descent (SGD) is the best known method to optimize the primal objective for linear support vector machines (SVM) to dispose large data. However, when equipped with kernel functions, SGD performance is vulnerable that causes unbounded linear growth in model size and update time with data size. This paper describes a budgeted parallel pack gradient descent algorithm (BPPGD) that can improve SVM optimize problem with Gaussian Radial Basis Function (RBF) to large-scale data and run efficiently on Apache Spark with high degree of parallelization. Apache Spark is a fast and general engine for large-scale data processing which has advantage on big data parallel computing and dealing with iterative algorithms. BPPGD algorithm has constant time complexity per update. It uses a new distributed hash table -- IndexedRDD to increase the parallel degree, packing strategy to improve SGD performance with reducing the number of communication and removal budget maintenance method to keep the number of support vectors (SVs). 
The experiment results show that BPPGD achieves higher accuracy than P-packSVM (Zhu et al., 2009) and BSGD (Zhuang et al., 2012) algorithms on Spark environment, and it takes shorter time.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134509696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
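The budgeted kernel SGD idea can be sketched serially (without the Spark parallelization or packing): a Pegasos-style update with an RBF kernel, where exceeding the budget removes the support vector with the smallest coefficient magnitude. This is a simplified single-machine sketch of removal-based budget maintenance, not BPPGD itself; hyperparameters are assumptions.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def budgeted_kernel_sgd(X, y, lam=0.1, budget=20, epochs=5, seed=0):
    """Pegasos-style kernel SGD; drop the smallest-|alpha| SV over budget."""
    rng = np.random.default_rng(seed)
    sv_x, sv_a = [], []                      # support vectors, coefficients
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            f = sum(a * rbf(x, X[i]) for x, a in zip(sv_x, sv_a))
            sv_a = [(1 - eta * lam) * a for a in sv_a]   # shrink all coeffs
            if y[i] * f < 1:                             # hinge violation
                sv_x.append(X[i]); sv_a.append(eta * y[i])
            if len(sv_x) > budget:                       # budget maintenance
                j = int(np.argmin(np.abs(sv_a)))
                sv_x.pop(j); sv_a.pop(j)
    return sv_x, sv_a

def predict(sv_x, sv_a, x):
    return np.sign(sum(a * rbf(s, x) for s, a in zip(sv_x, sv_a)))

# Two well-separated Gaussian blobs as toy training data
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 0.3, (20, 2)),
               rng.normal([3, 3], 0.3, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
sv_x, sv_a = budgeted_kernel_sgd(X, y)
```

Because the budget caps the SV count, each update touches at most `budget` kernel evaluations, giving the constant per-update cost the abstract describes.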
This paper deals with the stochastic point location (SPL) problem, which can be described as follows: a learning mechanism (LM) must determine the optimal point on a line while receiving only stochastic information from the environment suggesting the direction in which it should move. Scholars have proposed various methods for this problem; the latest, hierarchical stochastic searching on the line (HSSL) proposed by Oommen, greatly improves the LM's performance. Our research builds on HSSL, which includes a decision table that determines the next search interval after the LM receives information from the environment. When the LM receives [R, R, R] or [L, L, L], the decision table treats these two cases as inconsistent, yet they are more likely to constitute effective information. Therefore, in this paper, the handling of the two cases is changed, and a new decision table is proposed so that the LM makes full use of the information from the environment. The new scheme has been simulated, and the results show that the new decision table outperforms the original.
{"title":"A New Scheme Based on HSSL for Solving the Stochastic Point Location Problem","authors":"Jinchao Huang, Yan Yan, Ying Guo, Shenghong Li","doi":"10.1109/DSC.2016.42","DOIUrl":"https://doi.org/10.1109/DSC.2016.42","url":null,"abstract":"This paper deals with the stochastic point location (SPL) problem which can be described as: a learning mechanism (LM) determines the optimal point on the line and it only receives the stochastic information from the environment, which guides LM the direction it should move. Scholars have proposed various methods to solve this problem, and the latest method hierarchical stochastic searching on the line (HSSL) proposed by Oommen has greatly improved the performance of LM. The research is based on the method HSSL. The method HSSL includes a decision table which determines the next search interval after LM receives the information from the environment. When LM receives [R, R, R] or [L, L, L], the decision table considers the two cases as inconsistent. However the two cases are more likely to be the effective information. Therefore, in this paper, some changes are made to the two cases, and a new decision table is proposed to let the LM make full use of the information from the environment. The new scheme has been simulated, and the results obtained prove the out-performance of the new decision table.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132632887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wen Zhou, Weidong Bao, Xiaomin Zhu, Ji Wang, Chao Chen
Many network relationships in complex social systems can be described effectively by multilayer networks. By the principle of homophily, however, individuals' attributes and their social relationships interact, which in turn affects information spread and social influence in such systems. To integrate individuals' relationships and attributes effectively, we extract the hidden information in individuals' attributes to build a relationships-attributes-based model of multilayer networks. We propose using an information entropy that accounts for degree distribution and community features to evaluate the information value of each network in the multilayer structure, and we construct a more reasonable integrated network to address the data-compression reduction of multilayer networks. In addition, we analyze the relationships-attributes-based multilayer networks from the perspectives of both the multilayer structure and the integrated network on two empirical datasets.
{"title":"Integrating Relationships and Attributes: A Model of Multilayer Networks","authors":"Wen Zhou, Weidong Bao, Xiaomin Zhu, Ji Wang, Chao Chen","doi":"10.1109/DSC.2016.51","DOIUrl":"https://doi.org/10.1109/DSC.2016.51","url":null,"abstract":"Various network relationships in many complex social systems can be described effectively by multilayer networks, but we find that there are interactions between individuals attributes and their social relationships by a principle of homophily, which hence impact the process of information spread and social influence in complex social systems. In order to integrate individuals relationships and attributes in complex social systems effectively, we extract the hidden information of individuals attributes to build a relationships-attributes-based model of multi-layer networks. Proposing that using information entropy which satisfies the conditions of degree distribution and community features to evaluate information values for each network in the multilayer networks, we construct a more reasonable integrated network to solve the problems of data compression reduction of multilayer networks. In addition, we analyze the relationships-attributes-based multilayer networks from the perspectives of the structure of the multilayer networks and the structure of the integrated network on two empirical data. 
The results verify the correlation between attribute networks and relationship networks, and give more insights into the importance of the proposed relationships-attributes-based model of multilayer networks and the positive role of the integrated network in synthesizing the relationships-attributes-based multilayer networks.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115164166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
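The entropy-based layer weighting described above can be sketched with the degree-distribution part of the criterion: compute the Shannon entropy of each layer's degree distribution and normalize the entropies into integration weights. This sketch covers only the degree-distribution term (the community-feature term is omitted), and the weighting scheme is an assumption.

```python
import numpy as np

def layer_entropy(adj):
    """Shannon entropy (bits) of a layer's degree distribution."""
    degrees = adj.sum(axis=1)
    values, counts = np.unique(degrees, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def layer_weights(adjs):
    """Normalize per-layer entropies into weights for the integrated network."""
    e = np.array([layer_entropy(a) for a in adjs])
    return e / e.sum() if e.sum() > 0 else np.full(len(adjs), 1 / len(adjs))

# Toy layers: a complete graph (uniform degrees) vs. a star (skewed degrees)
K4 = np.ones((4, 4)) - np.eye(4)
star = np.zeros((4, 4))
star[0, 1:] = 1
star[1:, 0] = 1
```

A layer where every node has the same degree carries zero entropy and thus contributes no weight, while heterogeneous layers dominate the integration.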
For the distributed, heterogeneous, relational, and complex data sources of petroleum engineering, we present an oil production engineering semantic data integration and service system (OPSDS). Based on domain ontology, OPSDS establishes a semantic data integration and service framework on the premise of building a global semantic model and realizing global semantic search. The global semantic data model, applicable to various oil fields, is set up through ontology extraction, ontology evolution, ontology combination, and semantic constraints. Domain-oriented data integration, providing data access and shared services, is realized through ontology mapping, query transformation, and data cleaning. Users and upper-layer applications can directly access the underlying complex data sources through the global semantic data model as needed, and the cleaned data are returned in a unified format. OPSDS has been implemented and is widely used on many platforms of the China National Petroleum Corporation (CNPC).
{"title":"OPSDS: A Semantic Data Integration and Service System Based on Domain Ontology","authors":"Xin Liu, Chungjin Hu, Jianyi Huang, Feng Liu","doi":"10.1109/DSC.2016.15","DOIUrl":"https://doi.org/10.1109/DSC.2016.15","url":null,"abstract":"For the distributed, heterogeneous, relational complex data sources of petroleum engineering, we present an oil production engineering semantic-based data integration system (OPSDS). OPSDS establishes a semantic data integration and service system based on domain ontology on the premise of building a global semantic model and realizing the global semantic search. The global semantic data model applied to various oil fields is set up by ontology extraction, ontology evolution, ontology combination and semantic constraints. The domain-oriented data integration to provide the data access and shared service is realized by ontology mapping, query transformation, and data cleaning. Users and upper applications can have a direct access to underlying complex data sources in times of need through the global semantic data model, and the cleaned data can be returned in a unified format. OPSDS has been realized and got extensive use in many platforms of China National Petroleum Corporation(CNPC). 
It has been found that the method can not only provide the comprehensive and real-time data support for oil and gas wells, but also improve the production and recovery efficiency with good application.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"267 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122982954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
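The ontology-mapping and query-transformation step can be sketched as a lookup from ontology concepts and properties to source tables and columns, from which per-source SQL is generated. This is a purely hypothetical illustration of the pattern; the concept names, tables, and columns are invented, not OPSDS's actual schema.

```python
# Hypothetical mapping: (concept, property) -> (database, table, column)
MAPPING = {
    ("Well", "dailyOilRate"): ("prod_db", "daily_prod", "oil_rate"),
    ("Well", "name"):         ("prod_db", "well_info", "well_name"),
}

def translate(concept, properties):
    """Translate a semantic query over one concept into per-table SQL."""
    by_table = {}
    for p in properties:
        db, table, column = MAPPING[(concept, p)]
        by_table.setdefault((db, table), []).append(column)
    return [f"SELECT {', '.join(cols)} FROM {db}.{table}"
            for (db, table), cols in by_table.items()]
```

A real system would also apply the semantic constraints and data cleaning mentioned above before returning results in the unified format.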
Chao An, Jiuming Huang, Shoufeng Chang, Zhijie Huang
Modeling sentence similarity has long been a challenging task in natural language processing (NLP) because of the ambiguity and variability of linguistic expression. In community question answering (CQA) in particular, a related hotspot is question retrieval. To find the question most similar to a user's query, we propose a question model built with Bidirectional Long Short-Term Memory (BLSTM) neural networks, which can also be used in other tasks such as sentence similarity computation, paraphrase detection, and question answering. We evaluated our model on labeled Yahoo! Answers data, and the results show that our method achieves significant improvement over existing methods without using external resources such as WordNet or parsers.
{"title":"Question Similarity Modeling with Bidirectional Long Short-Term Memory Neural Network","authors":"Chao An, Jiuming Huang, Shoufeng Chang, Zhijie Huang","doi":"10.1109/DSC.2016.13","DOIUrl":"https://doi.org/10.1109/DSC.2016.13","url":null,"abstract":"Modeling sentence similarity all along is a challengeable task in the field of natural language processing (NLP), since ambiguity and variability of linguistic expression. Specifically, in the field of community question answering (CQA), homologous hotspot is focusing on question retrieval. To get the most similar question compared with user's query, we proposed a question model building with Bidirectional Long Short-Term Memory (BLSTM) neural networks, which as well can be used in other fields, such as sentence similarity computation, paraphrase detection, question answering and so on. We evaluated our model in labeled Yahoo! Answers data, and results show that our method achieves significant improvement over existing methods without using external resources, such as WordNet or parsers.","PeriodicalId":295898,"journal":{"name":"2016 IEEE First International Conference on Data Science in Cyberspace (DSC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125514862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}