
2013 International Conference on Information Science and Cloud Computing Companion: Latest Publications

An Improved Association Rules Mining Algorithm Based on Power Set and Hadoop
W. Mao, Weibin Guo
Association rules mining plays a very important role in data mining. With the rapid growth of datasets, the required memory increases sharply and operating efficiency declines rapidly. Cloud computing provides efficient and inexpensive solutions for analyzing and implementing association rules mining algorithms in parallel. This paper proposes an improved association rules mining algorithm based on the power set and the MapReduce programming model, which can process massive datasets on a cluster of machines running the Hadoop platform. Numerical experiments show that the proposed algorithm achieves higher efficiency in association rules mining.
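The abstract above describes the idea rather than the implementation; as a minimal, hedged sketch (not the authors' code), the example below enumerates the power set of each transaction's items in a map-like step and sums candidate itemset supports in a reduce-like step, which is the kind of per-key aggregation Hadoop distributes across a cluster. The `transactions` list and `min_support` value are invented for illustration.

```python
from itertools import combinations, chain
from collections import defaultdict

# Toy transactions and support threshold (illustrative values only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
min_support = 2

def power_set(items):
    """All non-empty subsets of a transaction's item set."""
    items = sorted(items)
    return chain.from_iterable(combinations(items, r) for r in range(1, len(items) + 1))

# "Map" step: each transaction emits (itemset, 1) for every subset in its power set.
mapped = ((subset, 1) for t in transactions for subset in power_set(t))

# "Reduce" step: sum the counts per candidate itemset (Hadoop would do this per key).
support = defaultdict(int)
for subset, count in mapped:
    support[subset] += count

frequent = {s: c for s, c in support.items() if c >= min_support}
for itemset, count in sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0])):
    print(itemset, count)
```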
Citations: 4
Recognition Methods of Housing Vacancy Based on Digital Image Processing
Wei Yao, Guifa Teng, Hui Li
Based on computer image processing technology, this paper studies a statistical method for the housing vacancy rate that makes use of night-time images of residential buildings. The method consists of three steps. The first step is image preprocessing: the building image captured in the night scene is enhanced, denoised, and corrected using histogram equalization, the wavelet transform, the Radon transform, and connection points. The second step is image threshold segmentation: the images of dark and bright windows are segmented with a fixed-threshold method and an improved between-cluster variance (Otsu) method. The third step uses image fusion technology: the centroid coordinates of the closed regions are ordered from large to small along the horizontal and vertical axes to determine the locations and numbers of dark and bright windows, from which the vacancy rate is obtained. Finally, the above functions are realized through hybrid programming of Matlab and Visual C++ by using the application of Matrix. The conclusions drawn from this method are analyzed comparatively, and comparison with currently common methods verifies the feasibility of the proposed approach.
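The between-cluster variance criterion in the second step is the classic Otsu idea; the sketch below is a generic, minimal implementation of that criterion, not the authors' improved variant, and the toy night-scene image is fabricated for illustration.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the threshold that maximizes between-class variance on a 0-255 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Toy "night scene": mostly dark pixels with a few bright windows.
rng = np.random.default_rng(0)
image = rng.integers(0, 60, size=(64, 64), dtype=np.uint8)
image[10:20, 10:20] = 220  # a lit window
image[40:48, 30:40] = 200  # another lit window

t = otsu_threshold(image)
bright_windows = image >= t
print("threshold:", t, "bright pixel count:", int(bright_windows.sum()))
```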
Citations: 0
An Instant-Based Qur'an Memorizer Application Interface
Z. Adhoni, H. Al Hamad, A. A. Siddiqi
In this paper, we describe an Instant-Based Qur'an Memorizer Application Interface, which aims to provide a unifying framework for building Qur'an memorizer applications. It includes all the features needed for memorizing the Qur'an and can be used on the latest handsets. Its unique instant-creation feature allows the user to memorize any Surah or Ayah from the Qur'an. We describe the core components and design patterns of the proposed memorizer with emphasis on key design criteria. These criteria aim at providing the necessary scalability and performance on the one hand, and quality assurance of the Qur'an text on the other.
Citations: 3
Revealing Research Themes and their Evolutionary Trends Using Bibliometric Data Based on Strategic Diagrams
H. Han, Jie Gui, Shuo Xu
This paper uses the strategic diagram technique with bibliometric data to detect research themes and reveal their evolutionary trends in a scientific field in a practical application. Keywords are selected not only from author-provided and machine-indexed keywords but are also extracted from the full text so as to eliminate the "indexer effect". The keywords are then clustered to detect research themes, which are classified into four categories in a strategic diagram to reveal the research situation according to their strategic positions. Moreover, strategic diagrams based on an analysis of temporal dynamics are used to trace thematic evolution: a similarity index detects similar themes in adjacent phases, and provenance and influence indexes evaluate the interactions of similar themes. Experimental results show that the method is effective and useful for revealing research themes and their evolutionary trends in a scientific field.
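Strategic diagrams conventionally place theme clusters in four quadrants by density and centrality; since the abstract does not give the paper's exact measures, the sketch below is only an assumed illustration: the `themes` entries and their `centrality` and `density` figures are invented, and the median split is one common convention rather than the paper's rule.

```python
import statistics

# Hypothetical themes: (name, centrality = external link strength, density = internal link strength).
themes = [
    ("cloud computing", 0.82, 0.35),
    ("MapReduce",       0.64, 0.71),
    ("ontology",        0.21, 0.68),
    ("e-learning",      0.18, 0.22),
]

centrality_med = statistics.median(c for _, c, _ in themes)
density_med = statistics.median(d for _, _, d in themes)

def quadrant(centrality, density):
    """Classic strategic-diagram reading of the four quadrants."""
    if centrality >= centrality_med and density >= density_med:
        return "motor theme (central and well developed)"
    if centrality < centrality_med and density >= density_med:
        return "specialized / peripheral theme"
    if centrality >= centrality_med and density < density_med:
        return "basic and transversal theme"
    return "emerging or declining theme"

for name, c, d in themes:
    print(f"{name:15s} -> {quadrant(c, d)}")
```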
Citations: 8
Predicting the Subcellular Localization of Proteins with Multiple Sites Based on N-Terminal Signals
Xumi Qu, Yuehui Chen, Shanping Qiao
Subcellular localization of proteins is an important attribute in bioinformatics, closely related to protein function, signal transduction, and biological processes. Great progress has been made in this research field in recent years, but some shortcomings still exist in the prediction methods: the extracted feature information is often not complete enough to achieve a higher prediction accuracy, and important protein information, such as the correlation within the amino acid sequence, is usually ignored. In addition, some proteins do not reside in only one location; they may occupy two, three, or even more locations, yet they have typically been treated as having only one. In this study, we divide a protein sequence into two parts according to its N-terminal sorting signals and extract pseudo amino acid composition features from each part. We then use the multi-label KNN algorithm (ML-KNN) to handle proteins with two, three, or more locations. The results of the jackknife test are satisfactory.
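The abstract does not detail the feature extraction or the ML-KNN configuration; the sketch below only illustrates the general shape of such a pipeline under stated assumptions: `N_TERMINAL_LEN` is an assumed signal cut-off, the training sequences and labels are invented, and the simple voting kNN stands in for the full ML-KNN algorithm.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_TERMINAL_LEN = 30  # assumed cut-off for the N-terminal sorting signal

def composition(seq):
    """Frequency of each of the 20 amino acids in a sequence fragment."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)

def features(seq):
    """Concatenate compositions of the N-terminal part and the remainder."""
    return np.concatenate([composition(seq[:N_TERMINAL_LEN]), composition(seq[N_TERMINAL_LEN:])])

def knn_multilabel(query, train_feats, train_labels, k=3, min_votes=2):
    """Assign every label carried by at least `min_votes` of the k nearest neighbours."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        for label in train_labels[i]:
            votes[label] = votes.get(label, 0) + 1
    return {label for label, v in votes.items() if v >= min_votes}

# Toy training data: sequences with one or several location labels (invented for illustration).
train_seqs = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MLSRAVCGTSRQLAPALGYLGSRQ", "MKKLLLAVAVALLAASSAQA"]
train_labels = [{"cytoplasm"}, {"mitochondrion", "nucleus"}, {"secreted"}]
train_feats = np.stack([features(s) for s in train_seqs])

query = features("MKTAYIAKQRQLSFVKSHFSRQ")
print(knn_multilabel(query, train_feats, train_labels))
```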
Citations: 4
Improvement of the Data Mining Algorithm of Rough Set under the Framework of Map/Reduce
Ying Wang, Jiqing Liu, Qiongqiong Liu
In order to solve the problem that the traditional spatial data mining algorithm runs short of storage space and computing power when processing massive spatial data, a combination of rough set theory and a distributed framework is used in the spatial data mining process. In this paper, the traditional rough set algorithm for spatial data mining is parallelized based on the basic theory of rough sets and the Map/Reduce framework, which is efficient and cheap. A spatial data example is then used to show the feasibility of the improved parallel algorithm. Empirical results show that the improved parallel rough set algorithm not only effectively improves efficiency but also meets the need to process massive spatial data, which the traditional rough set algorithm can hardly handle. The improved parallel rough set algorithm for spatial data mining can effectively relieve the shortage of storage and computing power in massive spatial data mining.
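For readers unfamiliar with rough sets, the sketch below computes lower and upper approximations of a decision class from equivalence classes induced by condition attributes; the tiny decision table is invented, and the grouping shown here is the kind of step a Map/Reduce job could parallelize (map tasks building equivalence classes on data splits, a reduce task merging them). It is not the paper's algorithm.

```python
from collections import defaultdict

# Tiny decision table: (condition attributes) -> decision. Values are invented.
records = [
    ({"soil": "clay", "slope": "steep"}, "erosion"),
    ({"soil": "clay", "slope": "steep"}, "erosion"),
    ({"soil": "sand", "slope": "flat"},  "stable"),
    ({"soil": "clay", "slope": "flat"},  "erosion"),
    ({"soil": "clay", "slope": "flat"},  "stable"),
]

# Equivalence classes: objects indiscernible on the condition attributes.
classes = defaultdict(list)
for idx, (cond, decision) in enumerate(records):
    key = tuple(sorted(cond.items()))
    classes[key].append(idx)

target = {i for i, (_, d) in enumerate(records) if d == "erosion"}

lower = set()  # equivalence classes fully inside the target concept
upper = set()  # equivalence classes that intersect the target concept
for members in classes.values():
    members = set(members)
    if members <= target:
        lower |= members
    if members & target:
        upper |= members

print("lower approximation:", sorted(lower))
print("upper approximation:", sorted(upper))
print("boundary region:", sorted(upper - lower))
```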
Citations: 2
Present Situation and Prospect of Data Warehouse Architecture under the Background of Big Data
Lihua Sun, Mu Hu, K. Ren, Mingming Ren
Compared with traditional data warehouse applications, big data analysis is characterized by large data volumes and complex query analysis. In order to design a data warehouse architecture suitable for big data analysis, this paper analyzes and summarizes the current mainstream implementation platforms: parallel databases, MapReduce, and hybrid architectures that combine the two. It presents their respective advantages and disadvantages, reviews existing research and the authors' own work on big data analysis, and offers prospects for future study.
Citations: 6
The Reliability Analysis of Embedded Systems
Zhongzheng You
This article starts by introducing the essence of embedded system reliability. Characteristics of embedded systems such as failure rate, reliability, and mean time to failure are introduced to analyze embedded system reliability, and models of a single system, a series system, and a parallel system are established. The models were simulated with Simulink. Finally, the simulation results and example validations indicate that a series-parallel hybrid structure is necessary to improve the reliability of an embedded system and give it a long service life.
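The textbook relationships behind those models are R(t) = exp(-λt) with MTTF = 1/λ for a constant failure rate, R_series = Π R_i for a series system, and R_parallel = 1 - Π (1 - R_i) for a parallel system. The sketch below applies these standard formulas to invented component failure rates; it is not the article's Simulink model.

```python
import math

def reliability(failure_rate, t):
    """Exponential reliability model: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * t)

def series(rs):
    """A series system works only if every component works."""
    out = 1.0
    for r in rs:
        out *= r
    return out

def parallel(rs):
    """A parallel system fails only if every component fails."""
    out = 1.0
    for r in rs:
        out *= (1.0 - r)
    return 1.0 - out

# Invented component failure rates (per hour) and a 1000-hour mission.
rates = [1e-5, 2e-5, 5e-6]
t = 1000.0
rs = [reliability(lam, t) for lam in rates]

print("component reliabilities:", [round(r, 5) for r in rs])
print("series system:  ", round(series(rs), 5))
print("parallel system:", round(parallel(rs), 5))
print("MTTF of first component (hours):", round(1.0 / rates[0], 1))
```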
Citations: 1
A Three-Stage Clustering Framework Based on Multiple Feature Combination for Chinese Person Name Disambiguation
Fei Wang, Yi Yang, Zhaocai Ma, Lian Li
To solve name ambiguity problems and improve the performance of person name disambiguation, we propose a three-stage clustering algorithm in this paper. In the first stage, organizations and locations (OLs) are used to cluster documents about the same person, so that texts with greater resemblance are assigned to one category; this stage is simply document clustering based on the similarity of OLs. In the second stage, the clustered documents are used as a new data source from which novel features (such as co-author names) are extracted, and these newly extracted features are used for additional clustering between documents. Meanwhile, a method is proposed to solve name ambiguity problems by constructing social networks from the relationships among co-authors. In the third stage, Web pages are further clustered using a content-based hierarchical agglomerative clustering (HAC) algorithm, and the useful content, including titles, abstracts, and keywords (TAKs), is analyzed to disambiguate the ambiguous names. Experimental results show that our three-stage clustering algorithm can effectively enhance the performance of person name disambiguation.
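As a generic illustration of the third-stage step (content-based HAC over titles, abstracts, and keywords), the sketch below clusters a few invented snippets with SciPy's average-linkage hierarchy over cosine distances; the documents, the bag-of-words features, and the 0.6 distance threshold are assumptions, not the paper's settings.

```python
import numpy as np
from collections import Counter
from scipy.cluster.hierarchy import linkage, fcluster

# Invented snippets standing in for title+abstract+keyword (TAK) text of Web pages.
docs = [
    "li wei professor computer science data mining",
    "li wei data mining association rules university",
    "li wei footballer striker national team goals",
    "li wei striker transfer football club season",
]

# Simple bag-of-words vectors over the shared vocabulary.
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[Counter(d.split())[w] for w in vocab] for d in docs], dtype=float)

# Average-linkage HAC on cosine distances, cut at an assumed distance threshold.
Z = linkage(X, method="average", metric="cosine")
labels = fcluster(Z, t=0.6, criterion="distance")
print(labels)  # documents with the same label are treated as the same person
```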
Citations: 1
Commodity Futures Price Prediction and Trading Strategies -- A Signal Noise Difference Approach
Jinhao Zheng, Shoukang Peng
This paper introduces the signal noise difference method and applies it to commodity futures price prediction. Based on prediction rules mined from data on 25 potential prediction indicators for SHFE CU (the Shanghai Futures Exchange copper contract), a corresponding trading strategy is established. Market data from 2009 to 2013 are used to test the trading strategy, which achieves an annual yield of 147.85%. In addition, several improvements are discussed to optimize the model.
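The abstract does not explain how the 147.85% annual yield is computed; as a hedged, generic illustration of backtesting a rule-based strategy, the sketch below applies an invented long/flat signal series to an invented daily price series and annualizes the cumulative return assuming 252 trading days per year.

```python
import numpy as np

# Invented daily closing prices and a made-up long(1)/flat(0) signal series.
prices = np.array([100.0, 101.5, 100.8, 102.2, 103.0, 102.1, 104.5, 105.2])
signals = np.array([0, 1, 1, 1, 0, 1, 1])  # position held over each daily return

daily_returns = np.diff(prices) / prices[:-1]
strategy_returns = signals * daily_returns

cumulative = np.prod(1.0 + strategy_returns) - 1.0
trading_days = len(strategy_returns)
annualized = (1.0 + cumulative) ** (252.0 / trading_days) - 1.0  # 252 trading days per year

print(f"cumulative return: {cumulative:.2%}")
print(f"annualized return: {annualized:.2%}")
```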
Citations: 2