
International journal of database theory and application: Latest Articles

Forensic Analysis of Offline Signatures Using Multilayer Perceptron and Random Forest
Pub Date : 2017-01-31 DOI: 10.14257/IJDTA.2017.10.1.13
Abdul Salam Shah, Masood Shah, M. Fayaz, F. Wahid, H. Khan, Asadullah Shah
Forensic applications are of great importance in the digital era for investigating different types of crime. Forensic analysis includes Deoxyribonucleic Acid (DNA) testing, crime scene video and images, forged document analysis, computer-based data recovery, fingerprint identification, handwritten signature verification, and facial recognition. Signatures are divided into two types: genuine and forged. A forged signature can lead to large financial losses and create other legal issues as well. In law-related departments, the forensic process for verifying genuine signatures and detecting forged ones has been manual; it can be automated using digital image processing techniques and automated forensic signature verification applications. A signature represents a person's authority, so a forged signature may also be used in a crime. Research has been done to automate the forensic investigation process, but due to the internal verification of signatures, automating signature verification remains a challenging problem for researchers. In this paper, we extend the previous research carried out in [1-2] and propose a forensic signature verification model based on two classifiers, Multilayer Perceptron (MLP) and Random Forest, for classifying genuine and forged signatures.
Pages: 139-148
Citations: 4
Mapping the Semi-Structured Data to the Structured Data for Inverted Index Compression
Pub Date : 2017-01-31 DOI: 10.14257/IJDTA.2017.10.1.22
B. Usharani
Semi-structured data is used to represent data on the Internet. This paper presents an implementation that converts XML documents to SQL tables, processes the relational data with SQL queries to obtain the desired result, stores the results back as XML, and finally displays the XML data in a web page.
Pages: 235-244
Citations: 1
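The XML-to-SQL round trip described in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the sample document, the `book` table, and the helper names `xml_to_table` and `query_to_xml` are assumptions made for illustration, and only the Python standard library (`xml.etree.ElementTree`, `sqlite3`) is used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative only.
XML_IN = """
<library>
  <book id="1"><title>Databases</title><year>2010</year></book>
  <book id="2"><title>XML Basics</title><year>2015</year></book>
  <book id="3"><title>Indexing</title><year>2016</year></book>
</library>
""".strip()


def xml_to_table(xml_text, conn):
    """Shred each <book> element into one row of a relational table."""
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (int(b.get("id")), b.findtext("title"), int(b.findtext("year")))
        for b in root.iter("book")
    ]
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)", rows)


def query_to_xml(conn, min_year):
    """Run a SQL query and serialize the result rows back to XML."""
    result = ET.Element("result")
    cursor = conn.execute(
        "SELECT id, title, year FROM book WHERE year >= ? ORDER BY id",
        (min_year,),
    )
    for book_id, title, year in cursor:
        node = ET.SubElement(result, "book", id=str(book_id))
        ET.SubElement(node, "title").text = title
        ET.SubElement(node, "year").text = str(year)
    return ET.tostring(result, encoding="unicode")


conn = sqlite3.connect(":memory:")
xml_to_table(XML_IN, conn)
xml_out = query_to_xml(conn, 2015)
print(xml_out)
```

The resulting XML string could then be embedded in a web page; that display step is left out here because the abstract does not say how it is done.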
Research On Mobile Medical Integration System for Children
Pub Date : 2016-12-31 DOI: 10.14257/ijdta.2016.9.12.22
Zhou Lianru, Jiao Xiongfei, L. Lijun, Liang Yuanqin
Given how valuable medical resources are at home and abroad, and especially given that children's medical resources cannot meet demand, the authors designed a mobile medical integration system built on a mobile Internet platform and a cloud computing platform. The system is divided into two parts, the mobile terminal and the cloud computing platform, which are used by the guardian and the doctor respectively. The health monitoring terminal designed for children is a wearable watch, while an app is developed for guardians and doctors. The cloud platform provides modules for data storage, message processing, functional applications, and others, forming a "cloud+client" service model. The system makes disease prevention, emergency treatment, and medical care for children more convenient and faster, protects children's healthy growth, and provides a reference and solution for the future development of medical care.
Pages: 241-252
Citations: 0
Applying Z-Curve Technique to Compute Skyline Set in Multi Criteria Decision Making System
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.02
T. V. Saradhi, K. Subrahmanyam, VENKATESWARA RAO PEDDADA, Hye Jin Kim
Skyline queries are the best tools to use in distributed multi-criteria decision making for user recommendations in web-based applications. However, as data dimensionality increases, the sizes of the dominance set and the skyline set also increase. Increasing dimensionality becomes the major problem with real-world databases. In skyline computation, the major cost depends on the dominance tests between high-dimensional objects and the order in which the objects are accessed. The space-filling Z-curve is well suited to addressing the challenges in skyline computation. In this work, we combine the Z-curve with an optimized skyline boundary detection algorithm for effective access and early pruning. We also propose an efficient hybrid index structure that takes advantage of sorting and partitioning approaches to improve storage and search efficiency. Experimental results show that our proposed approach is better than previous static skyline computation techniques in terms of searching and finding the skyline set.
Pages: 9-22
Citations: 1
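The role of the Z-curve in the skyline entry above can be shown with a small sketch. It relies on one known property of the Z-order (Morton) code: if a point is no larger than another in every dimension, its Z-value is no larger either, so scanning in ascending Z-order means a point can only be dominated by points seen earlier. This is a generic illustration, not the paper's boundary detection algorithm or hybrid index; the `hotels` data and all function names are assumptions, and smaller values are taken to be better.

```python
def z_value(point, bits=8):
    """Interleave the bits of non-negative integer coordinates (Morton code).

    Coordinates are assumed to fit in `bits` bits each.
    """
    z = 0
    for bit in range(bits - 1, -1, -1):  # most significant bit first
        for coord in point:
            z = (z << 1) | ((coord >> bit) & 1)
    return z


def dominates(p, q):
    """p dominates q if p is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points, bits=8):
    """Scan points in ascending Z-order and keep those not dominated so far."""
    result = []
    for p in sorted(points, key=lambda pt: z_value(pt, bits)):
        if not any(dominates(s, p) for s in result):
            result.append(p)
    return result


# Hypothetical (price, distance) pairs; smaller is better in both dimensions.
hotels = [(1, 9), (3, 3), (4, 4), (9, 1), (5, 8), (2, 7)]
print(skyline(hotels))
```

Because every dominator precedes the point it dominates in Z-order, each candidate only needs to be tested against the skyline found so far, which is the kind of early pruning that the access order makes possible.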
Pig Vs. Hive Use Case Analysis
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.24
D. Kendal, Oded Koren, N. Perel
Corporations are changing their practices to data-driven big data initiatives, as big data analytics has given companies the ability to grow their businesses and increase competition. As the importance of data analytics grew, so did the size of the data to analyze, thus demanding a more powerful data platform. This paper presents a case study of two high-level query languages built on top of Hadoop MapReduce: Pig and Hive. By creating a query in each language, both producing identical output, and by running each query 30 times on two files of different sizes (120 runs in total), this comparison provides a statistically significant conclusion.
Pages: 267-276
Citations: 4
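The run count in the Pig and Hive entry above follows from 2 languages × 2 files × 30 repetitions = 120 runs. The abstract does not say which statistical test was used, so the sketch below assumes Welch's t statistic as one common way to compare two sets of run times. The file labels and the timing values are made-up placeholders, not measurements from the paper.

```python
import math
import statistics

LANGUAGES = ("Pig", "Hive")
FILES = ("small", "large")  # hypothetical labels for the two input files
RUNS_PER_CONFIG = 30

# 2 languages * 2 files * 30 repetitions
total_runs = len(LANGUAGES) * len(FILES) * RUNS_PER_CONFIG


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Placeholder run times in seconds for two hypothetical configurations.
# These are NOT results from the paper.
times_a = [10.0, 12.0, 14.0]
times_b = [20.0, 22.0, 24.0]
t_stat = welch_t(times_a, times_b)
print(total_runs, round(t_stat, 3))
```

In a real comparison each sample would hold the 30 measured run times for one language on one file, and the statistic would be turned into a p-value with the appropriate degrees of freedom.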
Research on the E-commerce Platform Performance and Green Supply Chain based on Data Mining and SVM
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.14
Ying Xu, Xuemei Zhang, Hong Zhang
In the network environment, supply chain management has greatly shortened the product development cycle and reduced inventory. With the continuous development of information technology, the e-commerce logistics platform has become the main factor affecting the development of the logistics industry. In this paper, the authors study e-commerce platform performance and the green supply chain based on data mining and SVM. The green supply chain considers the environmental problems in every link of the supply chain and promotes the coordinated development of economy and environment. The results show that the most critical factor affecting consumer satisfaction with B2C e-commerce platforms is accurate, complete, and reliable logistics service.
Pages: 141-150
Citations: 2
A Text Clustering Algorithm based on Weeds and Differential Optimization
Pub Date : 2016-12-31 DOI: 10.14257/ijdta.2016.9.12.12
Lipeng Yang, Fuzhang Wang, Chunmei Fan
Invasive weed optimization (IWO) is a swarm optimization algorithm with both explorative and exploitative power, in which population diversity is obtained by allowing individuals with poor fitness to reproduce and mutate. The differential optimization algorithm is a random parallel algorithm based on vector changes that moves individuals toward outstanding individuals, with global convergence. The traditional k-means algorithm is prone to getting stuck at a local optimum and is sensitive to random initialization. Against this background, a novel optimization algorithm that hybridizes DE and IWO, denoted IWODE-KM, is employed to optimize the parameters of k-means and is further applied to Chinese text clustering. Experimental results show that the proposed method outperforms both of its ancestors.
Pages: 121-130
Citations: 2
Green Mining Algorithm for Big Data Based on Random Matrix
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.08
Wang Can-wei
Because big data has correlated multi-dimensional characteristics, how to build effective processing mechanisms and algorithms remains an open problem. As a result, big data processing algorithms consume huge computing resources and time, which wastes energy. To address this problem, this study proposes a big data processing algorithm that applies random matrix theory; it can effectively improve processing efficiency and thereby increase energy utilization. Results show that the proposed algorithm effectively reduces the amount of computation, thus saving the energy required for computation.
Pages: 79-88
Citations: 0
FastMap Projection for High-Dimensional Data: A Cluster Ensemble Approach
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.28
Imran Khan, Kamen Ivanov, Qingshan Jiang
High-dimensional data with many features present a significant challenge to current clustering algorithms. Sparsity, noise, and correlation of features are common properties of high-dimensional data. Another essential aspect is that clusters in such data often exist in various subspaces. Ensemble clustering is emerging as a leading technique for improving the robustness, stability, and accuracy of high-dimensional data clusterings. In this paper, we propose FastMap projection for generating subspace component data sets from high-dimensional data. Using the component data sets, we create component clusterings and provide a new objective function that ensembles them by maximizing the average similarity between the component clusterings and the final clustering. Compared with the random sampling and random projection methods, the component clusterings produced by FastMap projection showed high average clustering accuracy without sacrificing clustering diversity in synthetic data analysis. We conducted a series of experiments on real-world data sets from the microarray, text, and image domains, employing three subspace component data generation methods, three consensus functions, and the proposed objective function for ensemble clustering. The experimental results consistently demonstrated that the FastMap projection method with the proposed objective function provided the best ensemble clustering results for all data sets.
Pages: 311-330
Citations: 1
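For readers unfamiliar with the projection named in the FastMap entry above, here is a minimal sketch of the classic FastMap embedding. It picks two far-apart pivot objects a and b, places every object i on the line through them at x_i = (d(a,i)² + d(a,b)² − d(b,i)²) / (2·d(a,b)), and repeats on the residual distances. How the paper derives subspace component data sets from this step is not stated in the abstract, so the pivot heuristic, the function names, and the sample points are assumptions.

```python
import math


def euclid(p, q):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fastmap(points, k, dist=euclid):
    """Embed `points` into k dimensions using only pairwise distances."""
    n = len(points)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, col):
        # Squared distance left over after the first `col` coordinates.
        rest = dist(points[i], points[j]) ** 2
        for c in range(col):
            rest -= (coords[i][c] - coords[j][c]) ** 2
        return max(rest, 0.0)  # clamp tiny negatives from rounding

    for col in range(k):
        # Pivot heuristic: start at object 0, jump to the farthest object,
        # then to the object farthest from that one.
        a = 0
        b = max(range(n), key=lambda j: d2(a, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # nothing left to explain; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords


# Hypothetical 2-D points; with k=2 the pairwise distances are reproduced.
square = [(0, 0), (3, 0), (0, 4), (3, 4)]
embedding = fastmap(square, 2)
print(embedding)
```

Choosing k smaller than the original dimensionality gives a lower-dimensional view of the data; running the procedure with different pivots or values of k is one way such views could vary, though that use is an assumption here.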
The Design of the Multi-Scale Data Fusion Algorithm Based on Time Series Analysis
Pub Date : 2016-12-31 DOI: 10.14257/IJDTA.2016.9.12.09
Chunxia Wang
A time series is a sequence of values of an indicator at different times, arranged in chronological order. The basic idea of multi-scale analysis is orthogonal transformation, such as wavelet-transform decomposition of a signal at different scales. Time series analysis is carried out through a model-based method. The dynamic-data time-domain analysis method fits a parametric model to the observed data and then uses this model to analyze the observational data and the system that produces the data. The paper presents the design of a multi-scale data fusion algorithm based on time series analysis. Finally, the advantages of the new algorithm are elaborated in terms of estimation accuracy, and simulation demonstrates its effectiveness.
Pages: 89-100
Citations: 0
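The multi-scale decomposition by orthogonal transformation mentioned in the entry above can be illustrated with the simplest wavelet, the Haar wavelet. The abstract does not name a specific wavelet or describe the fusion rule, so the Haar choice, the function names, and the sample signal are assumptions; the sketch only shows how a series splits into a coarse approximation plus details at several scales and can be rebuilt exactly.

```python
import math

SCALE = 1 / math.sqrt(2)  # normalization that keeps the transform orthogonal


def haar_step(signal):
    """One level: pairwise scaled sums (approximation) and differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Decompose a signal whose length is divisible by 2**levels."""
    approx = list(signal)
    details = []  # details[0] is the finest scale
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose by undoing the levels from coarsest to finest."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.extend([(a + d) * SCALE, (a - d) * SCALE])
        current = rebuilt
    return current


# Hypothetical time series with 8 samples, decomposed over 3 scales.
series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coarse, detail_levels = haar_decompose(series, 3)
restored = haar_reconstruct(coarse, detail_levels)
print(coarse, [len(d) for d in detail_levels])
```

Because the transform is orthogonal, the sum of squares of all coefficients equals that of the original series, which is what allows estimates made at different scales to be compared on a common footing.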