
2010 Ninth International Conference on Machine Learning and Applications: Latest Publications

Discovering Knowledge Rules with Multi-Objective Evolutionary Computing
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.25
Rafael Giusti, Gustavo E. A. P. A. Batista
Most Machine Learning systems aim to induce classifiers with optimal coverage and precision measures. Although this constitutes a good approach for prediction, it might not provide good results when the user is more interested in description. In this case, the induced models should present other properties such as novelty, interestingness and so forth. In this paper we present research based on Multi-Objective Evolutionary Computing to construct individual knowledge rules targeting arbitrary user-defined criteria via objective quality measures such as precision, support, novelty etc. This paper also presents a comparison between multi-objective and ranking composition techniques. It is shown that multi-objective-based methods attain better results than ranking-based methods, both in terms of solution dominance and diversity of solutions in the Pareto front.
Citations: 2
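The abstract above compares methods by solution dominance on the Pareto front. As a minimal sketch of what dominance means (not the authors' evolutionary algorithm), the following filters a set of candidate rules down to the non-dominated ones. The rule scores, the two objectives chosen (precision, support), and the function names are illustrative assumptions; every objective is treated as "larger is better".

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (precision, support) scores for four candidate rules.
rules = [(0.9, 0.1), (0.7, 0.4), (0.6, 0.3), (0.5, 0.8)]
front = pareto_front(rules)
```

Here the rule scored (0.6, 0.3) is dropped because (0.7, 0.4) beats it on both objectives, while the remaining three rules trade precision against support and therefore all stay on the front.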
Smoothing Gene Expression Using Biological Networks
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.85
Yue Fan, M. Kon, Shinuk Kim, C. DeLisi
Gene expression (microarray) data have been used widely in bioinformatics. The expression data of a large number of genes from small numbers of subjects are used to identify informative biomarkers that may predict or help in diagnosing some disorders. More recently, increasing amounts of information about underlying relationships among the expressed genes have become available, and researchers have started to investigate algorithms that can use such a priori information to improve classification or regression based on gene expression. In this paper, we describe three novel machine learning algorithms for regularizing (smoothing) microarray expression values defined on gene sets with known prior network or metric structures, and which exploit this gene interaction information. These regularized expression values can be used with any machine classifier with the goal of better classification. In this paper, standard smoothing (denoising) techniques previously developed for functions on Euclidean spaces are extended to allow smoothing of microarray expression feature vectors using distance measures defined by biological networks. Such a priori smoothing (denoising) of the feature vectors using metrics on the index space (here the space of genes) yields better signal-to-noise ratios in the data. When tested on two breast cancer datasets, support vector machine classifiers trained on the smoothed expression values obtain better areas under the ROC curve.
Citations: 3
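The abstract above describes smoothing expression values using the structure of a gene network. The paper's three algorithms are not specified on this page, so the following is only a sketch of the general idea under a simple assumption: each gene's value is blended with the mean of its direct network neighbours. The function name, the `alpha` blending weight, and the toy gene network are all hypothetical.

```python
from typing import Dict, List


def smooth_expression(
    expr: Dict[str, float],
    neighbors: Dict[str, List[str]],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend each gene's value with the mean of its measured network neighbours.

    alpha = 0 leaves the data unchanged; alpha = 1 replaces each value
    with its neighbourhood mean. Genes without neighbours are left as-is.
    """
    smoothed = {}
    for gene, value in expr.items():
        nbrs = [n for n in neighbors.get(gene, []) if n in expr]
        if nbrs:
            nbr_mean = sum(expr[n] for n in nbrs) / len(nbrs)
            smoothed[gene] = (1 - alpha) * value + alpha * nbr_mean
        else:
            smoothed[gene] = value
    return smoothed


# Toy data: gene D has no known interactions, so it is not smoothed.
expr = {"A": 4.0, "B": 2.0, "C": 0.0, "D": 10.0}
network = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
smoothed = smooth_expression(expr, network, alpha=0.5)
```

The smoothed values could then be passed to any classifier in place of the raw ones, which is the usage pattern the abstract describes.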
Dynamic Batch Size Selection for Batch Mode Active Learning in Biometrics
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.10
Shayok Chakraborty, V. Balasubramanian, S. Panchanathan
Robust biometric recognition is of paramount importance in security and surveillance applications. In face based biometric systems, data is usually collected using a video camera with high frame rate and thus the captured data has high redundancy. Selecting the appropriate instances from this data to update a classification model is a significant yet valuable challenge. Active learning methods have gained popularity in identifying the salient and exemplar data instances from superfluous sets. Batch mode active learning schemes attempt to select a batch of samples simultaneously rather than updating the model after selecting every single data point. Existing work on batch mode active learning assumes a fixed batch size, which is not a practical assumption in biometric recognition applications. In this paper, we propose a novel framework to dynamically select the batch size using clustering based unsupervised learning techniques. We also present a batch mode active learning strategy specially suited to handle the high redundancy in biometric datasets. The results obtained on the challenging VidTIMIT and MOBIO datasets corroborate the superiority of dynamic batch size selection over static batch size and also certify the potential of the proposed active learning scheme for use in real world biometric recognition applications.
Citations: 5
Multimodal Parameter-exploring Policy Gradients
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.24
Frank Sehnke, Alex Graves, Christian Osendorfer, J. Schmidhuber
Policy Gradients with Parameter-based Exploration (PGPE) is a novel model-free reinforcement learning method that alleviates the problem of high-variance gradient estimates encountered in normal policy gradient methods. It has been shown to drastically speed up convergence for several large-scale reinforcement learning tasks. However the independent normal distributions used by PGPE to search through parameter space are inadequate for some problems with multimodal reward surfaces. This paper extends the basic PGPE algorithm to use multimodal mixture distributions for each parameter, while remaining efficient. Experimental results on the Rastrigin function and the inverted pendulum benchmark demonstrate the advantages of this modification, with faster convergence to better optima.
Citations: 15
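The abstract above contrasts a single normal distribution per parameter with a multimodal mixture, and names the Rastrigin function as a benchmark. The sketch below shows only those two ingredients: the standard Rastrigin definition and drawing one parameter vector from per-parameter Gaussian mixtures. It does not implement the PGPE gradient update. The mixture weights, means, and standard deviations are made-up values, and negating Rastrigin to obtain a reward is an assumption about how the benchmark would be framed.

```python
import math
import random
from typing import List, Sequence, Tuple


def rastrigin(x: Sequence[float], a: float = 10.0) -> float:
    """Standard Rastrigin function; global minimum 0 at the origin, many local minima."""
    return a * len(x) + sum(xi * xi - a * math.cos(2 * math.pi * xi) for xi in x)


# One mixture component: (weight, mean, standard deviation).
Component = Tuple[float, float, float]


def sample_parameter(mixture: List[Component], rng: random.Random) -> float:
    """Pick a component by weight, then draw from that component's normal distribution."""
    weights = [w for w, _, _ in mixture]
    _, mu, sigma = rng.choices(mixture, weights=weights, k=1)[0]
    return rng.gauss(mu, sigma)


rng = random.Random(0)
# Hypothetical two-mode search distribution for each of two parameters.
mixtures = [[(0.5, -1.0, 0.3), (0.5, 1.0, 0.3)] for _ in range(2)]
theta = [sample_parameter(m, rng) for m in mixtures]
reward = -rastrigin(theta)  # assumed convention: higher reward is better
```

A mixture lets the search keep probability mass near several basins at once, which a single normal distribution per parameter cannot do.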
Validating Meronymy Hypotheses with Support Vector Machines and Graph Kernels
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.43
Tim vor der Brück, H. Helbig
There is a substantial body of work on the extraction of relations from texts, most of which is based on pattern matching or on applying tree kernel functions to syntactic structures. Whereas pattern application is usually more efficient, tree kernels can be superior when assessed by the F-measure. In this paper, we introduce a hybrid approach to extracting meronymy relations, which is based on both patterns and kernel functions. In a first step, meronymy relation hypotheses are extracted from a text corpus by applying patterns. In a second step these relation hypotheses are validated by using several shallow features and a graph kernel approach. In contrast to other meronymy extraction and validation methods which are based on surface or syntactic representations we use a purely semantic approach based on semantic networks. This involves analyzing each sentence of the Wikipedia corpus by a deep syntactico-semantic parser and converting it into a semantic network. Meronymy relation hypotheses are extracted from the semantic networks by means of an automated theorem prover, which employs a set of logical axioms and patterns in the form of semantic networks. The meronymy candidates are then validated by means of a graph kernel approach based on common walks. The evaluation shows that this method achieves considerably higher accuracy, recall, and F-measure than a method using purely shallow validation.
Citations: 4
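The abstract above validates candidates with a graph kernel based on common walks. As a rough illustration of that idea (not the authors' kernel, which operates on semantic networks and whose weighting is not given here), the sketch below counts walks shared by two node-labelled directed graphs by walking their direct product graph. The graph encoding, the function name, the omission of edge labels and length-dependent weights, and the toy graphs are all assumptions.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # node -> list of successor nodes


def count_common_walks(
    g1: Graph, labels1: Dict[str, str],
    g2: Graph, labels2: Dict[str, str],
    max_len: int,
) -> int:
    """Count label-matching walks of length 0..max_len that occur in both graphs."""
    # Nodes of the product graph: pairs of nodes carrying the same label.
    pairs = {(u, v) for u in g1 for v in g2 if labels1[u] == labels2[v]}
    counts = {p: 1 for p in pairs}  # walks of length 0 ending at each pair
    total = len(pairs)
    for _ in range(max_len):
        nxt = {p: 0 for p in pairs}
        for (u, v), c in counts.items():
            # Step along an edge in both graphs at once.
            for u2 in g1[u]:
                for v2 in g2[v]:
                    if (u2, v2) in pairs:
                        nxt[(u2, v2)] += c
        counts = nxt
        total += sum(counts.values())
    return total


# Two toy graphs that share the labelled walk wheel -> car.
g_a = {"n1": ["n2"], "n2": []}
labels_a = {"n1": "wheel", "n2": "car"}
g_b = {"m1": ["m2"], "m2": ["m3"], "m3": []}
labels_b = {"m1": "wheel", "m2": "car", "m3": "garage"}
similarity = count_common_walks(g_a, labels_a, g_b, labels_b, max_len=2)
```

A larger count means the two graphs share more structure; a practical kernel would typically also down-weight longer walks.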
Automatic Segmentation of the Prostate Using a Genetic Algorithm for Prostate Cancer Treatment Planning
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.115
Melanie Mitchell, J. Tanyi, A. Hung
This paper presents a genetic algorithm (GA) for combining representations of learned priors such as shape, regional properties and relative location of organs into a single framework in order to perform automated segmentation of the prostate. Prostate segmentation is typically performed manually by an expert physician and is used to determine the locations for radioactive seed placement during radiotherapy treatment planning. The GA accounts for the uncertainty in the definitions of tumor margins by combining known representations of shape, texture and relative location of organs to perform automatic segmentation in two (2D) as well as three dimensions (3D).
Citations: 4
Predicting End-to-end Network Load
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.145
A. Vashist, S. Mau, A. Poylisher, R. Chadha, Abhrajit Ghosh
Due to their limited and fluctuating bandwidth, mobile ad hoc networks (MANETs) are inherently resource-constrained. As traffic load increases, we need to decide when to throttle the traffic to maximize user satisfaction while keeping the network operational. The state-of-the-art for making these decisions is based on network measurements and so employs a reactive approach to deteriorating network state by reducing the amount of traffic admitted into the network. However, a better approach is to avoid congestion before it occurs by predicting future network traffic using user and application information from the overlaying social network. We use machine learning methods to predict the source and destination of near future traffic load.
Citations: 3
A Chunking Method for Euclidean Distance Matrix Calculation on Large Dataset Using Multi-GPU
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.38
Qi Li, V. Kecman, R. Salman
Calculating a Euclidean distance matrix is a data intensive operation and becomes computationally prohibitive for large datasets. Recent development of Graphics Processing Units (GPUs) has produced superb performance on scientific computing problems using massively parallel processing cores. However, due to the limited size of device memory, many GPU based algorithms cope poorly with large datasets. In this paper, a chunking method is proposed to calculate the Euclidean distance matrix on large datasets. It is designed not only for scalability in a multi-GPU environment but also to maximize the computational capability of each individual GPU device. We first implement a fast GPU algorithm that is suitable for calculating sub-matrices of the Euclidean distance matrix. Then we utilize a Map-Reduce-like framework to split the final distance matrix calculation into many small independent jobs of calculating partial distance matrices, which can be efficiently solved by our GPU algorithm. The framework also dynamically allocates GPU resources to those independent jobs for maximum performance. The experimental results have shown a speedup of 15x on datasets that contain more than half a million data points.
Citations: 69
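The abstract above splits the full distance matrix into independent partial matrices. The sketch below shows that decomposition on the CPU in plain Python; it is not the authors' GPU kernel or their scheduling framework. The function names and the `chunk` parameter are assumptions, and the tiles are computed one after another here, whereas each could in principle be handed to a separate device.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def block_distances(rows: List[Point], cols: List[Point]) -> List[List[float]]:
    """Distances between every point in rows and every point in cols (one tile)."""
    return [[math.dist(p, q) for q in cols] for p in rows]


def chunked_distance_matrix(points: List[Point], chunk: int) -> List[List[float]]:
    """Assemble the full matrix from independently computed chunk x chunk tiles."""
    n = len(points)
    full = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, chunk):
        for j0 in range(0, n, chunk):
            # Each tile depends only on two slices of the input,
            # so tiles can be computed in any order or in parallel.
            block = block_distances(points[i0:i0 + chunk], points[j0:j0 + chunk])
            for di, row in enumerate(block):
                full[i0 + di][j0:j0 + len(row)] = row
    return full


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dm = chunked_distance_matrix(points, chunk=2)
```

Because only two input slices and one output tile need to be resident at a time, the tile size can be chosen to fit a fixed memory budget regardless of the dataset size.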
A Relative Tendency Based Stock Market Prediction System
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.151
ManChon U, K. Rasheed
Researchers have known for some time that non-linearity exists in the financial markets and that neural networks can be used to forecast market returns. In this article, we present a novel stock market prediction system which focuses on forecasting the relative tendency growth between different stocks and indices rather than purely predicting their values. This research utilizes artificial neural network models for estimation. The results are examined for their ability to provide an effective forecast of future values. Certain techniques, such as sliding windows and chaos theory, are employed for data preparation and pre-processing. Our system successfully predicted the relative tendency growth of different stocks with up to 99.01% accuracy.
Citations: 2
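The abstract above mentions sliding windows as a data preparation step. A minimal sketch of that step, assuming the common setup in which each window of past values is paired with the value that follows it, is shown below. The function name, the window width, and the price series are illustrative and not taken from the paper.

```python
from typing import List, Tuple


def sliding_windows(series: List[float], width: int) -> List[Tuple[List[float], float]]:
    """Pair each run of `width` consecutive values with the value that follows it."""
    return [
        (series[i:i + width], series[i + width])
        for i in range(len(series) - width)
    ]


# Hypothetical closing prices.
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
samples = sliding_windows(prices, width=3)
```

Each resulting pair can serve as one training example for a network that maps recent history to a next-step target.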
Constrained Nonnegative Tensor Factorization for Clustering
Pub Date : 2010-12-12 DOI: 10.1109/ICMLA.2010.152
Wei Peng
Constrained clustering through matrix factorization has been shown to largely improve clustering accuracy by incorporating prior knowledge into the factorization process. Although this approach has been well studied, no existing method deals with constrained multi-way data factorization. Multi-way data, or tensors, are encoded as high-order data structures. They can be seen as the generalization of matrices. One typical tensor is a set of two-way data matrices collected over different time periods. To the best of our knowledge, this paper is the first work to develop two general formulations of constrained nonnegative tensor factorization. Extensive experiments compare the proposed constrained nonnegative tensor factorization with other state-of-the-art algorithms.
Citations: 1