
2006 5th International Conference on Machine Learning and Applications (ICMLA'06): Latest Articles

A New Machine Learning Technique Based on Straight Line Segments
J. Ribeiro, R. F. Hashimoto
This paper presents a new supervised machine learning technique based on the distances between points and straight line segments. Given a training data set, the technique estimates a function whose value at a point is computed from the distances between that point and two sets of straight line segments. A training algorithm was developed to find the sets of segments that minimize the mean squared error. The technique was applied to two real pattern recognition problems: (1) the breast cancer data set, to classify tumors as benign or malignant; and (2) the wine data set, to assign wines to one of the three cultivars from which they were derived. It was also tested on two artificial data sets to show its ability to solve function approximation problems. The results show that the technique performs well on all of these problems, indicating that it is a good candidate for machine learning applications.
DOI: 10.1109/ICMLA.2006.8 | Published: 2006-12-14 | Citations: 6
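The abstract does not give the exact form of the decision function. The sketch below is a minimal illustration under stated assumptions: it uses the ordinary Euclidean point-to-segment distance, takes the nearest segment in each of the two sets (here called `neg` and `pos`, hypothetical names), and combines the two distances with a hypothetical normalization `d_neg / (d_neg + d_pos)`. None of these choices is confirmed by the source, and the training step that positions the segments to minimize mean squared error is not shown.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]
Segment = Tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from point p to the closed segment [a, b]."""
    a, b = seg
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    length_sq = sum(c * c for c in ab)
    if length_sq == 0:
        t = 0.0  # degenerate segment: a == b
    else:
        # Project p onto the line through a and b, then clamp to the segment.
        t = max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / length_sq))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def segment_score(p: Point, neg: Sequence[Segment], pos: Sequence[Segment]) -> float:
    """Hypothetical output in [0, 1]: near 1 when p is closer to the 'pos' set.

    Assumption: only the nearest segment of each set matters.
    """
    d_neg = min(point_segment_distance(p, s) for s in neg)
    d_pos = min(point_segment_distance(p, s) for s in pos)
    total = d_neg + d_pos
    return 0.5 if total == 0 else d_neg / total
```

For example, with one segment per set along y = 0 and y = 2, the point (0.5, 0.1) has `d_neg = 0.1` and `d_pos = 1.9`, so `segment_score` returns about 0.05 and the point would be assigned to the first set.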
Ensemble Classifiers for Medical Diagnosis of Knee Osteoarthritis Using Gait Data
Nigar Sen Köktas, N. Yalabik, G. Yavuzer
Automated or semi-automated gait analysis systems are important aids to physicians in diagnosing various diseases. The objective of this study is to discuss ensemble methods for gait classification, as part of preliminary work toward a semi-automated diagnosis system. For this purpose, gait data were collected from 110 patients with knee osteoarthritis (OA) and 91 age-matched normal subjects. A set of multilayer perceptrons (MLPs) is trained using joint angles and time-distance parameters of gait as features. The high-dimensional feature vector is decomposed into feature subsets, and the subsets selected by a gait expert are used to classify subjects into two classes: healthy and patient. An ensemble of MLPs is built from these distinct feature subsets, and the diversity of the classifiers is analyzed using cross-validation and confusion matrices. The high diversity observed in the confusion matrices suggested that combining the classifiers would help. Indeed, applying a suitable combining rule to the decomposed feature sets yields more accurate results. The results suggest that an ensemble of MLPs could be used for the automated diagnosis of gait disorders in a clinical setting.
DOI: 10.1109/ICMLA.2006.22 | Published: 2006-12-14 | Citations: 15
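The abstract does not say which combining rule worked best. As a hedged sketch, two standard rules (majority vote over predicted labels, and averaging of class probabilities) are shown below, together with the pairwise disagreement rate, a common diversity measure swapped in here for the paper's confusion-matrix analysis. The function names and the "healthy"/"patient" labels in the tests are illustrative.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(votes: Sequence[Hashable]) -> Hashable:
    """Most frequent label among ensemble members; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def mean_rule(probas: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by each ensemble member."""
    n = len(probas)
    return [sum(column) / n for column in zip(*probas)]


def disagreement(pred_a: Sequence[Hashable], pred_b: Sequence[Hashable]) -> float:
    """Fraction of samples on which two classifiers disagree (higher = more diverse)."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)
```

High pairwise disagreement between members trained on different feature subsets is what makes combining them worthwhile: if all members made the same mistakes, no combining rule could correct them.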
Information of Binding Sites Improves Prediction of Protein-Protein Interaction
Tapan P. Patel, Manoj Pillay, Rahul Jawa, Li Liao
Protein-protein interaction is essential to cellular function. In this work, we describe a simple, novel approach that improves the accuracy of protein-protein interaction prediction by incorporating binding site information. First, we assess the importance of the seven attributes used by Bradford et al. (2005) to predict protein binding sites. Leave-one-out cross-validation experiments and principal component analysis indicate that some attributes, such as residue propensity and hydrophobicity, are more important than others, such as curvedness and shape index, for distinguishing a binding patch from a non-binding patch. Second, we use these attributes to predict protein-protein interactions by simply concatenating the attribute vectors of the candidate interacting partners. A support vector machine is trained to predict the interacting partners, combined with attributes derived directly from the primary sequence at the binding sites. Leave-one-out cross-validation shows a significant improvement in prediction accuracy when the structural information at the binding sites is incorporated.
DOI: 10.1109/ICMLA.2006.29 | Published: 2006-12-14 | Citations: 10
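Concatenating the attribute vectors of two candidate partners, and evaluating with leave-one-out cross-validation, can be sketched as follows. The attribute names are hypothetical placeholders (the abstract names only four of the seven attributes of Bradford et al.), and the SVM itself is left to whatever library is used; only the feature construction and the data splits are shown.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical attribute names; the abstract names only some of the seven
# attributes used by Bradford et al. (2005).
ATTRIBUTES = ["residue_propensity", "hydrophobicity", "curvedness", "shape_index"]


def pair_vector(a: Dict[str, float], b: Dict[str, float],
                names: Sequence[str] = ATTRIBUTES) -> List[float]:
    """Concatenate the attribute vectors of two candidate interacting partners."""
    return [a[n] for n in names] + [b[n] for n in names]


def leave_one_out(n: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (training indices, held-out index) for each of n examples."""
    for held_out in range(n):
        yield [i for i in range(n) if i != held_out], held_out
```

Because concatenation is order-dependent, `pair_vector(p, q)` and `pair_vector(q, p)` differ; including both orderings in the training set is one way to make the classifier symmetric, though the abstract does not say whether the authors did this.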
A Feature Selection Algorithm for Detecting Subtype Specific Functional Sites from Protein Sequences for Smad Receptor Binding
E. Marchiori, W. Pirovano, J. Heringa, K. Feenstra
Multiple sequence alignments are often used to reveal functionally important residues within a protein family. In particular, they can help identify the key residues that determine functional differences between protein subclasses (subtype-specific sites). This paper proposes a new algorithm for selecting subtype-specific sites from a set of aligned protein sequences. The algorithm combines a feature selection technique with information about neighboring positions to select and rank a set of putatively relevant sites. It is applied to a dataset of protein sequences from the MH2 domain of the SMAD family of transcription factors. Validation against the known interactions and functions of the sites shows that the algorithm successfully identifies the subtype-specific sites known from the literature, as well as new putative ones.
DOI: 10.1109/ICMLA.2006.7 | Published: 2006-12-14 | Citations: 5
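The abstract does not specify the feature selection criterion. The sketch below is a hypothetical stand-in, not the authors' algorithm: each alignment column is scored as subtype-specific when every subtype is internally conserved but the subtypes' consensus residues differ, and each score is then boosted by the mean score of neighboring columns. The `window` and `weight` parameters are illustrative.

```python
from collections import Counter
from typing import Dict, List


def column_score(column: Dict[str, List[str]]) -> float:
    """Score one alignment column (subtype -> residues at that column).

    High when each subtype is conserved but subtype consensus residues differ.
    """
    if len(column) < 2:
        return 0.0
    consensus, conservation = set(), []
    for residues in column.values():
        residue, count = Counter(residues).most_common(1)[0]
        consensus.add(residue)
        conservation.append(count / len(residues))
    specificity = (len(consensus) - 1) / (len(column) - 1)
    return specificity * min(conservation)


def rank_sites(alignment: Dict[str, List[str]], window: int = 1,
               weight: float = 0.5) -> List[int]:
    """Rank column indices by raw score plus a bonus from neighboring columns."""
    length = len(next(iter(alignment.values()))[0])
    raw = [column_score({st: [seq[j] for seq in seqs] for st, seqs in alignment.items()})
           for j in range(length)]
    smoothed = []
    for j in range(length):
        neighbors = [raw[k] for k in range(max(0, j - window), min(length, j + window + 1))
                     if k != j]
        bonus = sum(neighbors) / len(neighbors) if neighbors else 0.0
        smoothed.append(raw[j] + weight * bonus)
    return sorted(range(length), key=lambda j: smoothed[j], reverse=True)
```

With two subtypes aligned as `KLV`/`KLV` and `RLV`/`RLV`, column 0 (K versus R, conserved within each subtype) ranks first, and column 1 ranks above column 2 only because it neighbors the specific site.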
Horizon Detection Using Machine Learning Techniques
Sergiy Fefilatyev, Volha Smarodzinava, L. Hall, Dmitry Goldgof
Detecting the horizon in an image is an important part of many image-related applications, such as detecting ships on the horizon, flight control, and port security. Most existing solutions use only image processing methods to identify the horizon line. These methods are fast and accurate in many cases; however, for images taken under difficult environmental conditions, such as a foggy or cloudy sky, they are inherently unreliable at identifying the correct horizon. This paper investigates how to detect the horizon line in a set of images using a machine learning approach, comparing the performance of SVM, J48, and naive Bayes classifiers on the problem. Horizon identification accuracy of 90-99% was achieved on a data set of 20 images.
DOI: 10.1109/ICMLA.2006.25 | Published: 2006-12-14 | Citations: 99
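The abstract compares classifiers but does not say how their output becomes a horizon line. One plausible post-processing step, assumed here rather than taken from the paper, is to label each pixel as sky or non-sky with any of the compared classifiers, take the sky/non-sky boundary in each image column, and fit a straight line to those boundary points by least squares. The mask layout (row 0 at the top) is an assumption.

```python
from typing import List, Sequence, Tuple


def boundary_points(sky_mask: Sequence[Sequence[bool]]) -> List[Tuple[int, float]]:
    """For each column, return (column, y) with y just below the lowest sky pixel.

    sky_mask[row][col] is True where a classifier labeled the pixel as sky;
    row 0 is the top of the image.
    """
    points = []
    for col in range(len(sky_mask[0])):
        sky_rows = [row for row in range(len(sky_mask)) if sky_mask[row][col]]
        if sky_rows:  # skip columns with no sky pixels
            points.append((col, max(sky_rows) + 0.5))
    return points


def fit_line(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if not points:
        raise ValueError("no boundary points to fit")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0  # a single column gives no slope information
    return slope, mean_y - slope * mean_x
```

Least squares is sensitive to isolated misclassified pixels far below the true horizon; a robust fit would be a natural refinement if the classifier output is noisy.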
An Adaptable Time Warping Distance for Time Series Learning
R. Gaudin, N. Nicoloyannis
Most machine learning and data mining algorithms for time series datasets need a suitable distance measure. In addition to the classic p-norm distance, numerous other distance measures exist, the most popular being dynamic time warping. Here we propose a new distance measure, called adaptable time warping (ATW), that generalizes all previous time warping distances. We present a learning process that uses a genetic algorithm to adapt ATW in a locally optimal way to the classification problem at hand. It can be proved that, for every classification problem, ATW with optimal parameters is at least equivalent, and at best superior, to the other time warping distances. We illustrate this claim with comparative tests on two real datasets. The originality of this work is a whole learning process based directly on the distance measure rather than on the time series themselves.
DOI: 10.1109/ICMLA.2006.12 | Published: 2006-12-14 | Citations: 12
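The abstract does not describe how ATW is parameterized, so ATW itself is not reproduced here. As a reference point, the sketch below implements the two baselines the abstract mentions: the classic p-norm (Minkowski) distance and dynamic time warping (DTW). The optional Sakoe-Chiba `window` is a standard DTW constraint, included only to illustrate the kind of parameter a genetic algorithm could tune.

```python
import math
from typing import Callable, Optional, Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Classic p-norm distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("p-norm distance needs equal-length series")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def dtw(a: Sequence[float], b: Sequence[float], window: Optional[int] = None,
        cost: Callable[[float, float], float] = lambda x, y: abs(x - y)) -> float:
    """Dynamic time warping distance with an optional Sakoe-Chiba band."""
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, otherwise no warping path exists.
    w = max(n, m) if window is None else max(window, abs(n - m))
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            # Extend the cheapest of the three admissible predecessor paths.
            d[i][j] = cost(a[i - 1], b[j - 1]) + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Unlike the p-norm, DTW can compare series of different lengths: `dtw([1, 2, 3], [1, 2, 2, 3])` is 0 because the repeated 2 is absorbed by the warping path.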
An Intelligent Automatic Fingerprint Recognition System Design
Necla Özkaya, Ş. Sağiroğlu, M. Wani
This work presents an intelligent automatic fingerprint identification and verification system based on artificial neural networks. The design of the system is presented step by step. To automate the system, software was developed for the fingerprint identification and verification processes. The system was tested and evaluated on 100 fingerprint images, and the results show that the task was achieved with high accuracy.
DOI: 10.1109/ICMLA.2006.15 | Published: 2006-12-14 | Citations: 17
A Printing Workflow Recommendation Tool: Exploiting Correlations between Highly Sparse Case Logs
Ming Zhong, Tong Sun
As a mechanism for predicting user preferences, recommendation techniques are widely used to support personalized information filtering in current e-commerce applications. We build a recommendation tool into the existing Xerox printing workflow configuration system to offer new users a number of solutions likely to interest them. Such recommendations can significantly improve system efficiency and accuracy by reducing workflow generation overhead and helping users quickly identify their needs. The main challenge in our work is the high sparsity inherent in our application data: most fields have missing values because customers lack background knowledge or are uncertain about their specific needs. We address this problem by using latent semantic indexing (LSI) to merge the original sparse data records into dense, semantic records. The resulting dense records are then grouped into clusters based on their correlations. These clusters, together with their user patterns and representative workflows, support efficient online workflow recommendation. Our tool achieves 83% accuracy on a dataset of 4569 case logs with 91% average sparsity.
DOI: 10.1109/ICMLA.2006.10 | Published: 2006-12-14 | Citations: 1
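LSI is usually computed with a truncated singular value decomposition (SVD). To stay dependency-free, the sketch below swaps in power iteration with deflation to find the top-k right singular vectors and projects each record onto them; in practice a library SVD routine would be used. Encoding missing fields as 0 and using Pearson correlation to compare dense records are assumptions about the setup, not details from the paper.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def _matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def lsi_embed(x: Matrix, k: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Map each sparse record (row of x, missing fields = 0) to k dense coordinates.

    Finds the top-k right singular vectors of x by power iteration on x^T x
    with deflation, then projects every row onto them (the rows of U_k Sigma_k).
    """
    rng = random.Random(seed)
    xt = [list(col) for col in zip(*x)]
    basis: Matrix = []
    for _ in range(k):
        v = [rng.random() for _ in range(len(xt))]
        for _ in range(iters):
            w = _matvec(xt, _matvec(x, v))  # (x^T x) v
            for b in basis:  # remove directions already found
                dot = sum(wi * bi for wi, bi in zip(w, b))
                w = [wi - dot * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm == 0.0:  # no variance left in this direction
                v = [0.0] * len(w)
                break
            v = [wi / norm for wi in w]
        basis.append(v)
    return [_matvec(basis, row) for row in x]


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation, e.g. for grouping dense records into clusters."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0
```

With `x = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]` and `k = 2`, the first two records map to the same dense point and the third lies along a separate axis, so correlation-based clustering would separate them.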
Learning the Threshold in Hierarchical Agglomerative Clustering
K. Daniels, C. Giraud-Carrier
Most partitional clustering algorithms require the number of clusters to be set a priori. Not only is this somewhat counter-intuitive, it is also difficult except in the simplest situations. By contrast, hierarchical clustering can produce partitions with varying numbers of clusters; the final partition depends on a threshold placed on the similarity measure used. Given a cluster quality metric, an appropriate threshold can be discovered efficiently through a form of semi-supervised learning. This paper presents one such solution for complete-link hierarchical agglomerative clustering, using the F-measure and a small subset of labeled examples. Empirical evaluation shows promising results.
DOI: 10.1109/ICMLA.2006.33 | Published: 2006-12-14 | Citations: 23
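One way to learn the cut-off, sketched below under stated assumptions, is to try every candidate threshold, cut the complete-link hierarchy there, and keep the threshold whose partition scores best against the few labeled examples. The sketch uses a distance threshold rather than a similarity threshold, one-dimensional points for brevity, and a pairwise F-measure (treating "same cluster" as a prediction of "same class"); the paper may define the F-measure differently. Trying every pairwise distance is sufficient because each complete-link merge height is itself one of the pairwise distances.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Dist = Callable[[float, float], float]


def complete_link_cut(points: Sequence[float], dist: Dist, threshold: float) -> List[int]:
    """Complete-link agglomerative clustering, stopping once the closest pair of
    clusters is farther apart than `threshold`. Returns a cluster id per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete link: cluster distance = largest member-to-member distance.
            d = max(dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: a < b, so index a is unaffected
    labels = [0] * len(points)
    for cid, members in enumerate(clusters):
        for i in members:
            labels[i] = cid
    return labels


def pairwise_f_measure(pred: Sequence[int], gold: Dict[int, Hashable]) -> float:
    """F-measure over pairs of labeled points ("same cluster" vs "same class")."""
    tp = fp = fn = 0
    for i, j in combinations(sorted(gold), 2):
        same_pred, same_gold = pred[i] == pred[j], gold[i] == gold[j]
        tp += same_pred and same_gold
        fp += same_pred and not same_gold
        fn += same_gold and not same_pred
    if tp == 0:
        return 0.0  # also covers the case with no same-class pairs at all
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def learn_threshold(points: Sequence[float], dist: Dist,
                    gold: Dict[int, Hashable]) -> Tuple[float, float]:
    """Try every pairwise distance as a cut-off; keep the one with the best F."""
    candidates = sorted({dist(points[i], points[j])
                         for i, j in combinations(range(len(points)), 2)})
    best_t, best_f = candidates[0], -1.0
    for t in candidates:
        f = pairwise_f_measure(complete_link_cut(points, dist, t), gold)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

For points `[0.0, 0.1, 5.0, 5.2]` with all four labeled `a, a, b, b`, the learned threshold is about 0.2, which yields the two expected clusters with an F-measure of 1.0.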
Market Mechanism Designs with Heterogeneous Trading Agents
Zengchang Qin
Market mechanism design research plays an important role in computational economics for solving multi-agent allocation problems. A genetic algorithm (GA) has been used to design auction mechanisms, automatically generating a desired market mechanism in agent-based e-markets. Previous research studied a hybrid market in which the probability that buyers, rather than sellers, may quote on a given time step was adapted by the GA to minimise Smith's coefficient of convergence. However, in those experiments all trading agents were of the same type or had identical preferences. This assumption does not hold in real-world markets, which are always populated by heterogeneous agents. This paper extends the use of evolutionary computation for auction design to heterogeneous trading agents.
DOI: 10.1109/ICMLA.2006.34 | Published: 2006-12-14 | Citations: 10
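Smith's coefficient of convergence, alpha, is 100 times the root-mean-square deviation of transaction prices from the competitive equilibrium price, divided by that price; lower values mean prices cluster more tightly around equilibrium. The sketch below computes it and shows the hybrid-market quoting rule as described in the abstract. The parameter name `p_buyer` is hypothetical, and the trader simulation that would produce transaction prices for the GA's fitness evaluation is omitted.

```python
import math
import random
from typing import Sequence


def smith_alpha(prices: Sequence[float], equilibrium: float) -> float:
    """Smith's coefficient of convergence: 100 * RMS deviation of transaction
    prices from the competitive equilibrium price, divided by that price."""
    rms = math.sqrt(sum((p - equilibrium) ** 2 for p in prices) / len(prices))
    return 100.0 * rms / equilibrium


def quoting_side(p_buyer: float, rng: random.Random) -> str:
    """Hybrid market step: a buyer quotes with probability p_buyer, else a seller."""
    return "buyer" if rng.random() < p_buyer else "seller"
```

In a GA of the kind the abstract describes, each candidate `p_buyer` would be scored by simulating the market and computing `smith_alpha` on the resulting transaction prices, with lower alpha meaning higher fitness.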