
2011 10th International Conference on Machine Learning and Applications and Workshops: Latest Papers

Extending k-Means-Based Algorithms for Evolving Data Streams with Variable Number of Clusters
J. Silva, Eduardo R. Hruschka
Many algorithms for clustering data streams based on the widely used k-Means have been proposed in the literature. Most of them assume that the number of clusters, k, is known and fixed a priori by the user. Aimed at relaxing this assumption, which is often unrealistic in practical applications, we describe an algorithmic framework that allows estimating k automatically from data. We illustrate the potential of the proposed framework by using three state-of-the-art algorithms for clustering data streams (Stream LSearch, CluStream, and StreamKM++) combined with two well-known algorithms for estimating the number of clusters, namely Ordered Multiple Runs of k-Means (OMRk) and Bisecting k-Means (BkM). As an additional contribution, we experimentally compare the resulting algorithmic instantiations on both synthetic and real-world data streams. Analyses of statistical significance suggest that OMRk yields the best data partitions, while BkM is more computationally efficient. Also, the combination of StreamKM++ with OMRk leads to the best trade-off between accuracy and efficiency.
DOI: https://doi.org/10.1109/ICMLA.2011.67 (published 2011-12-18)
Citations: 20
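The idea of estimating k by running k-means repeatedly over an ordered range of k values can be sketched as follows. This is a minimal batch illustration, not the authors' streaming framework: the simplified-silhouette score, the function names, and the toy data are all assumptions chosen for the example.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, rng, iters=100):
    """Lloyd's k-means with random initial prototypes; returns (centroids, labels)."""
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the prototype of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def simplified_silhouette(points, centroids, labels):
    """Mean of (b - a) / max(a, b), with a and b measured to centroids (assumed criterion)."""
    total = 0.0
    for p, lab in zip(points, labels):
        a = dist(p, centroids[lab])
        b = min(dist(p, c) for j, c in enumerate(centroids) if j != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def estimate_k(points, k_max, runs=30, seed=0):
    """Try k = 2..k_max in order, several runs each; keep the best-scoring partition."""
    rng = random.Random(seed)
    best_score, best_k, best_labels = float("-inf"), None, None
    for k in range(2, k_max + 1):
        for _ in range(runs):
            centroids, labels = kmeans(points, k, rng)
            score = simplified_silhouette(points, centroids, labels)
            if score > best_score:  # strict comparison: ties favour the smaller k
                best_score, best_k, best_labels = score, k, labels
    return best_k, best_labels


# Toy data (hypothetical): three tight, well-separated groups of nine points each.
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = (-0.2, 0.0, 0.2)
points = [(cx + dx, cy + dy) for cx, cy in centres for dx in offsets for dy in offsets]
k, labels = estimate_k(points, k_max=6)
print("estimated k:", k)
```

In a streaming setting, a procedure of this kind would be applied to the summary structure maintained by the stream algorithm (for example, micro-clusters or a coreset) rather than to raw points; that coupling is not shown here.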
Ranking Interactions for a Curation Task
S. Clematide, Fabio Rinaldi
One of the key pieces of information that biomedical text mining systems are expected to extract from the literature is the set of interactions among different types of biomedical entities (proteins, genes, diseases, drugs, etc.). Different types of entities might be considered; for example, protein-protein interactions have been extensively studied as part of the BioCreative competitive evaluations. However, more complex interactions, such as those among genes, drugs, and diseases, are increasingly of interest. Different databases have been used as a reference for the evaluation of extraction and ranking techniques. The aim of this paper is to describe a machine-learning-based reranking approach for candidate interactions extracted from the literature. The results are evaluated using data derived from the PharmGKB database. The importance of a good ranking is particularly evident when the results are applied to support human curators.
DOI: https://doi.org/10.1109/ICMLA.2011.119 (published 2011-12-18)
Citations: 5
Microarray Classification Using Sub-space Grids
M. Wani
The work presented in this paper describes how sub-space grids can be employed to extract rules for microarray classification. The paper first describes the principal component analysis (PCA) algorithm for obtaining sub-space grids from the projected low-dimensional space. A recursive procedure is then used to obtain rules in which sub-space grids form the premises. The extracted set of rules is evaluated on both training and testing data sets. The sub-space grids from the PCA algorithm are characterized by overlapping data from different classes, and even the use of more than two premises in a rule does not fully address the problem of overlapping data. As such, the rules obtained do not discriminate between classes accurately. To increase the effectiveness of the set of rules, the multiple discriminant analysis (MDA) algorithm is employed instead of PCA to obtain sub-space grids from the projected low-dimensional space. These sub-space grids from the MDA algorithm improve the classification accuracy of the system. However, the extracted set of rules is large, and the rules are sensitive to local variations in the data. To address these issues, the paper explores using the PCA and MDA algorithms simultaneously for the projected low-dimensional space from which sub-space grids are obtained. The resulting set of rules produces better classification accuracy. The paper discusses a comprehensive evaluation of this rule-based system. The system is tested on a dataset of 62 samples (40 colon tumor and 22 normal colon tissue). The results show that the use of sub-space grids obtained from a projected low-dimensional space of combined PCA and MDA algorithms increases the classification accuracy for microarray data.
DOI: https://doi.org/10.1109/ICMLA.2011.125 (published 2011-12-18)
Citations: 14
Network-Based Filtering of Unreliable Markers in Genome Mapping
O. Azzam, Loai Al Nimer, Charith D. Chitraranjan, A. Denton, Ajay Kumar, F. Bassi, M. Iqbal, S. Kianian
Genome mapping, or the experimental determination of the ordering of DNA markers on a chromosome, is an important step in genome sequencing and the ultimate assembly of sequenced genomes. The presented research addresses the problem of identifying markers that cannot be placed reliably. If such markers are included in standard mapping procedures, they can result in an overall poor mapping. Traditional techniques for identifying markers that cannot be placed consistently are based on resampling, which requires an already computationally expensive process to be repeated for a large ensemble of resampled populations. We propose a network-based approach that uses pairwise similarities between markers and demonstrate that the results from this approach largely match those of the more computationally expensive conventional approaches. The proposed approach is evaluated on data from the radiation hybrid mapping of the wheat genome.
DOI: https://doi.org/10.1109/ICMLA.2011.103 (published 2011-12-18)
Citations: 7
Discriminative Optimization of String Similarity and Its Application to Biomedical Abbreviation Clustering
Atsuko Yamaguchi, Yasunori Yamamoto, Jin-Dong Kim, T. Takagi, A. Yonezawa
Many string similarity measures have been developed to deal with the variety of expressions in natural language texts. With the abundance of such measures, we should consider the choice of measures and their parameters to maximize the performance for a given task. During our preliminary experiment to find the best measure and its parameters for the task of clustering terms to improve our abbreviation dictionary in life science, we found that chemical names had different characteristics in their character sequences compared to other terms. Based on this observation, we experimented with four string similarity measures to test the hypothesis that "chemical names have a different morphology, thus the computation of their similarity should differ from that of other terms." The experimental results show that the edit distance is the best for chemical names, and that the discriminative application of string similarity methods to chemical and non-chemical names may be a simple but effective way to improve the performance of term clustering.
DOI: https://doi.org/10.1109/ICMLA.2011.58 (published 2011-12-18)
Citations: 0
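To make the "different measure per term type" idea in the string-similarity abstract concrete, here is a small sketch. The edit distance is the standard Levenshtein dynamic program; the bigram Dice coefficient used for non-chemical terms and the `is_chemical` flag are placeholders assumed for illustration, since the abstract does not say which measure was used for non-chemical names or how terms were classified.

```python
def edit_distance(s, t):
    """Levenshtein distance via dynamic programming over two rows."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def edit_similarity(s, t):
    """Normalise the edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - edit_distance(s, t) / longest


def bigram_dice(s, t):
    """Dice coefficient over character-bigram sets (placeholder measure)."""
    a = {s[i:i + 2] for i in range(len(s) - 1)}
    b = {t[i:i + 2] for i in range(len(t) - 1)}
    if not a and not b:
        return 1.0 if s == t else 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def term_similarity(s, t, is_chemical):
    """Pick the measure by term type; is_chemical is a hypothetical, externally supplied flag."""
    return edit_similarity(s, t) if is_chemical else bigram_dice(s, t)


print(edit_distance("kitten", "sitting"))
print(term_similarity("2-propanol", "1-propanol", is_chemical=True))
```

A clustering step would then call `term_similarity` for each candidate pair, with the chemical/non-chemical decision coming from whatever classifier or dictionary is available.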
Kernel Methods for Minimum Entropy Encoding
S. Melacci, M. Gori
Following the basic principles of Information-Theoretic Learning (ITL), in this paper we propose Minimum Entropy Encoders (MEEs), a novel approach to data clustering. We consider a set of functions that project each input point onto a minimum entropy configuration (code). The encoding functions are modeled by kernel machines, and the resulting code collects the cluster membership probabilities. Two regularizers are included to balance the distribution of the output features and to favor smooth solutions, respectively, leading to an unconstrained optimization problem that can be efficiently solved by conjugate gradient or concave-convex procedures. The relationships with Maximum Margin Clustering algorithms are investigated, showing that MEEs overcome some of their critical issues, such as the lack of a multi-class extension and the need to handle a large number of constraints. An extensive evaluation of the proposed approach on several benchmarks shows improvements over state-of-the-art techniques, in terms of both accuracy and computational complexity.
DOI: https://doi.org/10.1109/ICMLA.2011.83 (published 2011-12-18)
Citations: 1
Tactile Sensor System Processing Based on K-means Clustering
Harry Chan-Maestas, D. Sofge
Development of a touch-sensitive (sensate) skin for robotic manipulators would provide tactile feedback for fine-grained dexterous control of robots interacting with objects in their environments, a capability that has largely been missing from robotic systems developed to date. A sensate skin for robots would require integration of hundreds or thousands of minute force or pressure sensors, each producing a localized response. Interpreting and extracting useful information from the sensate skin is a key technical challenge. In this paper we present a technique for analyzing data from tactile sensor arrays based on K-means clustering. Using a simplified contact model, the procedure estimates both magnitude and location for impacts on the sensate skin surface. Furthermore, it robustly accommodates a variety of sensor array densities by interpolating across areas of sensor response, providing accurate results even between sensing elements.
DOI: https://doi.org/10.1109/ICMLA.2011.136 (published 2011-12-18)
Citations: 2
Towards Automatic Classification on Flying Insects Using Inexpensive Sensors
Gustavo E. A. P. A. Batista, Yuan Hao, Eamonn J. Keogh, A. Mafra‐Neto
Insects are intimately connected to human life and well-being, in both positive and negative senses. While it is estimated that insects pollinate at least two-thirds of all the food consumed by humans, malaria, a disease transmitted by the female mosquito of the Anopheles genus, kills approximately one million people per year. Due to the importance of insects to humans, researchers have developed an arsenal of mechanical, chemical, biological and educational tools to help mitigate insects' harmful effects and to enhance their beneficial effects. However, the efficiency of such tools depends on knowing the time and location of migrations/infestations/population as early as possible. Insect detection and counting is typically performed by means of traps, usually "sticky traps", which are regularly collected and manually analyzed. The main problem is that this procedure is expensive in terms of materials and human time, and creates a lag between the time the trap is placed and the time it is inspected. This lag may only be a week, but in the case of, say, mosquitoes or sand flies, this can be more than half their adult life span. We are developing an inexpensive optical sensor that uses a laser beam to detect, count and ultimately classify flying insects from a distance. Our objective is to use classification techniques to provide accurate real-time counts of disease vectors down to the species/sex level. This information can be used by public health workers and by government and non-government organizations to plan optimal intervention strategies in the face of limited resources. In this work, we present some preliminary results of our research, conducted with three insect species. We show that with our simple sensor we can accurately classify these species using their wing-beat frequency as a feature. We further discuss how we can augment the sensor with other sources of information in order to scale our ideas to classify a larger number of species.
DOI: https://doi.org/10.1109/ICMLA.2011.145 (published 2011-12-18)
Citations: 55
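As a rough illustration of using wing-beat frequency as a classification feature, the sketch below extracts the dominant frequency of a sampled signal with a naive discrete Fourier transform and assigns the nearest reference frequency. The sampling rate, the synthetic signal, the species labels, and the reference frequencies are all hypothetical values made up for the example; the abstract does not describe the authors' signal processing or classifier.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n


def classify(frequency, references):
    """Nearest-reference rule: pick the label whose frequency is closest."""
    return min(references, key=lambda label: abs(references[label] - frequency))


# Synthetic sensor trace (hypothetical): 400 Hz fundamental plus a weaker 800 Hz harmonic.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 400 * i / SAMPLE_RATE)
          + 0.3 * math.sin(2 * math.pi * 800 * i / SAMPLE_RATE)
          for i in range(800)]

# Made-up reference wing-beat frequencies in Hz, for illustration only.
REFERENCES = {"species_a": 400.0, "species_b": 600.0, "species_c": 250.0}

freq = dominant_frequency(signal, SAMPLE_RATE)
print(freq, classify(freq, REFERENCES))
```

The naive DFT keeps the example dependency-free; for real recordings an FFT routine would be the usual choice, and a single frequency feature would likely need the additional information sources the abstract mentions.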
Extended Finite-State Machine Induction Using SAT-Solver
V. Ulyantsev, F. Tsarev
In this paper we describe an extended finite-state machine (EFSM) induction method that uses a SAT solver. The input data for the induction algorithm is a set of test scenarios. The algorithm consists of several steps: scenario tree construction, compatibility graph construction, Boolean formula construction, SAT solver invocation, and finite-state machine construction from the satisfying assignment. These extended finite-state machines can be used in automata-based programming, where programs are designed as automated controlled objects. Each automated controlled object contains a finite-state machine and a controlled object. The method described has been tested on randomly generated scenario sets of sizes from 250 to 2000 and on the alarm clock controlling EFSM induction problem, where it greatly outperformed a genetic algorithm.
本文描述了基于sat求解器的扩展有限状态机(EFSM)感应方法。归纳算法的输入数据是一组测试场景。该算法包括场景树构造、兼容图构造、布尔公式构造、sat求解器调用和从满足赋值构造有限状态机几个步骤。这些扩展的有限状态机可用于基于自动机的编程,其中程序被设计为自动控制对象。每个自动化被控对象都包含一个有限状态机和一个被控对象。所描述的方法已经在250到2000个随机生成的场景集上进行了测试,并在闹钟控制EFSM诱导问题上进行了测试,其中它大大优于遗传算法。
{"title":"Extended Finite-State Machine Induction Using SAT-Solver","authors":"V. Ulyantsev, F. Tsarev","doi":"10.1109/ICMLA.2011.166","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.166","url":null,"abstract":"In the paper we describe the extended finite-state machine (EFSM) induction method that uses SAT-solver. Input data for the induction algorithm is a set of test scenarios. The algorithm consists of several steps: scenarios tree construction, compatibility graph construction, Boolean formula construction, SAT-solver invocation and finite-state machine construction from satisfying assignment. These extended finite-state machines can be used in automata-based programming, where programs are designed as automated controlled objects. Each automated controlled object contains a finite-state machine and a controlled object. The method described has been tested on randomly generated scenario sets of size from 250 to 2000 and on the alarm clock controlling EFSM induction problem where it has greatly outperformed genetic algorithm.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128408891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 48
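The first step named in the EFSM induction abstract above, scenario tree construction, can be illustrated with a small prefix-tree sketch. This is not the authors' implementation. It assumes, for illustration only, that each scenario is a list of `(event, actions)` pairs (the abstract does not specify the scenario format), and the names `Node`, `build_scenario_tree`, and `count_nodes` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Maps an input event to (output actions, child node).
    # Assumed scenario format: a list of (event, actions) pairs.
    children: dict = field(default_factory=dict)


def build_scenario_tree(scenarios):
    """Merge scenarios into one prefix tree so shared prefixes share nodes."""
    root = Node()
    for scenario in scenarios:
        node = root
        for event, actions in scenario:
            actions = tuple(actions)
            if event in node.children:
                stored_actions, child = node.children[event]
                # Same prefix and event but different outputs: the
                # scenarios contradict each other.
                if stored_actions != actions:
                    raise ValueError(f"conflicting actions for event {event!r}")
            else:
                child = Node()
                node.children[event] = (actions, child)
            node = child
    return root


def count_nodes(node):
    """Count tree nodes, including the root."""
    return 1 + sum(count_nodes(child) for _, child in node.children.values())


if __name__ == "__main__":
    scenarios = [
        [("T", ["z1"]), ("A", ["z2"])],
        [("T", ["z1"]), ("H", [])],
    ]
    tree = build_scenario_tree(scenarios)
    print(count_nodes(tree))
```

In a pipeline like the one the abstract outlines, the later steps (compatibility graph, Boolean formula, SAT solver call) would operate on the nodes of such a tree; those steps are not sketched here.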
A New Control Method for dc-dc Converter by Neural Network Predictor with Repetitive Training
F. Kurokawa, K. Ueno, H. Maruta, H. Osuga
This paper proposes a novel prediction-based digitally controlled dc-dc converter. In this method, neural network control is used in coordination with conventional P-I-D control to improve the transient response. The prediction-based control term consists of predicted data obtained from repetitive training of the neural network. This improves the transient response very effectively when the load changes quickly. As a result, for a step change in load resistance, the undershoot of the output voltage and the overshoot of the reactor current are suppressed effectively compared with conventional control. Because the proposed method is based on neural network learning, it is expected to be widely applicable and to simplify circuit system design, since the algorithm does not need to be changed. The applicability of the proposed method is also confirmed by an experiment in which the P-I-D control parameters of the circuit are set to non-optimal values and the proposed method is applied in the same manner.
{"title":"A New Control Method for dc-dc Converter by Neural Network Predictor with Repetitive Training","authors":"F. Kurokawa, K. Ueno, H. Maruta, H. Osuga","doi":"10.1109/ICMLA.2011.17","DOIUrl":"https://doi.org/10.1109/ICMLA.2011.17","url":null,"abstract":"This paper proposes a novel prediction based digital control dc-dc converter. In this method, a neural network control is adopted to improve the transient response in coordination with a conventional P-I-D control. The prediction based control term is consists of predicted data which are obtained from repetitive training of the neural network. This works to improve the transient response very effectively when the load is changed quickly. As a result, the undershoot of the output voltage and the overshoot of the reactor current are suppressed effectively as compared with the conventional one in the step change of load resistance. The proposed method is based on the neural network learning, it is expected that the proposed approach has high availability in providing the easy way for the design of circuit system since there is no need to change the algorithm. The adequate availability of the proposed method is also confirmed by the experiment in which P-I-D control parameters of the circuit are set to non-optimal ones and the proposed method is used in the same manner.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127448775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
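The idea in the dc-dc converter abstract above, a conventional P-I-D law working together with a prediction-based term, can be sketched as a discrete controller with one extra additive term. This is a minimal sketch under stated assumptions, not the authors' control law: the abstract does not say how the predicted data enter the controller, so the additive form, the weight `k_pred`, and the class name `PIDWithPrediction` are all hypothetical, and the neural network predictor itself is left out (its output is simply passed in as `predicted`).

```python
from dataclasses import dataclass, field


@dataclass
class PIDWithPrediction:
    kp: float
    ki: float
    kd: float
    dt: float
    k_pred: float = 0.0  # hypothetical weight on the prediction-based term
    _integral: float = field(default=0.0, init=False)
    _prev_error: float = field(default=0.0, init=False)

    def step(self, reference, measured, predicted=None):
        """Return the control output for one sampling period."""
        error = reference - measured
        self._integral += error * self.dt
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error

        # Conventional discrete P-I-D part.
        u = self.kp * error + self.ki * self._integral + self.kd * derivative

        # Assumed form of the prediction-based term: react to the error
        # that a predictor (e.g. a trained neural network) expects next.
        if predicted is not None:
            u += self.k_pred * (reference - predicted)
        return u


if __name__ == "__main__":
    ctrl = PIDWithPrediction(kp=2.0, ki=0.0, kd=0.0, dt=1.0, k_pred=0.5)
    print(ctrl.step(reference=5.0, measured=4.0))
    print(ctrl.step(reference=5.0, measured=4.0, predicted=3.0))
```

Keeping the prediction term separate from the P-I-D gains mirrors the abstract's point that the P-I-D parameters can stay as they are, even when non-optimal, while the learned term compensates during load transients.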