"Condition Monitoring Using Pattern Recognition Techniques on Data from Acoustic Emissions"
Siril Yella, N. Gupta, M. Dougherty. 2006 5th International Conference on Machine Learning and Applications (ICMLA'06). DOI: 10.1109/ICMLA.2006.19

Condition monitoring with impact acoustic techniques is mostly carried out intuitively by skilled personnel. In this article, a pattern recognition approach is taken to automate these intuitive human skills and to develop more robust and reliable testing methods. The work forms part of a larger research project on rail inspection within the domain of intelligent transport systems. Data from impact acoustic tests on wooden beams were used, and the relation between the condition of the beams and the sounds they make when struck was analyzed experimentally. Features extracted from the acoustic emissions, such as the magnitude of the signal, the natural logarithm of the magnitude, and Mel-frequency cepstral coefficients, yielded good results. The extracted feature vectors were used as input to various pattern classifiers, and the effect of classifiers such as support vector machines and multi-layer perceptrons was tested and compared. Experimental results demonstrate that support vector machines provide good detection rates for the classification of impact acoustic signals in the NDT domain.
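The feature-plus-classifier pipeline described above can be sketched as follows, using synthetic impact-like signals, log-magnitude spectrum features (one of the feature types named), and an RBF support vector machine. The signal model, frequencies, and parameters below are illustrative assumptions, not the paper's data or exact features.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def log_magnitude_features(signal, n_bins=32):
    """Natural log of the magnitude spectrum, truncated to n_bins."""
    spectrum = np.abs(np.fft.rfft(signal))[:n_bins]
    return np.log(spectrum + 1e-8)

def make_signal(dominant_freq, n=512):
    """Synthetic stand-in for an impact recording: a decaying sinusoid.
    The assumption that beam condition shifts the dominant frequency is
    for illustration only."""
    t = np.arange(n) / n
    decay = np.exp(-5 * t)
    return decay * np.sin(2 * np.pi * dominant_freq * t) + 0.05 * rng.standard_normal(n)

signals = [make_signal(20) for _ in range(50)] + [make_signal(60) for _ in range(50)]
X = np.array([log_magnitude_features(s) for s in signals])
y = np.array([0] * 50 + [1] * 50)  # 0 = "sound beam", 1 = "defective beam"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

On such cleanly separable synthetic data the SVM separates the two classes easily; the paper's contribution is showing that the same pipeline works on real impact acoustic recordings.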
"Detecting Web Content Function Using Generalized Hidden Markov Model"
Jinlin Chen, Ping Zhong, Terry Cook. 2006 5th International Conference on Machine Learning and Applications (ICMLA'06). DOI: 10.1109/ICMLA.2006.21

Web content function indicates the author's intention regarding the purpose of the content and therefore plays an important role in Web information processing. In this paper we propose a generalized hidden Markov model that extends the traditional hidden Markov model for Web content function detection. By incorporating multiple emission features and detecting the state transition sequence based on layout structure, the generalized hidden Markov model can effectively make use of Web-specific information and achieve better performance than the traditional hidden Markov model. Compared to previous approaches to function detection, ours has the advantages of domain independence and extensibility to other applications. Experiments show promising results.
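The traditional-HMM baseline that this work generalizes decodes a hidden state sequence (here, content functions) from an observation sequence with the Viterbi algorithm. A minimal log-space sketch, with a toy two-state, two-observation example whose state names and probabilities are purely illustrative:

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a standard HMM (log-space)."""
    n_states = trans_p.shape[0]
    T = len(obs)
    log_delta = np.full((T, n_states), -np.inf)
    backptr = np.zeros((T, n_states), dtype=int)
    log_delta[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for j in range(n_states):
            scores = log_delta[t - 1] + np.log(trans_p[:, j])
            backptr[t, j] = np.argmax(scores)
            log_delta[t, j] = scores[backptr[t, j]] + np.log(emit_p[j, obs[t]])
    # Backtrack from the best final state
    path = [int(np.argmax(log_delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Hypothetical block functions: 0 = "navigation", 1 = "main content";
# observations are two discretized layout features.
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3], [0.2, 0.8]])
emit = np.array([[0.9, 0.1], [0.2, 0.8]])
states = viterbi([0, 0, 1, 1, 1], start, trans, emit)
```

The generalized model of the paper differs by allowing multiple emission features per state and by driving transitions from the page's layout structure rather than a flat sequence; that extension is not reproduced here.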
"A Comparison of Software Fault Imputation Procedures"
J. V. Hulse, T. Khoshgoftaar, Chris Seiffert. 2006 5th International Conference on Machine Learning and Applications (ICMLA'06). DOI: 10.1109/ICMLA.2006.5

This work presents a detailed comparison of three imputation techniques, Bayesian multiple imputation, regression imputation, and k-nearest-neighbor imputation, at various levels of missingness. Starting with a complete real-world software measurement dataset called CCCS, missing values were injected into the dependent variable at four levels according to three different missingness mechanisms. The three techniques are evaluated by comparing the imputed values against the actual ones. Our analysis includes a three-way analysis of variance (ANOVA) model, which demonstrates that Bayesian multiple imputation performs best, followed closely by regression imputation.
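One of the three compared techniques, k-nearest-neighbor imputation, can be illustrated with scikit-learn's `KNNImputer`; the matrix below is a made-up software-measurement-style example, not the CCCS data, and `np.nan` plays the role of the injected missing values in the dependent variable.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Rows are modules, columns are metrics; the last column stands in for the
# dependent variable with injected missingness (values illustrative only).
X = np.array([
    [1.0, 10.0, 100.0],
    [1.1, 11.0, np.nan],
    [0.9, 10.5, 101.0],
    [5.0, 50.0, 500.0],
    [5.2, 49.0, np.nan],
    [4.8, 51.0, 498.0],
])

# Each missing entry is replaced by the mean of that column over the
# k nearest rows, with distance computed on the observed columns.
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)
```

Bayesian multiple imputation and regression imputation, which the paper found superior, instead fit an explicit model of the dependent variable; kNN imputation is the simplest of the three to demonstrate.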
"Lazy Rule Refinement by Knowledge-Based Agents"
Cristina Boicu, G. Tecuci, Mihai Boicu. 2006 5th International Conference on Machine Learning and Applications (ICMLA'06). DOI: 10.1109/ICMLA.2006.32

This paper presents recent results on developing learning agents that subject matter experts can teach how to solve problems through examples and explanations. It introduces the lazy rule refinement method, in which the expert modifies an example generated by a learned rule. The agent must then decide whether to modify the rule (if the modification applies to all previous positive examples) or to learn a new rule; however, checking the previous examples would be disruptive or even impossible. The lazy rule refinement method provides an elegant solution to this problem: the agent delays the decision until it has accumulated enough examples during the follow-on problem-solving process. The method has been incorporated into the Disciple learning agent shell and used in the complex application areas of center-of-gravity analysis and intelligence analysis.
"An Approximate Version of Kernel PCA"
Shawn Martin. 2006 5th International Conference on Machine Learning and Applications (ICMLA'06). DOI: 10.1109/ICMLA.2006.13

We propose an analog of kernel principal component analysis (kernel PCA). Our algorithm is based on an approximation of PCA that uses Gram-Schmidt orthonormalization. We combine this approximation with support vector machine kernels to obtain a nonlinear generalization of PCA. The approximation yields a version of kernel PCA that is more easily computed (in the case of many data points) and more readily interpretable. After demonstrating the algorithm on some examples, we explore its use in applications to fluid flow and microarray data.
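For reference, the exact kernel PCA that this paper approximates can be written in a few lines of NumPy: build the kernel matrix, center it in feature space, and take the top eigenvectors. The RBF kernel, `gamma`, and data below are illustrative choices; the paper's Gram-Schmidt-based approximation itself is not reproduced here.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Exact kernel PCA with an RBF kernel (the O(n^2) baseline that the
    paper's Gram-Schmidt approximation is designed to avoid for large n)."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = K.shape[0]
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one  # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Projections of the training points onto the principal components
    return vecs * np.sqrt(np.maximum(vals, 0))

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3))
Z = kernel_pca(X, n_components=2)
```

The cost of the exact method is dominated by the n-by-n eigendecomposition, which is what motivates an approximation when there are many data points.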
"Quantization of Global Gene Expression Data"
Tae-Hoon Chung, M. Brun, Seungchan Kim. 2006 5th International Conference on Machine Learning and Applications (ICMLA'06). DOI: 10.1109/ICMLA.2006.42

Many researchers are investigating the possibility of using global gene expression profile data as a platform to infer gene regulatory networks. However, a heavy computational burden and measurement noise make these efforts difficult, and approaches based on quantized levels are being vigorously investigated as an alternative. Methods based on quantized values require a procedure for converting continuous expression values into discrete ones. Although algorithms exist to quantize values into multiple discrete states, they have assumed strict state mixtures (SSM), so that all expression profiles are divided into a pre-specified number of states. We propose two novel quantization algorithms (QAs), one model-based and one model-free, that generalize SSM algorithms in two major aspects. First, our QAs allow the maximum number of expression states (Es) to be arbitrary. Second, expression profiles can exhibit any combination of the Es possible states. In this paper, we compare the performance of SSM algorithms and QAs using simulation studies as well as applications to actual data, and show that quantizing gene expression data with adaptive algorithms is an effective way to reduce data complexity without sacrificing much essential information.
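To make the preprocessing step concrete, here is a minimal quantile-based discretization of one expression profile into at most a given number of states. This is a simple SSM-style illustration of what "quantization" means in this setting, not the paper's model-based or model-free algorithms, which choose the number of states adaptively.

```python
import numpy as np

def quantize_profile(expr, max_states=3):
    """Discretize a continuous expression profile into max_states levels
    using quantile cut points (illustrative only; a strict, fixed-state
    scheme rather than the paper's adaptive QAs)."""
    edges = np.quantile(expr, np.linspace(0, 1, max_states + 1)[1:-1])
    return np.digitize(expr, edges)

rng = np.random.default_rng(2)
profile = rng.normal(size=300)       # stand-in for one gene's measurements
states = quantize_profile(profile, max_states=3)
```

Downstream network-inference algorithms then operate on the small discrete alphabet (here 0, 1, 2) instead of noisy continuous values.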
"TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams"
Joel W. Reed, Y. Jiao, T. Potok, Brian A. Klump, M.T. Elmore, A. Hurson. 2006 5th International Conference on Machine Learning and Applications (ICMLA'06). DOI: 10.1109/ICMLA.2006.50

In this paper, we propose a new term weighting scheme called term frequency-inverse corpus frequency (TF-ICF). It does not require term frequency information from the other documents in the collection, and thus enables us to generate the document vectors of N streaming documents in linear time. In the context of a machine learning application, unsupervised document clustering, we evaluated the effectiveness of the proposed approach against five widely used term weighting schemes through extensive experimentation. Our results show that TF-ICF produces document clusters of comparable quality to those generated by the widely recognized schemes, and that it is significantly faster.
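The key idea, that document frequencies come from a fixed reference corpus rather than from the stream itself, can be sketched as below. The smoothing and exact normalization are assumptions of this sketch; the paper's precise formula may differ.

```python
import math
from collections import Counter

def tf_icf(doc_tokens, corpus_doc_freq, n_corpus_docs):
    """TF-ICF weights for one document. Unlike TF-IDF, the inverse-frequency
    term uses a static reference corpus, so each streaming document can be
    vectorized independently of the others (hence linear time in N)."""
    tf = Counter(doc_tokens)
    return {
        term: count * math.log(n_corpus_docs / (1 + corpus_doc_freq.get(term, 0)))
        for term, count in tf.items()
    }

# Hypothetical reference-corpus statistics (document frequencies)
df = {"network": 50, "learning": 80, "quark": 1}
weights = tf_icf(["learning", "network", "quark", "quark"], df, n_corpus_docs=100)
```

A rare corpus term ("quark") ends up weighted far above common ones, exactly as with TF-IDF, but no pass over the other stream documents is ever needed.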
"An Accurate and Robust Missing Value Estimation for Microarray Data: Least Absolute Deviation Imputation"
Yi Cao, K. Poh. 2006 5th International Conference on Machine Learning and Applications (ICMLA'06). DOI: 10.1109/ICMLA.2006.11

Microarray experiments often produce missing expression values for various reasons. Accurate and robust methods for estimating missing values are needed, since many algorithms and statistical analyses require a complete data set. In this paper, novel imputation methods based on the least absolute deviation estimate, referred to as LADimpute, are proposed to estimate missing entries in microarray data. The proposed LADimpute method takes local similarity structure into consideration in addition to employing the least absolute deviation estimate. Once the genes similar to a target gene with missing values have been selected according to some metric, all missing values in the target gene can be estimated simultaneously by a linear combination of the similar genes. In our experiments, LADimpute performs accurately and robustly compared to other methods across different datasets, missing rates, and noise levels.
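A minimal sketch of the idea follows, exploiting the fact that the least absolute deviation estimate of a location parameter is the median: select the k most similar genes (rows) by L1 distance, shift each to the target gene's level with a median offset, and take the median prediction. This is a deliberate simplification of the robustness idea, not the paper's full LADimpute model; the toy matrix and `k` are illustrative.

```python
import numpy as np

def lad_impute_entry(data, target_row, miss_col, k=3):
    """Impute one missing entry using median (i.e., L1-optimal) estimates
    over the k rows most similar to the target row."""
    obs = [c for c in range(data.shape[1]) if c != miss_col]
    target = data[target_row, obs]
    others = [r for r in range(data.shape[0]) if r != target_row]
    # k most similar rows by L1 distance on the observed columns
    dists = [(np.abs(data[r, obs] - target).sum(), r) for r in others]
    neighbors = [r for _, r in sorted(dists)[:k]]
    # Shift each neighbor by the median offset, then take the median prediction
    preds = [data[r, miss_col] + np.median(target - data[r, obs]) for r in neighbors]
    return float(np.median(preds))

X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],   # target gene; pretend X[1, 3] is missing
    [1.1, 2.1, 3.1, 4.1],
    [9.0, 1.0, 7.0, 2.0],   # dissimilar gene, excluded by the similarity step
])
estimate = lad_impute_entry(X, target_row=1, miss_col=3, k=2)
```

Because medians rather than means drive every step, a single outlying gene or measurement has limited influence on the estimate, which is the robustness property the paper emphasizes.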
"Application of Reinforcement Learning in Development of a New Adaptive Intelligent Traffic Shaper"
I. Shames, Nima Najmaei, Mohammad Zamani, A. Safavi. 2006 5th International Conference on Machine Learning and Applications (ICMLA'06). DOI: 10.1109/ICMLA.2006.16

In this paper, we take advantage of reinforcement learning to develop a new traffic shaper that achieves reasonable bandwidth utilization while preventing traffic overload in other parts of the network, thereby reducing the total number of dropped packets in the whole network. We used a modified version of Q-learning in which a combination of neural networks stores the Q-table, making operation faster while keeping the required storage as small as possible. In simulations, this method keeps the dropping probability low while injecting as many packets as possible into the network to exploit the free bandwidth. The results also show that the system can perform in situations it was not originally designed for.
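The underlying tabular Q-learning update can be demonstrated on a toy traffic-shaping MDP. The two-state environment below (queue low/high, send slow/fast) and all rewards are invented for illustration; the paper replaces the explicit Q-table with neural networks and uses a simulated network rather than this toy.

```python
import numpy as np

# Toy MDP: states 0 = queue low, 1 = queue high; actions 0 = send slow,
# 1 = send fast. MDP[state][action] -> (next_state, reward).
MDP = {
    0: {0: (0, 0.2),   # slow from low: underuses free bandwidth
        1: (1, 1.0)},  # fast from low: good throughput, queue fills
    1: {0: (0, 0.0),   # slow from high: queue drains
        1: (1, -1.0)}, # fast from high: overload, packets dropped
}

rng = np.random.default_rng(3)
Q = np.zeros((2, 2))
alpha, gamma, eps = 0.1, 0.9, 0.2
s = 0
for _ in range(20000):
    # epsilon-greedy action selection
    a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
    s2, r = MDP[s][a]
    # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
    s = s2

policy = np.argmax(Q, axis=1)
```

The learned greedy policy sends fast while the queue is low and backs off when it is high, i.e., it trades throughput against drops, which is the behavior the paper's shaper learns at network scale.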
"A New Scheme for Nucleotide Sequence Signature Extraction"
Sung-Soo Kim, Chan-Hee Lee, Keon-Myung Lee, Sung-Duk Lee. 2006 5th International Conference on Machine Learning and Applications (ICMLA'06). DOI: 10.1109/ICMLA.2006.9

In this paper, we propose a new method that extracts a set of signatures from different nucleotide groups by measuring the distances between the groups. The proposed method not only extracts signatures of different sizes automatically via constraint relaxation, but also provides the locations of the signatures in a sequence while measuring the relative distance between the groups, which aids understanding of the nucleotide information. The performance of the proposed method is demonstrated through simulations and analysis.
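One simple way to measure a distance between nucleotide groups, shown here only to make the notion concrete, is to compare normalized k-mer frequency profiles. The paper's distance measure and its constraint-relaxation signature extraction are not reproduced; everything below is an illustrative assumption.

```python
import math
from collections import Counter
from itertools import product

def kmer_profile(seqs, k=2):
    """Normalized k-mer frequency vector for a group of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    return {"".join(km): counts["".join(km)] / total
            for km in product("ACGT", repeat=k)}

def group_distance(group_a, group_b, k=2):
    """Euclidean distance between two groups' k-mer profiles."""
    pa, pb = kmer_profile(group_a, k), kmer_profile(group_b, k)
    return math.sqrt(sum((pa[km] - pb[km]) ** 2 for km in pa))

# Two similar groups versus two very different groups (toy sequences)
d_same = group_distance(["ACGTACGT", "ACGTACGA"], ["ACGTACGT", "CCGTACGT"])
d_diff = group_distance(["ACGTACGT", "ACGTACGA"], ["GGGGGGGG", "GGGGCGGG"])
```

Compositionally similar groups yield a small distance and dissimilar ones a large distance, which is the kind of between-group signal a signature-extraction scheme can exploit.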