Optimizing lineage information in genetic algorithms for producing superior models
Pub Date: 2008-07-13 | DOI: 10.1109/IRI.2008.4583049
G. Boetticher, J. Rudisill
Much research applies genetic algorithms (GAs), but little examines the impact of lineage information on optimizing a GA. Researchers normally consider only elitism, an approach that carries a very small, fixed subset of the population forward to the next generation, as a lineage strategy. This paper investigates several different lineage percentages (the percentage of the population carried forward) to determine an ideal percentage or range for improving the accuracy of a GA. Several experiments are performed, and all results are statistically validated.
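As a concrete illustration of the parameter under study, here is a minimal GA sketch in which `lineage_pct` controls what fraction of the population is carried forward unchanged. The fitness function, operators, and all names are illustrative assumptions, not the authors' implementation.

```python
import random

def evolve(fitness, pop_size=50, genome_len=16, lineage_pct=0.05,
           generations=100, mutation_rate=0.01):
    """Toy GA: carry the top `lineage_pct` of the population forward unchanged."""
    pop = [[random.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        n_keep = max(1, int(lineage_pct * pop_size))    # the percentage the paper varies
        next_pop = [ind[:] for ind in ranked[:n_keep]]  # lineage carried forward
        while len(next_pop) < pop_size:
            a, b = random.sample(ranked[:pop_size // 2], 2)  # parents from the fitter half
            cut = random.randrange(1, genome_len)
            child = a[:cut] + b[cut:]                        # one-point crossover
            child = [g ^ (random.random() < mutation_rate) for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Example: maximize the number of ones ("one-max"), carrying 10% forward
best = evolve(fitness=sum, lineage_pct=0.10)
```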
{"title":"Optimizing lineage information in genetic algorithms for producing superior models","authors":"G. Boetticher, J. Rudisill","doi":"10.1109/IRI.2008.4583049","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583049","url":null,"abstract":"A lot of research in the area of genetic algorithms (GA) is applied, but little research examines the impact of lineage information in optimizing a GA. Normally, researchers consider primarily elitism, an approach which carries only a very small fixed subset of the population to the next generation, as a lineage strategy. This paper investigates several different lineage percentages (what percent of the population to carry forward) to determine an ideal percentage or range from improving the accuracy of a GA. Several experiments are performed, and all results are statistically validated.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"269 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125827651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Segmentation of medical images by region growing
Pub Date: 2008-07-13 | DOI: 10.1109/IRI.2008.4583071
Itzel Abundez Barrera, Citlalih Gutierrez Estrada, S. D. Zagal, M. N. Perez
Combining different technologies to build better-performing systems that deliver quality results leads to systems with enhanced adaptation and analysis. Such analysis allows interaction with previously evaluated techniques, providing reliable, successful results. That is the case for the research detailed in this article, which focuses on a technique for reusing previously analyzed and formalized information with the support of the Unified Modeling Language (UML). Information is handled in modules that generate the segmentation of medical images without the intervention of a specialist. The purpose is to deliver regions of interest for the early detection of cervical cancer.
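The region growing named in the title is a standard segmentation algorithm. The following is a minimal sketch of classic intensity-based region growing, not the authors' UML-driven modular system; the seed point and tolerance are chosen arbitrarily.

```python
from collections import deque
import numpy as np

def region_grow(image, seed, tol=10):
    """Classic region growing: flood out from `seed`, adding 4-connected
    pixels whose intensity is within `tol` of the seed intensity."""
    h, w = image.shape
    seed_val = float(image[seed])
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
                    and abs(float(image[ny, nx]) - seed_val) <= tol:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask  # boolean region of interest

# Example on a synthetic two-intensity image
img = np.full((64, 64), 50, dtype=np.uint8)
img[20:40, 20:40] = 200
roi = region_grow(img, seed=(30, 30), tol=20)
```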
{"title":"Segmentation of medical images by region growing","authors":"Itzel Abundez Barrera, Citlalih Gutierrez Estrada, S. D. Zagal, M. N. Perez","doi":"10.1109/IRI.2008.4583071","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583071","url":null,"abstract":"The possibility of combining different technologies and developing better performing systems that offer quality results leads to the creation of systems with enhanced adaptation and analysis. Such analysis allows the interaction with previously evaluated techniques, providing reliable, successful results. Such is the case for the research work detailed in this article, which is focused on a technique that allows reusing previously analyzed and formalized information with the support of the Unified Modeled Language (UML). Information is handled in modules aimed to generate the segmentation of medical images without intervention of a specialist. The purpose is to deliver regions of interest for the early detection of cervical cancer.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125503330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A conflict-based confidence measure for associative classification
Pub Date: 2008-07-13 | DOI: 10.1109/IRI.2008.4583039
P. Vateekul, M. Shyu
Associative classification has attracted significant attention recently and achieved promising results. In the rule-ranking process, the confidence measure is usually used to sort the class association rules (CARs). However, it may not be good enough for a classification task because of its low power to discriminate against instances in the other classes. In this paper, we propose a novel conflict-based confidence measure with an interleaving ranking strategy for re-ranking CARs in an associative classification framework; the measure better captures the conflict between a rule and a training data instance. In the experiments, the traditional confidence measure and our proposed conflict-based confidence measure with the interleaving ranking strategy are each applied as the primary sorting criterion for CARs. The experimental results show that the proposed framework achieves promising classification accuracy with the conflict-based confidence measure, particularly on imbalanced data sets.
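The abstract does not give the formula for the conflict-based measure, so the sketch below pairs standard CAR confidence with one plausible conflict score (the average fraction of a rule's antecedent satisfied by other-class instances) purely for illustration; the paper's interleaving ranking strategy is likewise approximated here by a simple tie-breaking sort.

```python
def confidence(rule, data):
    """Standard CAR confidence: P(class | antecedent)."""
    ante, cls = rule
    matched = [(items, c) for items, c in data if ante <= items]
    return sum(1 for _, c in matched if c == cls) / len(matched) if matched else 0.0

def conflict(rule, data):
    """Illustrative conflict score (an assumption, not the authors' formula):
    average fraction of the antecedent that an other-class instance satisfies."""
    ante, cls = rule
    others = [items for items, c in data if c != cls]
    if not others or not ante:
        return 0.0
    return sum(len(ante & items) / len(ante) for items in others) / len(others)

def rank_cars(rules, data):
    """Rank CARs: high confidence first, low conflict breaks ties."""
    return sorted(rules, key=lambda r: (-confidence(r, data), conflict(r, data)))

# Tiny training set: (item set, class label)
data = [({"a", "b"}, "pos"), ({"a"}, "neg"), ({"b", "c"}, "pos")]
rules = [({"a"}, "pos"), ({"b"}, "pos")]
print(rank_cars(rules, data))  # the all-positive rule on {"b"} ranks first
```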
{"title":"A conflict-based confidence measure for associative classification","authors":"P. Vateekul, M. Shyu","doi":"10.1109/IRI.2008.4583039","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583039","url":null,"abstract":"Associative classification has aroused significant attention recently and achieved promising results. In the rule ranking process, the confidence measure is usually used to sort the class association rules (CARs). However, it may be not good enough for a classification task due to a low discrimination power to instances in the other classes. In this paper, we propose a novel conflict-based confidence measure with an interleaving ranking strategy for re-ranking CARs in an associative classification framework, which better captures the conflict between a rule and a training data instance. In the experiments, the traditional confidence measure and our proposed conflict-based confidence measure with the interleaving ranking strategy are applied as the primary sorting criterion for CARs. The experimental results show that the proposed associative classification framework achieves promising classification accuracy with the use of the conflict-based confidence measure, particularly for an imbalanced data set.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122168574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A generic notification system for Internet information
Pub Date: 2008-07-13 | DOI: 10.1109/IRI.2008.4583024
N. Mohamed, J. Al-Jaroodi, I. Jawhar
The Internet provides a huge amount of dynamic online information on stock prices, currency exchange rates, interest rates, weather forecasts, oil prices, and many other topics. This information is publicly available in dynamic HTML documents or through web services. This paper discusses a flexible notification system that utilizes the available online information and allows users to define the set of notifications they are interested in. Based on the defined notification conditions, users are notified by email and/or SMS messages whenever one or more of the conditions are met. In this system, users write Java-based configurations to define their notification requirements. The system also solves some of the issues that arise when public Internet information is used to build the needed notifications, including the problem of capturing highly dynamic Internet information and supporting advanced types of notifications.
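A sketch of the condition-then-notify pattern the abstract describes (the actual system uses Java-based configurations; this Python version only illustrates the pattern). The registry entries, addresses, fetcher, and local SMTP relay below are all hypothetical.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical condition registry: each entry pairs a fetcher (which would
# scrape a dynamic HTML page or call a web service) with a predicate.
notifications = [
    {
        "name": "oil price alert",
        "fetch": lambda: 92.4,               # stand-in for a live web lookup
        "condition": lambda v: v > 90.0,     # user-defined trigger condition
        "recipient": "user@example.com",
    },
]

def check_and_notify(smtp_host="localhost"):
    """Evaluate every registered condition and email the user when it holds."""
    for n in notifications:
        value = n["fetch"]()
        if n["condition"](value):
            msg = EmailMessage()
            msg["Subject"] = f"Notification: {n['name']} (value={value})"
            msg["To"] = n["recipient"]
            msg["From"] = "notifier@example.com"
            msg.set_content(f"Condition '{n['name']}' met with value {value}.")
            with smtplib.SMTP(smtp_host) as s:  # assumes a local SMTP relay
                s.send_message(msg)
```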
{"title":"A generic notification system for Internet information","authors":"N. Mohamed, J. Al-Jaroodi, I. Jawhar","doi":"10.1109/IRI.2008.4583024","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583024","url":null,"abstract":"The Internet provides a huge amount of online and dynamic information related to stock information, currency exchange rates, interest rates, expected weather status, oil prices, and many other topics. This information is publicly available on dynamic HTML documents or on web services. This paper discusses a flexible notification system that utilizes the available online information and allows users to define a set of notifications that they are interested in. Based on the defined notification conditions, users will be notified by email and/or SMS messages whenever one or more of the conditions are met. In this system, users use Java-based configurations to define the notification requirements. This system also solves some of the issues facing utilizing public information available on the Internet to build the needed notifications. This includes the problem of capturing highly dynamic Internet information as well as supporting advanced types of notifications.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128263403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid sampling for imbalanced data
Pub Date: 2008-07-13 | DOI: 10.3233/ICA-2009-0314
Chris Seiffert, T. Khoshgoftaar, J. V. Hulse
Decision tree learning in the presence of imbalanced data is an issue of great practical importance, as such data is ubiquitous in a wide variety of application domains. We propose hybrid data sampling, which combines two sampling techniques, random oversampling and random undersampling, to create a balanced dataset for use in constructing decision tree classification models. The results demonstrate that our methodology is often able to improve the performance of a C4.5 decision tree learner in the context of imbalanced data.
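A minimal sketch of hybrid sampling as described: undersample the majority class and oversample the minority class until both reach a common size. The meet-in-the-middle target is an assumption; the abstract does not fix the split point.

```python
import random

def hybrid_sample(majority, minority, seed=0):
    """Hybrid sampling sketch: combine random undersampling of the majority
    class with random oversampling (with replacement) of the minority class."""
    rng = random.Random(seed)
    target = (len(majority) + len(minority)) // 2           # meet in the middle
    under = rng.sample(majority, min(target, len(majority)))              # undersampling
    over = list(minority) + [rng.choice(minority)
                             for _ in range(max(0, target - len(minority)))]  # oversampling
    return under + over

majority = [("maj%d" % i, 0) for i in range(900)]   # class 0: 900 instances
minority = [("min%d" % i, 1) for i in range(100)]   # class 1: 100 instances
balanced = hybrid_sample(majority, minority)        # 500 of each class
```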
{"title":"Hybrid sampling for imbalanced data","authors":"Chris Seiffert, T. Khoshgoftaar, J. V. Hulse","doi":"10.3233/ICA-2009-0314","DOIUrl":"https://doi.org/10.3233/ICA-2009-0314","url":null,"abstract":"Decision tree learning in the presence of imbalanced data is an issue of great practical importance, as such data is ubiquitous in a wide variety of application domains. We propose hybrid data sampling, which uses a combination of two sampling techniques such as random oversampling and random undersampling, to create a balanced dataset for use in the construction of decision tree classification models. The results demonstrate that our methodology is often able to improve the performance of a C4.5 decision tree learner in the context of imbalanced data.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130779641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rule randomization for propositional logic-based workflow verification
Pub Date: 2008-07-13 | DOI: 10.1109/IRI.2008.4583060
Q. Liang, S. Rubin
Workflow verification has been a well-studied research topic over the past few years. Theorem-proving-based approaches to workflow verification have become popular thanks to several advantages, including their grounding in formal characterizations with rigorous, unambiguous inference mechanisms. However, a common problem with these inference mechanisms is combinatorial explosion, which forms a major performance hurdle for inference-based workflow verification systems. In this paper, we study how randomization enables reuse and reduces processing time in logic-based workflow verification. In particular, we look at a propositional-logic-based workflow verification technique. For the logical inference rules, which are used to infer new true propositions from existing ones, we apply randomization after each verification task so that new inference rules reflecting the componentized verification are added to the inference rule set. We review the savings incurred in verifying a workflow pattern and provide a theoretical analysis.
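The abstract's idea of adding new inference rules after each verification can be sketched as forward chaining plus rule caching. The code below is an interpretation under that assumption, not the authors' randomization scheme: a verified component yields a composed rule that short-circuits later verifications.

```python
def verify(goal, facts, rules):
    """Forward-chain over implication rules (frozenset of premises -> conclusion)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and premises <= facts:
                facts.add(conclusion)
                changed = True
    return goal in facts

def verify_and_cache(goal, facts, rules):
    """After a successful verification, append a composed rule (facts -> goal),
    so re-verifying the same component skips the intermediate inference steps."""
    ok = verify(goal, facts, rules)
    if ok:
        rules.append((frozenset(facts), goal))  # reusable, componentized rule
    return ok

rules = [(frozenset({"a"}), "b"), (frozenset({"b"}), "c")]
print(verify_and_cache("c", {"a"}, rules))  # derives c, then caches {a} -> c
```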
{"title":"Rule randomization for propositional logic-based workflow verification","authors":"Q. Liang, S. Rubin","doi":"10.1109/IRI.2008.4583060","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583060","url":null,"abstract":"Workflow verification has been a well studied research topic during the past few years. Theorem proof based approaches to workflow verification become popular due to several advantages including being based on formal characterization with rigorous and non-ambiguous inference mechanisms. However, a common problem to these inference mechanisms is combinatorial explosions, which forms a major performance hurdle to workflow verification systems based on inference. In this paper, we study how randomization enables reuse and reduces processing time in logic based workflow verification approaches. We, in particular, look at a propositional logic based workflow verification technique. For the logic inference rules, which are used to infer new truthful propositions from existing truthful propositions in this logic, we apply randomization to the inference rules after each verification task such that new inference rules reflecting the componentized verification are added to the inference rule sets. We reviewed the savings incurred in verifying a workflow pattern and provide a theoretical analysis.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"97 38","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131879417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A word-based predictive text entry method for Khmer language
Pub Date: 2008-07-13 | DOI: 10.1109/IRI.2008.4583032
Phavy Ouk, Ye Kyaw Thu, M. Matsumoto, Y. Urano
This paper begins with a discussion of the difficulties in applying a word-based text entry method to Khmer, the official language of Cambodia. We then propose a word-based predictive method based on a careful analysis of the structure of the current Khmer typing system. To evaluate the proposed text entry method, we designed and implemented two interface prototypes: a 12-key mobile phone interface and an interface for stylus-based devices such as personal digital assistants (PDAs). Results show that, compared with existing methods, our model requires fewer keystrokes and achieves higher entry speed.
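A sketch of 12-key word-based prediction in the familiar T9 style. A real Khmer layout would need a far larger character-to-key mapping than the Latin stand-in used here, and the lexicon and frequencies are hypothetical.

```python
# Hypothetical key-to-character mapping (Khmer's consonant/vowel inventory
# would require a more elaborate layout than this Latin stand-in).
KEYMAP = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

def keys_for(word):
    """Digit sequence a user would press to type `word` on the 12-key pad."""
    return "".join(k for ch in word for k, letters in KEYMAP.items() if ch in letters)

def predict(key_sequence, lexicon):
    """Return dictionary words matching the key sequence, most frequent first."""
    return sorted((w for w in lexicon if keys_for(w) == key_sequence),
                  key=lexicon.get, reverse=True)

lexicon = {"cat": 120, "act": 40, "bat": 75}  # word -> corpus frequency
print(predict("228", lexicon))                # ['cat', 'bat', 'act']
```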
{"title":"A word-based predictive text entry method for Khmer language","authors":"Phavy Ouk, Ye Kyaw Thu, M. Matsumoto, Y. Urano","doi":"10.1109/IRI.2008.4583032","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583032","url":null,"abstract":"This paper begins with a discussion on the difficulties in applying word-based text entry method for Khmer, official language of Cambodia. Then, we propose a word-based predictive method based on careful analysis on the structure of current Khmer typing system. To evaluate the proposed text entry, we designed and implemented two interface prototypes; the 12-key mobile phone interface and the stylus-based device interface such as Personal Digital Assistant (PDA). Results show that compared to the existing methods our model provides better keystrokes and speed.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116311179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An entropy-based method for assessing the number of spatial outliers
Pub Date: 2008-07-13 | DOI: 10.1109/IRI.2008.4583037
Xutong Liu, Chang-Tien Lu, F. Chen
A spatial outlier is a spatial object whose non-spatial attributes differ significantly from those of its spatial neighbors. A major limitation of existing outlier detection algorithms is that they generally require a pre-specified number of spatial outliers. Estimating an appropriate number of outliers for a spatial data set is one of the critical issues in outlier analysis. This paper proposes an entropy-based method to address this problem. We define a spatial local contrast entropy function. Based on the local contrast and the local contrast probability derived from the non-spatial and spatial attributes, the spatial local contrast entropy can be computed. As outliers are incrementally removed, the entropy value keeps decreasing until it becomes stable at a certain point, at which an optimal number of outliers can be estimated. We consider both the single-attribute and multiple-attribute cases for spatial objects. Experiments conducted on the US Housing data validate the effectiveness of the proposed approach.
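A rough sketch of the incremental-removal loop under stated assumptions: local contrast is taken as the absolute deviation from the neighborhood mean (single-attribute case), contrasts are normalized into a probability distribution for a Shannon entropy, and the loop stops when the entropy change falls below a threshold. The paper's exact definitions of local contrast probability and of the stable point are not reproduced here.

```python
import math

def local_contrasts(values, neighbors, active):
    """Local contrast per active object: |attribute - mean of active neighbors|."""
    cs = {}
    for i in active:
        nb = [j for j in neighbors[i] if j in active]
        mean = sum(values[j] for j in nb) / len(nb) if nb else values[i]
        cs[i] = abs(values[i] - mean)
    return cs

def contrast_entropy(contrasts):
    """Shannon entropy of the normalized contrast distribution."""
    total = sum(contrasts)
    if total <= 0:
        return 0.0
    probs = [c / total for c in contrasts if c > 0]
    return -sum(p * math.log(p) for p in probs)

def estimate_num_outliers(values, neighbors, eps=0.01, max_out=None):
    """Remove the largest-contrast object one at a time; stop when the entropy
    change falls below `eps` (the 'stable point' described in the abstract)."""
    active = set(range(len(values)))
    max_out = max_out if max_out is not None else len(values) // 4
    prev = None
    for removed in range(max_out + 1):
        cs = local_contrasts(values, neighbors, active)
        ent = contrast_entropy(list(cs.values()))
        if prev is not None and abs(prev - ent) < eps:
            return removed - 1  # the last removal barely changed the entropy
        prev = ent
        active.remove(max(cs, key=cs.get))
    return max_out
```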
{"title":"An entropy-based method for assessing the number of spatial outliers","authors":"Xutong Liu, Chang-Tien Lu, F. Chen","doi":"10.1109/IRI.2008.4583037","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583037","url":null,"abstract":"A spatial outlier is a spatial object whose non-spatial attributes are significantly different from those of its spatial neighbors. A major limitation associated with the existing outlier detection algorithms is that they generally require a pre-specified number of spatial outliers. Estimating an appropriate number of outliers for a spatial data set is one of the critical issues for outlier analysis. This paper proposes an entropy-based method to address this problem. We define the function of spatial local contrast entropy. Based on the local contrast and local contrast probability that derived from non-spatial and spatial attributes, the spatial local contrast entropy can be computed. By incrementally removing outliers, the entropy value will keep decreasing until it becomes stable at a certain point, where an optimal number of outliers can be estimated. We considered both the single attribute and the multiple attributes of spatial objects. Experiments conducted on the US Housing data validated the effectiveness of our proposed approach.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122602556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic product feature extraction from online product reviews using maximum entropy with lexical and syntactic features
Pub Date: 2008-07-13 | DOI: 10.1109/IRI.2008.4583038
G. Somprasertsri, P. Lalitrojwong
The task of product feature extraction is to find the product features that customers refer to in their reviews; these are useful for characterizing opinions about the products. We propose an approach to product feature extraction that combines lexical and syntactic features with a maximum entropy model. The underlying principle of maximum entropy is to prefer uniform distributions in the absence of external knowledge. In our maximum entropy approach, we first extract learning features from the annotated corpus, then train the maximum entropy model, next use the trained model to extract product features, and finally apply a natural language processing technique in a postprocessing step to discover the remaining product features. Our experimental results show that this approach is suitable for automatic product feature extraction.
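A maximum entropy classifier is equivalent to multinomial logistic regression, so the train-then-extract pipeline can be sketched with scikit-learn; the feature templates and the tiny corpus below are illustrative assumptions, not the paper's.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def token_features(tokens, pos_tags, i):
    """Lexical and syntactic features for the i-th token (a plausible template
    set; the paper's exact features are not reproduced here)."""
    return {
        "word": tokens[i].lower(),
        "pos": pos_tags[i],
        "prev_word": tokens[i - 1].lower() if i > 0 else "<S>",
        "next_pos": pos_tags[i + 1] if i + 1 < len(tokens) else "</S>",
    }

# Tiny annotated corpus: label 1 marks a product-feature token
tokens = ["The", "battery", "life", "is", "short"]
pos = ["DT", "NN", "NN", "VBZ", "JJ"]
labels = [0, 1, 1, 0, 0]

X_dicts = [token_features(tokens, pos, i) for i in range(len(tokens))]
vec = DictVectorizer()
X = vec.fit_transform(X_dicts)

# Multinomial logistic regression acts as the maximum entropy classifier
maxent = LogisticRegression(max_iter=200).fit(X, labels)
pred = maxent.predict(vec.transform([token_features(tokens, pos, 1)]))
```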
{"title":"Automatic product feature extraction from online product reviews using maximum entropy with lexical and syntactic features","authors":"G. Somprasertsri, P. Lalitrojwong","doi":"10.1109/IRI.2008.4583038","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583038","url":null,"abstract":"The task of product feature extraction is to find product features that customers refer to their topic reviews. It would be useful to characterize the opinions about the products. We propose an approach for product feature extraction by combining lexical and syntactic features with a maximum entropy model. For the underlying principle of maximum entropy, it prefers the uniform distributions if there is no external knowledge. Using a maximum entropy approach, firstly we extract the learning features from the annotated corpus, secondly we train the maximum entropy model, thirdly we use trained model to extract product features, and finally we apply a natural language processing technique in postprocessing step to discover the remaining product features. Our experimental results show that this approach is suitable for automatic product feature extraction.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115179782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DEVS model composition by system entity structure
Pub Date: 2008-07-13 | DOI: 10.1109/IRI.2008.4583078
S. Cheon, Doohwan Kim, B. Zeigler
The structural knowledge of a system represented in a system entity structure (SES) supports the handling of organizational issues in model composition. A pruning operation reduces the SES to a pruned entity structure (PES) that meets the design objectives. The PES is eventually synthesized into a simulation model by combining it with models from the model base. The SES is implemented in XML metadata using the Java language and Sun's Document Object Model (DOM) specification. The SESBuilder software supports natural language input for an SES definition and XML instance generation for its PES. The coupling information, expressed in natural language with a restricted syntax and processed by SESBuilder, is used to compose models. As a real example, we present compositions of Discrete Event System Specification (DEVS) generator models representing the US Climate Normals. The example presents the natural language input and the XML instances for the PES, including coupling information, in SESBuilder.
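A minimal sketch of an SES-like tree with a pruning operation that resolves specialization choices into a PES. The node type, the climate example's entity names, and the choice mechanism are simplifications assumed for illustration, not SESBuilder's representation.

```python
class Entity:
    """Stand-in for a system entity structure (SES) node: an entity has
    sub-entities (a decomposition) and, optionally, alternative specializations."""
    def __init__(self, name, parts=None, alternatives=None):
        self.name = name
        self.parts = parts or []                 # decomposition (aspect)
        self.alternatives = alternatives or []   # specialization choices

def prune(entity, choices):
    """Produce a pruned entity structure (PES): resolve each specialization
    by the design choice in `choices`, keeping decompositions intact."""
    if entity.alternatives:
        chosen = next(a for a in entity.alternatives if a.name == choices[entity.name])
        return prune(chosen, choices)
    return Entity(entity.name, [prune(p, choices) for p in entity.parts])

# Hypothetical climate-generator SES: a Generator is either Normal or Extreme
ses = Entity("WeatherStation", parts=[
    Entity("Generator", alternatives=[Entity("NormalClimate"),
                                      Entity("ExtremeClimate")]),
])
pes = prune(ses, {"Generator": "NormalClimate"})  # PES ready for model synthesis
```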
{"title":"DEVS model composition by system entity structure","authors":"S. Cheon, Doohwan Kim, B. Zeigler","doi":"10.1109/IRI.2008.4583078","DOIUrl":"https://doi.org/10.1109/IRI.2008.4583078","url":null,"abstract":"The structural knowledge of a system represented in system entity structure (SES) supports the handling of organization issue in model compositions. A pruning operation results in a reduced structure, pruned entity structure (PES) that the SES is pruned to meet design objectives. The PES is eventually synthesized into a simulation model by combining it with models in the model base. SES is implemented in XML Metadata using Java language and Sun’s Document Object Models (DOM) specification. The SESBuilder1 software supports natural language input for an SES definition and XML instance generation of its PES. The coupling information expressed in natural language with restricted syntax processed by SESBuilder is used to compose models. As a real example, the compositions of Discrete Event Specification (DEVS) generator models, representing the US Climate Normals, are presented. The example presents the natural language input and XML instances for PES including coupling information in the SESBuilder.","PeriodicalId":169554,"journal":{"name":"2008 IEEE International Conference on Information Reuse and Integration","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130128854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}