With advances in information technology, organizations increasingly want to make their databases public for various purposes. Before a database is published, it is important to transform the data so that private information is not revealed. In this paper, we introduce a combined approach to enhance the privacy of databases to be released. The combination of two existing techniques, k-anonymity and randomization, provides better privacy protection than applying either technique alone while still preserving a certain degree of data utility. Experiments on a real-world dataset show that our privacy breach prevention algorithm enhances privacy at a small increase in cost compared to the k-anonymity approach.
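To make the two building blocks concrete, the following is a minimal sketch, not the algorithm of the paper (which the abstract does not specify). It assumes a table stored as a list of dictionaries, a hypothetical choice of quasi-identifier columns, and a simple randomization scheme in which each sensitive value is kept with probability `keep_prob` and otherwise redrawn uniformly from its domain. All names and sample values are illustrative.

```python
import random
from collections import Counter


def is_k_anonymous(rows, quasi_ids, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())


def randomize(rows, sensitive, domain, keep_prob, rng):
    """Keep each sensitive value with probability keep_prob.

    Otherwise replace it with a value drawn uniformly from `domain`
    (the draw may return the original value again).
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input table is left untouched
        if rng.random() >= keep_prob:
            new_row[sensitive] = rng.choice(domain)
        out.append(new_row)
    return out


# Hypothetical table whose quasi-identifiers (age, zip) are already generalized.
rows = [
    {"age": "30-39", "zip": "142**", "disease": "flu"},
    {"age": "30-39", "zip": "142**", "disease": "asthma"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "diabetes"},
]
domain = ["flu", "asthma", "diabetes"]
released = randomize(rows, "disease", domain, keep_prob=0.8, rng=random.Random(0))
```

Because the randomization in this sketch touches only the sensitive column, the released table stays k-anonymous with respect to the quasi-identifiers whenever the input table is.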
{"title":"Enhancing Privacy of Released Database","authors":"Tingting Chen, S. Zhong","doi":"10.1109/GrC.2007.101","DOIUrl":"https://doi.org/10.1109/GrC.2007.101","url":null,"abstract":"With advanced information techniques, organizations want to make their database public for different purposes. It is important to do some data transformations that prevent private information to be revealed before publishing the database. In this paper, we introduce a combined approach to enhance the privacy of the databases to be released. The combination of two existing techniques, k-anonymity and randomization, provides better privacy protection than only applying one of two approaches and still reserves certain data utility. The experiments on real-world dataset show that our privacy breach prevention algorithm enhances the privacy with small cost increase compared to the k-anonymity approach.","PeriodicalId":259430,"journal":{"name":"2007 IEEE International Conference on Granular Computing (GRC 2007)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132321708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Nii, Shigeru Ando, Yutaka Takahashi, A. Uchinuno, R. Sakashita
Improving nursing care quality is very important in the medical field. Currently, nursing-care freestyle texts (nursing-care data) are collected from many hospitals in Japan through Web applications. Nursing-care experts evaluate the collected data to improve nursing care quality. To evaluate the data, experts need to read every freestyle text carefully. This is a hard task for an expert because of the huge number of nursing-care records in the database. To reduce the workload of evaluating nursing-care data, we propose a support vector machine (SVM) based classification system.
{"title":"Nursing-Care Freestyle Text Classification Using Support Vector Machines","authors":"M. Nii, Shigeru Ando, Yutaka Takahashi, A. Uchinuno, R. Sakashita","doi":"10.1109/GrC.2007.131","DOIUrl":"https://doi.org/10.1109/GrC.2007.131","url":null,"abstract":"The nursing care quality improvement is very important in the medical field. Currently, nursing-care freestyle texts (nursing-care data) are collected from many hospitals in Japan by using Web applications. Some nursing-care experts evaluate the collected data to improve nursing care quality. For evaluating the nursing-care data, experts need to read all freestyle texts carefully. However, it is a hard task for an expert to evaluate the data because of huge number of nursing-care data in the database. In order to reduce workloads evaluating nursing-care data, we propose a support vector machine(SVM) based classification system.","PeriodicalId":259430,"journal":{"name":"2007 IEEE International Conference on Granular Computing (GRC 2007)","volume":"308 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134484247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The use of rough set theory to select essential attributes that can represent the original data set is well known. Knowledge discovered from such essential attributes is typically represented as rules, which are therefore representative of the original data. We present three results on rule evaluation as an extension of the "rules-as-attributes measure". First, we present an approach for finding representative sets of rules for a given data set. Second, we suggest that Johnson's reducer in the ROSETTA software generates a reduct with the minimum number of rules, which can be considered a minimum representation of the original knowledge. Third, we provide an integrated approach to rule evaluation based on both the rule importance measure and the method of finding representative sets of rules. We argue that this approach takes the ranking of representative rules a stage further. These approaches are proposed to facilitate rule evaluation and can provide an automatic and complete comprehension of the original data set.
{"title":"A Method of Finding Representative Sets of Rules","authors":"Jiye Li, N. Cercone, Jianchao Han","doi":"10.1109/GrC.2007.145","DOIUrl":"https://doi.org/10.1109/GrC.2007.145","url":null,"abstract":"The use of rough sets theory to select essential attributes that can represent the original data set is well known. Knowledge discovered from such essential attributes are typically represented as rules, and are therefore representative of the original data. We present three results towards rule evaluation as an extension of the \"rules-as-attributes measure \". First, we present an approach of finding representative sets of rules for a given data set. Secondly, we suggest that the Johnson's reducer of the ROSETTA software generates a reduct with the minimum number of rules, and can be considered as a minimum representation of the original knowledge. Our third result provides an integrated approach for rule evaluation based on both the rule importance measure and the method of finding representative sets of rules. We argue that this approach can take the representative rules ranking into a further stage. These approaches are proposed to facilitate the rule evaluations and can provide an automatic and complete comprehension of the original data set.","PeriodicalId":259430,"journal":{"name":"2007 IEEE International Conference on Granular Computing (GRC 2007)","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133305485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical prosody structure generation is a key component of a speech synthesis system. One major feature of the prosody of Mandarin Chinese speech flow is prosodic phrase grouping. In this paper, a method based on the maximum entropy Markov model (MEMM) is proposed to predict prosodic phrase boundaries in unrestricted Chinese text. The MEMM, which effectively combines transition probabilities and conditional probabilities of states, is described in detail. The conditional probabilities of states are estimated by maximum entropy (ME) theory. The new model is compared with the maximum entropy model for prosodic phrase break prediction. Experiments show that, using the same feature set, the MEMM improves overall performance: both precision and recall are improved.
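As an illustration of how an MEMM labels a sequence, the sketch below runs Viterbi decoding over conditional probabilities P(state | previous state, observation). It is a generic sketch rather than the authors' system: the two labels (`B` for a phrase boundary after the token, `N` for none), the coarse observation tags, and the numbers in `toy_cond_prob` are all hypothetical stand-ins for a trained maximum entropy model.

```python
def viterbi_memm(observations, states, cond_prob, start_state="<s>"):
    """Return the most probable state sequence under an MEMM.

    cond_prob(prev_state, observation, state) must return
    P(state | prev_state, observation).
    """
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (cond_prob(start_state, observations[0], s), [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            prob, path = max(
                ((best[p][0] * cond_prob(p, obs, s), best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            step[s] = (prob, path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


def toy_cond_prob(prev, obs, state):
    """Hypothetical stand-in for a trained maximum entropy model."""
    p_boundary = 0.7 if obs == "particle" else 0.2
    if prev == "B":
        # Assume two boundaries in a row are less likely.
        p_boundary *= 0.5
    return p_boundary if state == "B" else 1.0 - p_boundary


labels = viterbi_memm(["noun", "particle", "verb", "particle"], ["B", "N"], toy_cond_prob)
print(labels)
```

Conditioning on the previous label is what separates the MEMM from a plain maximum entropy classifier, which would score each token independently.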
{"title":"A Maximum Entropy Markov Model for Prediction of Prosodic Phrase Boundaries in Chinese TTS","authors":"Ziping Zhao, Tingjian Zhao, Yaoting Zhu","doi":"10.1109/GrC.2007.66","DOIUrl":"https://doi.org/10.1109/GrC.2007.66","url":null,"abstract":"Hierarchical prosody structure generation is a key component for a speech synthesis system. One major feature of the prosody of Mandarin Chinese speech flow is prosodic phrase grouping. In this paper a method based on maximum entropy Markov model (MEMM) is proposed to predict prosodic phrase boundaries in unrestricted Chinese text. MEMM is described in detail that combines transition probabilities and conditional probabilities of states effectively. The conditional probabilities of states are estimated by maximum entropy (ME) theory. A comparison is conducted between the new model and maximum entropy model for prosody phrase break prediction. The experiments show that utilizing the same feature set, MEMM improves overall performance. The precision and recall ratio are improved.","PeriodicalId":259430,"journal":{"name":"2007 IEEE International Conference on Granular Computing (GRC 2007)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132402941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How to combine multiple clusterings into a single clustering solution of better quality is a critical problem in cluster ensembles. In this paper, we extend Strehl's consensus function based on information-theoretic principles and propose a novel weighted consensus function to combine multiple "soft" clusterings. In our consensus function, we use mutual information to measure the information shared between two "soft" clusterings and emphasize clusterings that differ strongly from the others. We use an algorithm similar to sequential k-means to solve this consensus function and conduct experiments on four real-world datasets to compare our algorithm with four other consensus functions: CSPA, HGPA, MCLA, and QMI. The results indicate that our consensus function provides solutions of better quality than CSPA, HGPA, MCLA, and QMI, and that, when the distribution of diversity in cluster ensembles is uneven, taking the influence of diversity into account can improve the quality of the clustering ensemble.
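One common way to measure the information shared by two soft clusterings is sketched below; whether the paper uses exactly this estimator is an assumption. Each clustering is given as a membership matrix with one row per object, each row summing to 1. The joint distribution over cluster pairs is estimated by averaging the products of memberships, and the mutual information follows from it. The function name and the sample matrices are illustrative.

```python
import math


def soft_mutual_information(u, v):
    """Mutual information (in nats) between two soft clusterings.

    u and v are lists of membership rows, one row per object; each row sums to 1.
    """
    n = len(u)
    ka, kb = len(u[0]), len(v[0])
    # joint[i][j] estimates P(cluster i in u, cluster j in v).
    joint = [
        [sum(u[x][i] * v[x][j] for x in range(n)) / n for j in range(kb)]
        for i in range(ka)
    ]
    pa = [sum(joint[i]) for i in range(ka)]
    pb = [sum(joint[i][j] for i in range(ka)) for j in range(kb)]
    mi = 0.0
    for i in range(ka):
        for j in range(kb):
            if joint[i][j] > 0.0:
                mi += joint[i][j] * math.log(joint[i][j] / (pa[i] * pb[j]))
    return mi


crisp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
vague = [[0.5, 0.5]] * 4
mi_same = soft_mutual_information(crisp, crisp)   # two balanced clusters: log(2)
mi_vague = soft_mutual_information(crisp, vague)  # uninformative clustering: 0
```

A weighting scheme in the spirit of the abstract could give larger weights to clusterings whose average mutual information with the rest of the ensemble is low, although the abstract does not give the exact form of the weights.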
{"title":"A Weighted Consensus Function Based on Information-Theoretic Principles to Combine Soft Clusterings","authors":"Yan Gao, Shiwen Gu, Jianhua Li, Zhining Liao","doi":"10.1109/GrC.2007.156","DOIUrl":"https://doi.org/10.1109/GrC.2007.156","url":null,"abstract":"How to combine multiple clusterings into a single clustering solution of better quality is a critical problem in cluster ensemble. In this paper, we extend Strehl's consensus function based on information- theoretic principles and propose a novel weighted consensus function to combine multiple \"soft\" clusterings. In our consensus function, we use mutual information to measure the sharing information between two \"soft\" clusterings and emphasize the clustering which is much different from the others. We use the algorithm similar to sequential k-means to obtain the solution of this consensus function and conduct experiments on four real-world datasets to compare our algorithm with other four consensus function, including CSPA, HGPA, MCLA, QMI. The results indicate that our consensus function provides solutions of better quality than CSPA, HGPA, MCLA, QMI and when the distribution of diversity in cluster ensembles is uneven, considering the influence of diversity can improve the quality of clustering ensemble.","PeriodicalId":259430,"journal":{"name":"2007 IEEE International Conference on Granular Computing (GRC 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130752544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
While privacy-preserving data mining has been an important topic for a number of years, privacy of social network data is a relatively new area of interest. Previous research has shown that anonymization alone may not be sufficient for hiding identity information in certain real-world data sets. In this paper, we focus on understanding the impact of network topology and node substructure on the level of anonymity present in the network. We present a new measure, topological anonymity, that quantifies the amount of privacy preserved in different topological structures. The measure uses a combination of known social network metrics and attempts to identify when node and edge inference breaches arise in these graphs.
{"title":"Measuring Topological Anonymity in Social Networks","authors":"Lisa Singh, J. Zhan","doi":"10.1109/GrC.2007.31","DOIUrl":"https://doi.org/10.1109/GrC.2007.31","url":null,"abstract":"While privacy preservation of data mining approaches has been an important topic for a number of years, privacy of social network data is a relatively new area of interest. Previous research has shown that anonymization alone may not be sufficient for hiding identity information on certain real world data sets. In this paper, we focus on understanding the impact of network topology and node substructure on the level of anonymity present in the network. We present a new measure, topological anonymity, that quantifies the amount of privacy preserved in different topological structures. The measure uses a combination of known social network metrics and attempts to identify when node and edge inference breeches arise in these graphs.","PeriodicalId":259430,"journal":{"name":"2007 IEEE International Conference on Granular Computing (GRC 2007)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123649733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In contemporary distributed applications, questions concerning coordination have become increasingly urgent. There is, however, a trade-off to be made between the need for highly reactive behavior and the need for semantically rich high-level abstractions. Especially for context-aware applications, where various systems have to act together and come to coordinated conclusions, the need for powerful semantic abstractions is evident. Our research is based on the observation that human teams are very good at coordinating (when compared to technical systems). Consequently, we chose a common sense reasoning approach that is capable of grasping the specifics of human behavior. One specific feature of this approach is the use of fuzzy quotients, which bear strong similarities to the notion of granules.
{"title":"Fuzzy Quotients in Reactive Common Sense Reasoning","authors":"M. Cebulla","doi":"10.1109/GrC.2007.81","DOIUrl":"https://doi.org/10.1109/GrC.2007.81","url":null,"abstract":"In contemporary distributed applications questions concerning coordination have become increasingly urgent. There is a trade-off however to be made between the need for a highly reactive behavior and the need for semantically rich high level abstractions. Especially w.r.t. context-aware applications where various systems have to act together and come to coordinated conclusions the need for powerful semantic abstractions is evident. Our research is based on the observation that human teams are very good in coordinating (when compared to technical systems). Consequently we chose an approach of common sense reasoning which is capable to grasp the specifics of human behavior. One specific in this approach is the usage of fuzzy quotients which bears strong similarities to the notion of granules.","PeriodicalId":259430,"journal":{"name":"2007 IEEE International Conference on Granular Computing (GRC 2007)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124955428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, a previously introduced fuzzy modeling method is used to model the input impedance of two coupled dipole antennas in the echelon form. The initial data required by the model, for two coupled dipole antennas in the parallel and collinear forms, are obtained using the method of moments (MoM). The behavior of two coupled dipole antennas in the echelon form is then predicted from the knowledge of the parallel and collinear forms together with the concept of spatial membership functions. The behavior of the problem is also well approximated. The results of the proposed model are in excellent agreement with those of the MoM, at a vanishingly short execution time in comparison.
{"title":"Prediction of the Input Impedance of Two Coupled Dipole Antennas in the Echelon Form","authors":"S. R. Ostadzadeh, M. Soleimani, M. Tayarani","doi":"10.1109/GrC.2007.36","DOIUrl":"https://doi.org/10.1109/GrC.2007.36","url":null,"abstract":"In this paper, the previously introduced fuzzy modeling method is used to model the input impedance of two coupled dipole antennas in the echelon form. The initial data of two coupled dipole antennas in the parallel and collinear form, which are required for the model, are obtained using the MoM. Then, the knowledge of two coupled dipole antennas in the echelon form is easily predicted based on the knowledge of two coupled dipole antennas in the parallel and collinear form and the concept of spatial membership functions. Also, the problem behavior is well approximated. Comparing the proposed model results with MoM shows an excellent agreement with a vanishingly short execution time comparing with MoM.","PeriodicalId":259430,"journal":{"name":"2007 IEEE International Conference on Granular Computing (GRC 2007)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121164856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Horizontal collaboration fuzzy C-means (HC-FCM) is a useful tool for dealing with collaborative clustering problems in which a pattern set is described independently in several different feature spaces, resulting in different data sets. By means of FCM, clustering may be carried out on these different data sets, resulting in different partition matrices. For one of these data sets, how to make use of the clustering information of the other data sets to help its own clustering, and thus to give a reasonable collaborative clustering result, is a meaningful topic and is the aim of HC-FCM. Because of potential security and privacy restrictions, the clustering information can be provided only as partition matrices instead of the data sets themselves. This restricts the way the clustering information can be used. In the original framework of HC-FCM given by W. Pedrycz, the partition matrices are introduced directly into the clustering algorithm without any preprocessing. In this paper, we show the necessity of preprocessing the partition matrices and present a workable method for doing so. Experiments are given to show the performance of the proposed preprocessing method. With this preprocessing, horizontal collaboration fuzzy C-means can be carried out properly.
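For readers unfamiliar with partition matrices, the sketch below computes one from fixed prototypes using the standard FCM membership formula, u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)). It covers only plain FCM memberships, not the collaboration term of HC-FCM or the preprocessing proposed in the paper; the function name and the sample data are illustrative.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return the partition matrix u, where u[i][k] is the membership of point k in cluster i."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    c = len(centers)
    u = [[0.0] * len(points) for _ in range(c)]
    exponent = 2.0 / (m - 1.0)
    for k, x in enumerate(points):
        d = [dist(x, v) for v in centers]
        zeros = [i for i, di in enumerate(d) if di == 0.0]
        if zeros:
            # The point coincides with one or more prototypes: share membership among them.
            for i in zeros:
                u[i][k] = 1.0 / len(zeros)
            continue
        for i in range(c):
            u[i][k] = 1.0 / sum((d[i] / d[j]) ** exponent for j in range(c))
    return u


points = [(0.0,), (1.0,), (4.0,)]
centers = [(0.0,), (4.0,)]
u = fcm_memberships(points, centers)
# Every column of u sums to 1; the middle point gets memberships 0.9 and 0.1.
```

Each row of `u` corresponds to one cluster, so a partition matrix of this kind, rather than the raw data, is what one site would pass to another in the horizontal collaboration setting.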
{"title":"A Necessary Preprocessing in Horizontal Collaborative Fuzzy Clustering","authors":"Fusheng Yu, Juan Tang, Ruiqiong Cai","doi":"10.1109/GrC.2007.33","DOIUrl":"https://doi.org/10.1109/GrC.2007.33","url":null,"abstract":"Horizontal collaboration fuzzy C-means (HC-FCM) is a useful tool for dealing with collaborative clustering problems where a pattern-set is described in some different feature spaces independently and thus results in different data sets. By means of FCM, clustering may be carried on these different data sets and thus result in different partition matrices. For one of these data sets, how to take means of the clustering information of the other data sets to help its own clustering and thus to give a reasonable collaborative clustering result is a meaningful topic and becomes the aim of HC-FCM. Because of potential security and privacy restrictions, the clustering information can be provided only by partition matrices instead of the data sets themselves. This confines the manner of using the clustering information. In the original frame of HC-FCM given by W.Pedrycz, the partition matrices are directly introduced to the clustering algorithm without any preprocessing. In this paper, we will show the necessity of the preprocessing on the partition matrices and present an available method for the preprocessing. Some experiments are given to show the performance of the proposed method for preprocessing. 
With the work of this paper, the horizontal collaboration fuzzy C-means will be well carried on.","PeriodicalId":259430,"journal":{"name":"2007 IEEE International Conference on Granular Computing (GRC 2007)","volume":"412 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115726891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper puts forward a dominance granule based multi-objective sorting algorithm (DGSA). The dominance granule is obtained from the dominance relation in the information system by means of granular computing, and it is the basis of multi-objective sorting and fitness assignment. The sorting algorithm designed on this basis greatly reduces computational complexity.
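For reference, the sketch below shows the Pareto dominance relation and a straightforward front-by-front sort built on it. This is the naive baseline that sorting methods of this kind aim to speed up, not DGSA itself, whose construction the abstract does not detail. It assumes all objectives are minimized; the function names and sample points are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points):
    """Split point indices into fronts; front 0 holds the non-dominated points."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if no other remaining point dominates it.
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


points = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
fronts = non_dominated_sort(points)
print(fronts)
```

The front index of each point can serve directly as a rank for fitness assignment, which is the role the abstract assigns to the dominance granule.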
{"title":"Granular Computing Based Sorting Method in Multi-Objective Optimization","authors":"Gaowei Yan, Gang Xie, Keming Xie, T. Lin","doi":"10.1109/GrC.2007.127","DOIUrl":"https://doi.org/10.1109/GrC.2007.127","url":null,"abstract":"This paper put forward dominance granule based multi-objective sorting algorithm (DGSA). The dominance granule can be obtained by the dominance relation in the information system and granular computing. It is the basis of multi-objective sorting and fitness assignment. Therefore, the dominance granule based multi-objective sorting algorithm is designed and reduces the computational complexity highly.","PeriodicalId":259430,"journal":{"name":"2007 IEEE International Conference on Granular Computing (GRC 2007)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128358140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}