Gamila Obadi, Pavla Drázdilová, Lukas Hlavacek, J. Martinovič, V. Snášel
This article presents a comparison of overlapping clustering methods for mining the DBLP data set. For the analysis, the DBLP data were pre-processed and each journal was assigned attributes defined by its topics. The data collection can be described as vague and uncertain; the obtained clusters and the applied queries do not necessarily have crisp boundaries. The authors present clustering through the tolerance rough set method (TRSM) and the fuzzy c-means (FCM) algorithm for journal recommendation based on topic search. The two clustering methods are compared using different measures of similarity.
{"title":"A Tolerance Rough Set Based Overlapping Clustering for the DBLP Data","authors":"Gamila Obadi, Pavla Drázdilová, Lukas Hlavacek, J. Martinovič, V. Snás̃el","doi":"10.1109/WI-IAT.2010.286","DOIUrl":"https://doi.org/10.1109/WI-IAT.2010.286","url":null,"abstract":"In the article there is presented comparison of overlapping clustering methods for data mining of DBLP datasets. For the analysis, the DBLP data sets were pre-processed, while each journal has been assigned attributes, defined by its topics. The data collection can be described as vague and uncertain; obtained clusters and applied queries do not necessarily have crisp boundaries. The authors presented clustering through a tolerance rough set method (TRSM) and fuzzy c-mean (FCM) algorithm for journal recommendation based on topic search. The comparison of both clustering methods was presented using different measures of similarity.","PeriodicalId":197966,"journal":{"name":"Web Intelligence/IAT Workshops","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128273974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
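The tolerance rough set idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common TRSM formulation in which two terms tolerate each other when they co-occur in at least `theta` items, and in which the upper approximation of a term set collects every term whose tolerance class overlaps it. The journal data, the threshold, and the function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the terms that co-occur with it in at least theta items."""
    cooc = defaultdict(int)
    vocab = set()
    for terms in docs:
        vocab.update(terms)
        for a, b in combinations(sorted(set(terms)), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}  # every term tolerates itself
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(terms, classes):
    """Terms whose tolerance class shares at least one term with the given set."""
    terms = set(terms)
    return {t for t, cls in classes.items() if cls & terms}


# Hypothetical journals, each described by a set of topic terms.
journals = [
    {"clustering", "fuzzy", "data mining"},
    {"clustering", "rough sets", "data mining"},
    {"rough sets", "logic"},
    {"clustering", "data mining", "graphs"},
]
classes = tolerance_classes(journals, theta=2)
expanded_query = upper_approximation({"clustering"}, classes)
print(sorted(expanded_query))
```

Because the upper approximation can add related terms to a query or a journal description, items may end up matching more than one cluster, which is one way overlapping clusters arise.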
This paper addresses missing edges and vertices in a network. We discuss interchangeability and duality between vertices and edges in a graph. We use covariate information associated with vertices to estimate the probability of missing edges; likewise, we use covariate information associated with edges to estimate the probability of missing vertices. In order to predict missing vertices, we apply the line graph transformation, which converts edges to vertices and vertices to edges. The probability of an edge is obtained by taking the inner product of the vectors of covariates. Moreover, we have extended the methodology of predicting two edges (dyadic ties) to predict edge
{"title":"Predicting Edges and Vertices in a Network","authors":"Walid K. Sharabati, E. Wegman, Yasmin H. Said","doi":"10.1109/WI-IAT.2010.317","DOIUrl":"https://doi.org/10.1109/WI-IAT.2010.317","url":null,"abstract":"This paper addresses missing edges and vertices in a network. We discuss interchangeability and duality between vertices and edges in a graph. We use covariate information associated with vertices to estimate the probability of missing edges; likewise, we use covariate information associated with edges to estimate the probability of missing vertices. In order to predict missing vertices, we apply the line graph transformation, which converts edges to vertices and vertices to edges. The probability of an edge is obtained by taking the inner product of the vectors of covariates. Moreover, we have extended the methodology of predicting two edges (dyadic ties) to predict edge","PeriodicalId":197966,"journal":{"name":"Web Intelligence/IAT Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129011178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
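The two operations named in the abstract above, the line graph transformation and the inner product of covariate vectors, can be sketched as follows. This is an illustrative sketch only: the toy edge list and covariate values are hypothetical, and the covariates are assumed to be non-negative and scaled so that the inner product falls in [0, 1] and can be read as a probability.

```python
from itertools import combinations


def line_graph(edges):
    """Build the line graph: one node per original edge, linked when two edges share a vertex."""
    nodes = [tuple(sorted(e)) for e in edges]
    links = set()
    for e1, e2 in combinations(nodes, 2):
        if set(e1) & set(e2):
            links.add((e1, e2))
    return nodes, links


def edge_score(x, y):
    """Inner product of two covariate vectors, read here as an edge probability."""
    return sum(a * b for a, b in zip(x, y))


# Hypothetical path graph a-b-c-d.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
nodes, links = line_graph(edges)

# Hypothetical vertex covariates (assumed scaled so scores stay in [0, 1]).
covariates = {"a": [0.5, 0.5], "d": [0.8, 0.2]}
p_ad = edge_score(covariates["a"], covariates["d"])
print(nodes, sorted(links), p_ad)
```

Applying `edge_score` to covariates attached to the nodes of the line graph would, by the duality described in the abstract, score candidate vertices of the original graph.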
The current web is known as a space with constantly growing interactivity among its users. It is changing from a data store into a place of social interaction where people not only search for interesting information but also communicate and collaborate. Social networks are the most widely used places for such interaction. We present a method for analyzing the strength of relationships together with their evolution. This method is based on the various user activities in social networks. We evaluate our approach within the Facebook social network.
{"title":"Tracing Strength of Relationships in Social Networks","authors":"Ivan Srba, M. Bieliková","doi":"10.1109/WI-IAT.2010.241","DOIUrl":"https://doi.org/10.1109/WI-IAT.2010.241","url":null,"abstract":"Current web is known as a space with constantly growing interactivity among its users. It is changing from the data storage into a social interaction place where people not only search interesting information, but also communicate and collaborate. Obviously, social networks are the most used places for common interaction among people. We present a method for analysis of the strength of relationships together with their evolution. This method is based on the various user activities in social networks. We evaluate our approach within the Facebook social network.","PeriodicalId":197966,"journal":{"name":"Web Intelligence/IAT Workshops","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115129724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The k-means method is a widely used clustering technique because of its simplicity and speed. However, the clustering result depends heavily on the chosen initial value. In this report, we propose a seeding method with independent component analysis for the k-means method. Using a benchmark dataset, we evaluate the performance of our proposed method and compare it with other seeding methods.
{"title":"Careful Seeding Based on Independent Component Analysis for k-Means Clustering","authors":"T. Onoda, Miho Sakai, S. Yamada","doi":"10.1109/WI-IAT.2010.102","DOIUrl":"https://doi.org/10.1109/WI-IAT.2010.102","url":null,"abstract":"The k-means method is a widely used clustering technique because of its simplicity and speed. However, the clustering result depends heavily on the chosen initial value. In this report, we propose a seeding method with independent component analysis for the k-means method. Using a benchmark dataset, we evaluate the performance of our proposed method and compare it with other seeding methods.","PeriodicalId":197966,"journal":{"name":"Web Intelligence/IAT Workshops","volume":"2009 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116760249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
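The dependence of k-means on its initial centers, which motivates the seeding method in the abstract above, can be seen in a small sketch. The code is a plain Lloyd iteration with explicitly supplied seeds; it does not implement the ICA-based seeding, whose details are not given here. The four points and both seed choices are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, seeds, iterations=20):
    """Lloyd's algorithm started from the given seeds; returns centers and total squared error."""
    centers = [tuple(s) for s in seeds]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Corners of a 4 x 1 rectangle: the natural split is left pair versus right pair.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
good_centers, good_sse = kmeans(points, seeds=[(0, 0), (4, 0)])
bad_centers, bad_sse = kmeans(points, seeds=[(2, 0), (2, 1)])
print(good_sse, bad_sse)
```

Both runs converge, but the second is stuck in a poor local optimum (bottom pair versus top pair) with a much larger squared error, which is the behavior a careful seeding method tries to avoid.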
Naoki Yoshinaga, S. Itaya, Rie Tanaka, Taku Konishi, Shinichi Doi, Keiji Yamada, P. Davis
We analyze email communications within a large company to reveal how email activity patterns depend on content. We characterize email contents using keywords and examine statistics of email transmissions. As a result, we are able to identify differences in network structures and propagation behaviors depending on the type of keyword.
{"title":"Content Propagation Analysis of E-mail Communications","authors":"Naoki Yoshinaga, S. Itaya, Rie Tanaka, Taku Konishi, Shinichi Doi, Keiji Yamada, P. Davis","doi":"10.1109/WI-IAT.2010.202","DOIUrl":"https://doi.org/10.1109/WI-IAT.2010.202","url":null,"abstract":"We analyze email communications within a large company to reveal how email activity patterns depend on content. We characterize email contents using keywords and examine statistics of email transmissions. As a result, we are able to identify differences in network structures and propagation behaviors depending on the type of keyword.","PeriodicalId":197966,"journal":{"name":"Web Intelligence/IAT Workshops","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121190959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We argue that it is more practical to address the ontology mapping self-tuning problem in the context of a whole system rather than a single matcher. In this paper we introduce RMOMS, a Reference Model for Ontology Mapping Systems, consisting of six parts (the Preprocessor, the Dispatcher, the Matcher(s), the Aggregator, the Pruner, and the User Interface), with which to decompose the self-tuning problem into more tractable units. We propose a Maximum Weight Bipartite Graph Matching method for self-tuning matchers and a Stable Match method for the self-tuning aggregator, and test them in LiSTOMS, a lightweight prototype of RMOMS. Compared with some notable systems, LiSTOMS shows a leading recall rate and a competitive precision rate.
{"title":"LiSTOMS: A Light-Weighted Self-Tuning Ontology Mapping System","authors":"Zhen Zhen, Junyi Shen, Jinwei Zhao, J. Qian","doi":"10.1109/WI-IAT.2010.173","DOIUrl":"https://doi.org/10.1109/WI-IAT.2010.173","url":null,"abstract":"We argue that it is more practical to address the ontology mapping self-tuning problem in a whole system context instead of in a single matcher context. In this paper we introduce RMOMS, a Reference Model for Ontology Mapping Systems, consisting of six parts, the Preprocessor, the Dispatcher, the Matcher(s), the Aggregator, the Pruner, and the User Interface, with which to disassemble the self-tuning problem into more feasible units. We propose Maximum Weight Bipartite Graph Matching method for self-tuning matchers and Stable Match method for self-tuning aggregator, and test them in LiSTOMS, a light-weighted prototype sample of RMOMS. With comparison with some notable systems, LiSTOMS shows leading recall rate and competing precision rate.","PeriodicalId":197966,"journal":{"name":"Web Intelligence/IAT Workshops","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115348349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
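Maximum weight bipartite matching, named in the abstract above, picks a one-to-one assignment between two sets that maximizes the total weight. The sketch below is a brute-force illustration for tiny inputs, not the method used in LiSTOMS; the similarity matrix between entities of two hypothetical ontologies is invented for the example. Real systems would typically use a polynomial-time algorithm such as the Hungarian method.

```python
from itertools import permutations


def max_weight_matching(sim):
    """Exhaustively find the row-to-column assignment with the largest total similarity."""
    n_rows, n_cols = len(sim), len(sim[0])
    if n_rows > n_cols:
        raise ValueError("expected at least as many columns as rows")
    best_total, best_pairs = float("-inf"), []
    for cols in permutations(range(n_cols), n_rows):
        total = sum(sim[r][c] for r, c in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_pairs, best_total


# Hypothetical similarities: rows are entities of ontology A, columns of ontology B.
sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
    [0.20, 0.40, 0.70],
]
pairs, total = max_weight_matching(sim)
print(pairs, total)
```

In this example a greedy choice would take the 0.90 cell first and be left with weak matches for the second row, while the optimal matching gives up 0.90 in favor of a higher overall total.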
In this paper we introduce the problem of providing privacy preserving information for Web indexing, classification, and other information retrieval tasks. Web pages are represented by a frequency term vector that preserves k-anonymity for all the Web pages. This vector can then be used, for example, to build indexes of classifiers. Our proposal makes use of semantic microaggregation.
{"title":"Towards Privacy Preserving Information Retrieval through Semantic Microaggregation","authors":"Daniel Abril, G. Navarro-Arribas, V. Torra","doi":"10.1109/WI-IAT.2010.132","DOIUrl":"https://doi.org/10.1109/WI-IAT.2010.132","url":null,"abstract":"In this paper we introduce the problem of providing privacy preserving information for Web indexing, classification, and other information retrieval task. Web pages are represented by a frequency term vector that preserves k-anonymity for all the Web pages. This vector can then be used, for example, to build indexes of classifiers. Our proposal makes use of semantic micro aggregation.","PeriodicalId":197966,"journal":{"name":"Web Intelligence/IAT Workshops","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124862475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
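The link between microaggregation and k-anonymity in the abstract above can be illustrated with a deliberately simplified sketch: vectors are partitioned into groups of at least k, and each vector is replaced by its group centroid, so every released vector is shared by at least k pages. The grouping here is a crude lexicographic sort, which is an assumption made for brevity; it is not the semantic microaggregation proposed in the paper. The term-frequency vectors are hypothetical.

```python
from collections import Counter


def microaggregate(vectors, k):
    """Replace each vector by the centroid of a group containing at least k vectors."""
    if len(vectors) < k:
        raise ValueError("need at least k vectors")
    # Crude grouping: sort lexicographically and cut into consecutive chunks of size k.
    order = sorted(range(len(vectors)), key=lambda i: vectors[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()  # fold an undersized last chunk into the previous group
        groups[-1].extend(tail)
    dims = len(vectors[0])
    released = [None] * len(vectors)
    for group in groups:
        centroid = tuple(sum(vectors[i][d] for i in group) / len(group) for d in range(dims))
        for i in group:
            released[i] = centroid
    return released


# Hypothetical term-frequency vectors for five Web pages.
pages = [(3, 0, 1), (0, 2, 2), (2, 0, 1), (0, 3, 1), (4, 1, 0)]
released = microaggregate(pages, k=2)
counts = Counter(released)
print(released, min(counts.values()))
```

A distance-based or semantic grouping would lose less information than the sort used here, but the k-anonymity property itself only depends on each group having at least k members.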
C. Lucchese, S. Orlando, R. Perego, F. Silvestri, Gabriele Tolomei
Our research challenge is to provide a mechanism for splitting a long-term log of queries submitted to a Web Search Engine (WSE) into user task-based sessions. The hypothesis is that some query sessions entail the concept of a user task. We present an approach that relies on a centroid-based and a density-based clustering algorithm, which consider query inter-arrival times and use a novel distance function that accounts for the lexical content of queries and exploits the collaborative knowledge collected by Wiktionary and Wikipedia.
{"title":"Detecting Task-Based Query Sessions Using Collaborative Knowledge","authors":"C. Lucchese, S. Orlando, R. Perego, F. Silvestri, Gabriele Tolomei","doi":"10.1109/WI-IAT.2010.281","DOIUrl":"https://doi.org/10.1109/WI-IAT.2010.281","url":null,"abstract":"Our research challenge is to provide a mechanism for splitting into user task-based sessions a long-term log of queries submitted to a Web Search Engine (WSE). The hypothesis is that some query sessions entail the concept of user task. We present an approach that relies on a centroid-based and a density-based clustering algorithm, which consider queries inter-arrival times and use a novel distance function that takes care of query lexical content and exploits the collaborative knowledge collected by Wiktionary and Wikipedia.","PeriodicalId":197966,"journal":{"name":"Web Intelligence/IAT Workshops","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128358424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
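To make the two signals in the abstract above concrete (inter-arrival time and lexical content), here is a much simpler sequential baseline, not the clustering approach of the paper: a query joins the current session only if it arrives soon after the previous query and is lexically close to it. The character-trigram Jaccard distance, both thresholds, and the sample log are assumptions chosen for illustration; no Wiktionary or Wikipedia knowledge is used.

```python
def trigrams(text):
    """Set of character trigrams of a lower-cased, space-padded query."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def lexical_distance(q1, q2):
    """Jaccard distance between the trigram sets of two queries (0 = identical sets)."""
    a, b = trigrams(q1), trigrams(q2)
    return 1.0 - len(a & b) / len(a | b)


def split_sessions(log, max_gap, max_distance):
    """Split a time-sorted list of (timestamp_seconds, query) pairs into sessions."""
    sessions = []
    for ts, query in log:
        if sessions:
            last_ts, last_query = sessions[-1][-1]
            close_in_time = ts - last_ts <= max_gap
            close_in_text = lexical_distance(query, last_query) <= max_distance
            if close_in_time and close_in_text:
                sessions[-1].append((ts, query))
                continue
        sessions.append([(ts, query)])
    return sessions


# Hypothetical query log for one user.
log = [
    (0, "cheap flights rome"),
    (30, "cheap flights to rome"),
    (60, "python tutorial"),
    (5000, "python tutorials"),
]
sessions = split_sessions(log, max_gap=300, max_distance=0.7)
print([len(s) for s in sessions])
```

A purely lexical distance cannot relate queries such as "rome hotels" and "colosseum tickets", which is the kind of gap that external collaborative knowledge is meant to close.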
A metasearch engine queries search engines and collates information returned by them in one result set for the user. Metasearch can be external or internal. In external metasearch, result lists from external, independent search engines are merged. On the other hand, in an internal metasearch, result lists from using different search algorithms on the same corpus are aggregated. Thus result merging is a key function of metasearch. In this work, we propose a model for result merging that is based on the Analytic Hierarchy Process and compares documents and search engines in pair-wise comparison before merging. Our model has the capability to merge result lists based on ranks as well as scores, as returned by search engines. We use the LETOR 2 (LEarning TO Rank) dataset from Microsoft Research Asia for our experiments. When using document ranks, our model improves by 31.60% and 8.58% over the Borda-Fuse and Weighted Borda-Fuse models respectively. When using document scores the improvements are 42.92% and 18.03% respectively.
{"title":"Search Engine Result Aggregation Using Analytical Hierarchy Process","authors":"A. De, Elizabeth D. Diaz, Vijay V. Raghavan","doi":"10.1109/WI-IAT.2010.256","DOIUrl":"https://doi.org/10.1109/WI-IAT.2010.256","url":null,"abstract":"A metasearch engine queries search engines and collates information returned by them in one result set for the user. Metasearch can be external or internal. In external metasearch, result lists from external, independent search engines are merged. On the other hand, in an internal metasearch, result lists from using different search algorithms on the same corpus are aggregated. Thus result merging is a key function of metasearch. In this work, we propose a model for result merging that is based on the Analytic Hierarchy Process and compares documents and search engines in pair-wise comparison before merging. Our model has the capability to merge result lists based on ranks as well as scores, as returned by search engines. We use the LETOR 2 (LEarning TO Rank) dataset from Microsoft Research Asia for our experiments. When using document ranks, our model improves by 31.60% and 8.58% over the Borda-Fuse and Weighted Borda-Fuse models respectively. When using document scores the improvements are 42.92% and 18.03% respectively.","PeriodicalId":197966,"journal":{"name":"Web Intelligence/IAT Workshops","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128430181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
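For context on the baselines named in the abstract above, the following sketch shows rank-based fusion in the style of Borda-Fuse and Weighted Borda-Fuse as commonly described: with c candidate documents, an engine gives c points to its top result, c - 1 to the next, and so on, and documents it did not return share its leftover points equally. This is a sketch of the baseline only, not of the proposed AHP model; the result lists, the tie-breaking by document id, and the function name are assumptions.

```python
def borda_fuse(rankings, weights=None):
    """Fuse ranked lists by (optionally weighted) Borda points; returns order and scores."""
    candidates = sorted({doc for ranking in rankings for doc in ranking})
    c = len(candidates)
    weights = weights or [1.0] * len(rankings)
    scores = dict.fromkeys(candidates, 0.0)
    for ranking, w in zip(rankings, weights):
        for pos, doc in enumerate(ranking):
            scores[doc] += w * (c - pos)
        unranked = [doc for doc in candidates if doc not in ranking]
        if unranked:
            # Points c - len(ranking), ..., 1 were not handed out; split them evenly.
            leftover = sum(range(1, c - len(ranking) + 1))
            for doc in unranked:
                scores[doc] += w * leftover / len(unranked)
    fused = sorted(candidates, key=lambda doc: (-scores[doc], doc))
    return fused, scores


# Hypothetical result lists from three search engines.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d4"],
    ["d3", "d2", "d1"],
]
fused, scores = borda_fuse(rankings)
print(fused, scores)
```

Passing `weights=[w1, w2, w3]` gives the weighted variant, where each engine's points are scaled by a weight reflecting how much that engine is trusted.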
Based on the macro MART model of an organization's informatization development, and with further application of system dynamics principles, a system dynamics description of the MART model's behavior is offered, together with the corresponding method for calculating the Jacobian matrix and its characteristic values (eigenvalues). The related balance points or singular points, along with chaotic features such as attractors, saddle points, repulsion points, and dissipation, are also analyzed. It is further shown that e-government system development is highly complicated and unstable, that the development of information technology does not depend on e-government system development, and that the functional level of e-government management is the most critical element.
{"title":"Chaotic Analysis on E-government System Development","authors":"Yanzhang Wang, Guirong Xiao, Shengju Han","doi":"10.1109/WI-IAT.2010.107","DOIUrl":"https://doi.org/10.1109/WI-IAT.2010.107","url":null,"abstract":"Based on the macro MART model of an organizaion’s informationalization development, and with further application of system dynamics principles, the system dynamics behavior description of MART model and the corresponding calculation method of Jacobian matrix and the characteristic value are offerred. The related balance points or singular points, along with such chaotic features as attractor, saddle point, repulsion point and dissipation, are also analyzed. It is further proved that e-government system development is highly complicated and unstable, the development of information technology does not depend on the e-government system development and the functional level of e-government’s management is the most critical element.","PeriodicalId":197966,"journal":{"name":"Web Intelligence/IAT Workshops","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127915947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
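The analysis described in the abstract above, computing the Jacobian at a balance point and reading the point's type from the eigenvalues, can be sketched for a generic two-variable system. The system below is a hypothetical toy model chosen only because its balance points are easy to verify; it is not the MART model, whose equations are not given here. The Jacobian is approximated by central differences and the eigenvalues come from the 2 x 2 characteristic polynomial.

```python
import cmath


def jacobian(f, point, h=1e-6):
    """Numerical Jacobian of f at point using central differences."""
    n = len(point)
    columns = []
    for j in range(n):
        up, down = list(point), list(point)
        up[j] += h
        down[j] -= h
        f_up, f_down = f(up), f(down)
        columns.append([(a - b) / (2 * h) for a, b in zip(f_up, f_down)])
    # columns[j][i] holds d f_i / d x_j, so transpose into row-major form.
    return [[columns[j][i] for j in range(n)] for i in range(n)]


def eigenvalues_2x2(m):
    """Roots of the characteristic polynomial lambda^2 - trace * lambda + det = 0."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


def classify(eigs, tol=1e-6):
    """Label a balance point by the signs of the real parts of its eigenvalues."""
    reals = [e.real for e in eigs]
    if all(r < -tol for r in reals):
        return "attractor"
    if all(r > tol for r in reals):
        return "repulsion point"
    if any(r > tol for r in reals) and any(r < -tol for r in reals):
        return "saddle point"
    return "undetermined"


def toy_system(state):
    """Hypothetical dynamics: dx/dt = x(1 - x - y), dy/dt = y(x - 0.5)."""
    x, y = state
    return [x * (1 - x - y), y * (x - 0.5)]


kind_origin = classify(eigenvalues_2x2(jacobian(toy_system, [0.0, 0.0])))
kind_interior = classify(eigenvalues_2x2(jacobian(toy_system, [0.5, 0.5])))
print(kind_origin, kind_interior)
```

For a system with more than two state variables the same classification applies, but the eigenvalues would be computed with a general numerical routine rather than the closed-form quadratic used here.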