Use of E-mail Social Networks for Enterprise Benefit
M. Laclavik, Stefan Dlugolinsky, M. Kvassay, L. Hluchý
DOI: 10.1109/WI-IAT.2010.126
The article discusses potential methods and benefits of analyzing the social networks hidden in enterprise and personal email archives. A proof-of-concept prototype was developed. Social network extraction and the spreading activation algorithm are discussed and evaluated.
Identifying Cohesive Subgroups and Their Correspondences in Multiple Related Networks
Prakash Mandayam Comar, Pang-Ning Tan, Anil K. Jain
DOI: 10.1109/WI-IAT.2010.226
Identifying cohesive subgroups in networks, also known as clustering, is an active area of research in link mining with many practical applications. However, most of the early work in this area has focused on partitioning a single network or a bipartite graph into clusters/communities. This paper presents a framework that simultaneously clusters nodes from multiple related networks and learns the correspondences between subgroups in different networks. The framework also allows the incorporation of prior information about potential relationships between the subgroups. We have performed extensive experiments on both synthetic and real-life data sets to evaluate the effectiveness of our framework. Our results show superior performance of simultaneous clustering over independent clustering of individual networks.
Impact of One-Timer/N-Timer Object Classification on the Performance of Web Cache Replacement Algorithms
Saloua Messaoud Abid, H. Youssef
DOI: 10.1109/WI-IAT.2010.124
Web cache replacement algorithms proposed in the literature try to maximize the Hit Ratio (HR), the Byte Hit Ratio (BHR), and the Delay Saving Ratio (DSR). However, even with infinite Web cache storage capacity, these metrics rarely exceed 70%. This is because, given a workload, the first reference to an object is always a miss. Moreover, statistical analysis of the workload shows that as much as 76% of objects are One-Timers (OT), i.e., they are referenced only once. Caching OT objects usually degrades the performance of all Web cache replacement algorithms: it may cause the eviction of N-Timer (NT) objects and hence increase the number of misses. In this paper, we present a technique to classify whether a cached object is an OT or not. We show through simulation that such classification can significantly enhance the performance of replacement algorithms with respect to the HR, the BHR, and the DSR.
Reasoning with Imprecise Context Using Improved Dempster-Shafer Theory
C. H. Lyu, Minseuk Choi, Z. Li, H. Youn
DOI: 10.1109/WI-IAT.2010.190
In pervasive computing environments, contexts are usually imprecise and incomplete due to unreliable connectivity, user mobility, and resource constraints. In this paper we present an approach based on Dempster-Shafer Theory (DST) for reasoning with imprecise context. To address the two fundamental issues of DST, computational intensiveness and the Zadeh paradox, we filter out excrescent subsets based on their energy to reduce the number of subsets, and employ the concepts of evidence loss and approval degree of evidence in the combining process.
Making the Most of a Web Search Session
Benno Stein, Matthias Hagen
DOI: 10.1109/WI-IAT.2010.234
We tackle two problems related to Web query formulation: given the set of keywords from a search session, 1) find a maximum promising Web query, and 2) construct a family of promising Web queries covering all keywords. A query is promising if it fulfills user-defined constraints on the number of returned hits. We assume a real-world setting where the user has no direct access to a search engine's index, i.e., querying is possible only through an interface. The quantity to be optimized is the overall number of submitted Web queries. For both problems we develop search strategies based on co-occurrence probabilities. The achieved performance gain is substantial: compared to uninformed baselines without co-occurrence probabilities, the expected savings are up to 50% in the number of submitted queries, index accesses, and runtime.
Epsilon-Subjective Equivalence of Models for Interactive Dynamic Influence Diagrams
Prashant Doshi, Muthukumaran Chandrasekaran, Yi-feng Zeng
DOI: 10.1109/WI-IAT.2010.74
Interactive dynamic influence diagrams (I-DIDs) are graphical models for sequential decision making in uncertain settings shared with other agents. Algorithms for solving I-DIDs face the challenge of an exponentially growing space of candidate models ascribed to other agents over time. Pruning behaviorally equivalent models is one way to minimize the model set. We seek to further reduce the complexity by additionally pruning models that are approximately subjectively equivalent. Toward this, we define subjective equivalence in terms of the distribution over the subject agent's future action-observation paths, and introduce the notion of epsilon-subjective equivalence. We present a new approximation technique that reduces the candidate model space by removing models that are epsilon-subjectively equivalent to representative ones.
A Multi-heuristic Cooperative Ant Colony System for Optimizing Elimination Ordering of Bayesian Networks
Xuchu Dong, D. Ouyang, Yuxin Ye, Haihong Yu, Yonggang Zhang
DOI: 10.1109/WI-IAT.2010.33
To solve the problem of searching for an optimal elimination ordering of a Bayesian network, we propose a novel effective heuristic, MinSum Weight, and an ant colony system (ACS) approach incorporating a multi-heuristic mechanism. The approach, named MHC-ACS, uses a set of heuristics to direct the ants moving through the search space. The cooperation of multiple heuristics helps the ants explore more regions. Moreover, the most appropriate heuristic is identified and reinforced as the whole system evolves. Experiments demonstrate that MHC-ACS performs better than other swarm intelligence methods.
Getting What You Pay For: Is Exploration in Distributed Hill Climbing Really Worth It?
Melanie Smith, R. Mailler
DOI: 10.1109/WI-IAT.2010.31
The Distributed Stochastic Algorithm (DSA), the Distributed Breakout Algorithm (DBA), and variations such as Distributed Simulated Annealing (DSAN), MGM-1, and DisPeL are distributed hill-climbing techniques for solving large Distributed Constraint Optimization Problems (DCOPs) such as distributed scheduling, resource allocation, and distributed route planning. Like their centralized counterparts, these algorithms employ escape techniques to avoid getting trapped in local minima during the search. For example, the best-known version of DSA, DSA-B, makes hill-climbing moves and lateral escape moves (moves that do not affect solution quality) with a single probability p. DSAN uses a similar scheme, but occasionally also makes a move that worsens the solution in an effort to find a better overall solution. Although these escape moves tend to lead to better solutions in the end, the cost of employing the various strategies is often not well understood. In this work, we investigate the costs and benefits of the various escape strategies by empirically evaluating each of these protocols in distributed graph coloring and sensor tracking domains. Our tests show that reducing or eliminating escape moves dramatically decreases the cost of using these algorithms without significantly affecting solution quality.
Muzk Mesh: Interlinking Semantic Music Data
M. Singhi, Ying Ding, Yuyin Sun
DOI: 10.1109/WI-IAT.2010.162
The vision of the Semantic Web is to lift the current Web into semantic repositories where heterogeneous data can be queried and different services can be mashed up. The Web thus becomes a platform for integrating data and services. This paper discusses the MuzkMesh music portal, which mashes up existing semantic music data from the Linked Open Data (LOD) cloud and other common APIs. It aims to demonstrate the power of semantic integration and useful scenarios for music retrieval and entertainment.
Predicting Web Search Hit Counts
Tian Tian, J. Geller, Soon Ae Chun
DOI: 10.1109/WI-IAT.2010.227
Keyword-based search engines often return an unexpected number of results. Zero hits are naturally undesirable, while too many hits are likely to be overwhelming and of low precision. We present an approach for predicting the number of hits for a given set of query terms. Using word frequencies derived from a large corpus, we construct random samples of combinations of these words as search terms. We then derive a correlation function between the computed probabilities of the search terms and the observed hit counts for them. This regression function is used to predict the hit counts of a user's new searches, with the intention of avoiding information overload. We report the results of experiments with Google, Yahoo! and Bing to validate our methodology. We further investigate the monotonicity of search results for negative search terms on those three search engines.