How Contents Influence Clustering Features in the Web
Christopher Thomas, A. Sheth
WI'07. DOI: 10.1109/WI.2007.93

On the World Wide Web, the contents of documents play an important role in the evolution process because of their effect on linking preference. A majority of topological properties are content-related, and among them the clustering features are particularly sensitive to the contents of Web documents. In this paper, we first observe the impact of content similarity on web links by introducing a metric called Linkage Probability. We then investigate how contents influence the formation of the most basic cluster, the triangle, with a metric named Triangularization Probability. Experimental results indicate that content similarity plays a positive role in the process of cluster formation on the Web. Theoretical analysis predicts the influence of contents on the clustering features of the Web very well.
Improving Performance of Web Services Query Matchmaking with Automated Knowledge Acquisition
Chaitali Gupta, Rajdeep Bhowmik, Michael R. Head, M. Govindaraju, Weiyi Meng
WI'07. DOI: 10.1109/WI.2007.66
There is a critical need for tools that abstract away the fundamental complexity of XML-based Web services specifications and toolkits, and provide an elegant, intuitive, simple, and powerful query-based invocation system to end users. Web services tools and standards have been designed to facilitate seamless integration and development for application developers; as a result, current implementations require the end user to have intimate knowledge of Web services and related toolkits, and users often must play an informed role in the overall Web services execution process. We employ a self-learning mechanism and a set of algorithms and optimizations to match user queries with corresponding operations in Web services. Our system uses Semantic Web concepts and ontologies to automate Web services matchmaking. We present a performance analysis of our system and quantify the exact gains in precision and recall due to the knowledge acquisition algorithms.
{"title":"Improving Performance of Web Services Query Matchmaking with Automated Knowledge Acquisition","authors":"Chaitali Gupta, Rajdeep Bhowmik, Michael R. Head, M. Govindaraju, Weiyi Meng","doi":"10.1109/WI.2007.66","DOIUrl":"https://doi.org/10.1109/WI.2007.66","url":null,"abstract":"There is a critical need to design and develop tools that abstract away the fundamental complexity of XML-based Web services specifications and toolkits, and provide an elegant, intuitive, simple, and powerful query-based invocation system to end users. Web services based tools and standards have been designed to facilitate seamless integration and development for application developers. As a result, current implementations require the end user to have intimate knowledge of Web services and related toolkits, and users often play an informed role in the overall Web services execution process. We employ a self-learning mechanism and a set of algorithms and optimizations to match user queries with corresponding operations in Web services. Our system uses Semantic Web concepts and Ontologies in the process of automating Web services matchmaking. We present performance analysis of our system and quantify the exact gains in precision and recall due to the knowledge acquisition algorithms.","PeriodicalId":192501,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'07)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132052444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Soundness and Completeness Proof of Agent Intention in AgentSpeak
Chuming Chen, M. Matthews
WI'07. DOI: 10.1109/WI.2007.102

Autonomy is one of the characteristics that distinguishes agent systems from other conceptualisations within computer science. To prove the validity of intention execution in AgentSpeak with respect to the agent's goal, we construct a model-theoretic semantics of AgentSpeak and an informal interpretation of agent programs. We then give an equivalence theorem for intention execution in AgentSpeak: the sequence of actions produced by an agent written in AgentSpeak is equivalent to the intention produced by the model that satisfies the agent's belief set and plan set.
Supporting Patent Mining by using Ontology-based Semantic Annotations
N. Ghoula, Khaled Khelif, R. Dieng
WI'07. DOI: 10.1109/WI.2007.98

The Semantic Web approach is promising for supporting content mining of the millions of patents accessible through the Web. In this paper, we describe our approach for generating semantic annotations on patents, relying on the structure and on a semantic representation of patent documents. We use both the structure of the patent documents and their textual contents, processed by Natural Language Processing (NLP) tools. This method, primarily aimed at helping biologists use patent information, can be generalized to all kinds of domains and structured documents.
A Unified Approach to Researcher Profiling
Limin Yao, Jie Tang, Juan-Zi Li
WI'07. DOI: 10.1109/WI.2007.14

This paper addresses the issue of researcher profiling. By researcher profiling, we mean building a semantic profile for an academic researcher by identifying and annotating information from the Web. Previously, person profile annotation was often undertaken separately, in an ad-hoc fashion. This paper first formalizes the entire problem and proposes a unified approach to the task using Conditional Random Fields (CRFs). The paper shows that, with the introduction of a set of tags, most of the annotation tasks can be performed within this approach. Experiments show that significant improvements over the separated method can be obtained, because the subtasks of annotation are interdependent and should be performed together. The method has been applied to expert finding, and experimental results show that the performance of expert finding can be significantly improved by using the profiling method.
Fact Discovery in Wikipedia
S. F. Adafre, V. Jijkoun, M. de Rijke
WI'07. DOI: 10.1109/WI.2007.57

We address the task of extracting focused salient information items, relevant and important for a given topic, from a large encyclopedic resource. Specifically, for a given topic (a Wikipedia article), we identify snippets from other Wikipedia articles that contain important information for the topic of the original article, without duplicates. We compare several methods for addressing the task and find that a mixture of content-based, link-based, and layout-based features outperforms other methods, especially in combination with so-called reference corpora that capture the key properties of entities of a common type.
Summarizing Evolving Data Streams using Dynamic Prefix Trees
Carlos Rojas, O. Nasraoui
WI'07. DOI: 10.1109/WI.2007.114

In stream data mining it is important to use the most recent data to cope with the evolving nature of the underlying patterns. Simply keeping the most recent records offers no flexibility about which data is kept and does not exploit even minimal redundancies in the data (a first step towards pattern discovery). This paper focuses on how to construct and maintain, efficiently and in one pass, a compact summary for data such as web logs and text streams. The resulting structure is a prefix tree with an ordering criterion that changes over time, such as an activity timestamp or attribute frequency. A detailed analysis of the factors that affect its performance is carried out, including empirical evaluations on the well-known 20 Newsgroups data set. Guidelines for forgetting and tree pruning are also provided. Finally, we use this data structure to discover evolving topics in the 20 Newsgroups data.
Question Answering over Implicitly Structured Web Content
Eugene Agichtein, C. Burges, Eric Brill
WI'07. DOI: 10.1109/WI.2007.88

Implicitly structured content on the Web, such as HTML tables and lists, can be extremely valuable for web search, question answering, and information retrieval, as the implicit structure in a page often reflects the underlying semantics of the data. Unfortunately, exploiting this information presents significant challenges due to the immense amount of implicitly structured content on the web, the lack of schema information, and unknown source quality. We present TQA, a web-scale system for automatic question answering that is often able to find answers to real natural language questions in the implicitly structured content on the web. Our experiments over more than 200 million structures extracted from a partial web crawl demonstrate the promise of our approach.
Taxonomy Learning Using Compound Similarity Measure
Mahmood Neshati, Ali Alijamaat, H. Abolhassani, Afshin Rahimi, Mehdi Hosseini
WI'07. DOI: 10.1109/WI.2007.99
Taxonomy learning is one of the major steps in the ontology learning process. Manual construction of taxonomies is a time-consuming and cumbersome task. Recently many researchers have focused on automatic taxonomy learning, but the quality of the generated taxonomies is still not satisfactory. In this paper we propose a new compound similarity measure, based on both knowledge-poor and knowledge-rich approaches to finding word similarity. We also use a machine learning technique (a neural network model) to combine several similarity methods. We compare our method with a simple syntactic similarity measure; our measure considerably improves the precision and recall of the automatically generated taxonomies.
{"title":"Taxonomy Learning Using Compound Similarity Measure","authors":"Mahmood Neshati, Ali Alijamaat, H. Abolhassani, Afshin Rahimi, Mehdi Hosseini","doi":"10.1109/WI.2007.99","DOIUrl":"https://doi.org/10.1109/WI.2007.99","url":null,"abstract":"Taxonomy learning is one of the major steps in ontology learning process. Manual construction of taxonomies is a time-consuming and cumbersome task. Recently many researchers have focused on automatic taxonomy learning, but still quality of generated taxonomies is not satisfactory. In this paper we have proposed a new compound similarity measure. This measure is based on both knowledge poor and knowledge rich approaches to find word similarity. We also used Machine Learning Technique (Neural Network model) for combination of several similarity methods. We have compared our method with simple syntactic similarity measure. Our measure considerably improves the precision and recall of automatic generated taxonomies.","PeriodicalId":192501,"journal":{"name":"IEEE/WIC/ACM International Conference on Web Intelligence (WI'07)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115086672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic Semantic Web Service Composition via Agent Intention Execution in AgentSpeak
Huan Li, Zheng Qin, Fan Yu, Jun Qin, Bo Yang
WI'07. DOI: 10.1109/WI.2007.25

AI planning is the mainstream method in research on automatic semantic web service composition (SWSC). However, planning-based SWSC methods can only return a service composition for a given user requirement description and lack the flexibility to deal with environmental change. Deliberative agent architectures, such as BDI agents, promise to make SWSC more intelligent. In this paper, we propose an automatic SWSC method for AgentSpeak agents. First, a conversion algorithm from OWL-S web service descriptions to an agent's plan set (OWLS2APS) is presented: the target service is converted to the agent's goal, and related services are converted into the agent's plan set. SWSC is then performed automatically through the agent's intention formation, and the agent invokes web services according to the service sequence converted back from its intention. The agent can behave rationally by falling back on rules or asking for human intervention when SWSC or service invocation is not feasible. Finally, a case study on composing an enterprise credit-rating service illustrates the method.