Pub Date: 2012-11-01 · DOI: 10.1504/IJKWI.2012.050283
T. Ichimura, Shin Kamada, Kosuke Kato
A self-organising map (SOM) is trained by unsupervised learning to produce a two-dimensional, discretised representation of the input space of the training cases. The growing hierarchical SOM (GHSOM) is an architecture that grows in two ways: hierarchically, reflecting the structure of the data distribution, and horizontally, reflecting the size of each individual map. A method that controls the degree of growth by pruning redundant branches of the SOM hierarchy has been proposed, and its criteria are set by adjusting parameters according to the quantisation error and the map size. An interface tool for this method, called the interactive GHSOM, has also been developed. The interactive GHSOM can derive classification knowledge from the hierarchical structure. A smartphone-based participatory sensing system for tourists has been developed for Android smartphones. At sightseeing spots, the system collects tourists' subjective data, including JPEG files with GPS information, geographic location names, evaluations, and comments written in natural language. In this paper, we classified the subjective data with the interactive GHSOM and extracted rules with C4.5. After the interactive GHSOM was applied, the extracted rules had a clearer, more readable structure.
Title: Knowledge discovery of tourist subjective data in smartphone-based participatory sensing system by interactive growing hierarchical SOM and C4.5 · Journal: Int. J. Knowl. Web Intell.
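The abstract mentions growth criteria based on the quantisation error but does not state them. The sketch below assumes the commonly cited GHSOM rules; it is not the authors' pruning method:

- A map keeps growing horizontally while its mean quantisation error (MQE) is at least `tau1` times the error of its parent unit.
- A unit is expanded into a child map when its own error exceeds `tau2` times `qe0`, the error of the whole data set.

The function names, the linear decay schedules and the parameter values are illustrative assumptions.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM with a Gaussian neighbourhood; returns {(r, c): weights}."""
    rng = random.Random(seed)
    sigma0 = sigma0 or max(rows, cols) / 2
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = t / total
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighbourhood radius
            bmu = min(weights, key=lambda u: dist(weights[u], x))
            for u, w in weights.items():
                d2 = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            t += 1
    return weights

def unit_errors(weights, data):
    """Mean quantisation error of each unit over the samples it wins (empty units omitted)."""
    hits = {u: [] for u in weights}
    for x in data:
        bmu = min(weights, key=lambda u: dist(weights[u], x))
        hits[bmu].append(dist(weights[bmu], x))
    return {u: sum(d) / len(d) for u, d in hits.items() if d}

def map_mqe(weights, data):
    """Mean quantisation error of a whole map: the average of its units' errors."""
    errs = unit_errors(weights, data)
    return sum(errs.values()) / len(errs)

def layer0_error(data):
    """qe0: mean distance of all samples from the data mean (the single layer-0 unit)."""
    mean = [sum(col) / len(data) for col in zip(*data)]
    return sum(dist(mean, x) for x in data) / len(data)

def keep_growing(weights, data, parent_error, tau1):
    """Horizontal growth: continue while MQE >= tau1 * error of the parent unit."""
    return map_mqe(weights, data) >= tau1 * parent_error

def units_to_expand(weights, data, qe0, tau2):
    """Hierarchical growth: units whose error exceeds tau2 * qe0 receive a child map."""
    return sorted(u for u, e in unit_errors(weights, data).items() if e > tau2 * qe0)
```

Smaller `tau1` values produce larger individual maps. Smaller `tau2` values produce deeper hierarchies. Controlling growth, as the abstract describes, amounts to tuning this trade-off.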
Pub Date: 2012-11-01 · DOI: 10.1504/IJKWI.2012.050282
Shinpei Yagi, Keiichi Tamura, H. Kitakami
An approximate query, that is, approximate pattern matching in sequence databases, is one of the most important techniques in many areas, such as computational biology, text mining, web intelligence and pattern recognition; it returns many similar sub-sequences. In this paper, we refer to a set of such similar sub-sequences as a mismatch cluster. To help users who execute an approximate query on a sequence database find the regularities of approximate patterns that are similar to the query pattern, we have developed a stepwise generalisation method that extracts a reduced expression, called a minimum generalised set, from a mismatch cluster. This paper proposes a novel parallelisation model with a hierarchical task pool for running the stepwise generalisation method on a multi-core PC cluster. To manage tasks efficiently on multi-core CPUs, the proposed model combines the hierarchical task pool with an efficient hierarchical dynamic load-balancing technique. We evaluate the proposed method using real protein sequences on an actual multi-core PC cluster. Experimental results confirm that the proposed method performs well both on multi-core CPUs and on a multi-core PC cluster.
Title: Parallel processing for stepwise generalisation method on multi-core PC cluster · Journal: Int. J. Knowl. Web Intell.
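To make the terms "mismatch cluster" and "task pool" concrete, here is a minimal sketch. It assumes that similarity means at most k mismatches (Hamming distance) against the query pattern. Window start positions are split into tasks and run from a flat thread pool; this is not the paper's hierarchical pool or its load balancing.

The `generalise` function is a deliberately naive stand-in for the authors' minimum generalised set. It merges the whole cluster into one pattern with character classes, so it can over-generalise. For example, `[AT]CG[AT]` also matches `TCGA`.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_chunk(sequence, pattern, max_mismatches, start, end):
    """Collect windows starting in [start, end) that are within the mismatch bound."""
    m = len(pattern)
    found = set()
    for i in range(start, min(end, len(sequence) - m + 1)):
        window = sequence[i:i + m]
        if sum(a != b for a, b in zip(window, pattern)) <= max_mismatches:
            found.add(window)
    return found

def mismatch_cluster(sequence, pattern, max_mismatches, chunk_size=1000, workers=4):
    """Split window start positions into tasks, run them from a pool, merge the results."""
    n_windows = max(len(sequence) - len(pattern) + 1, 0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tasks = [pool.submit(scan_chunk, sequence, pattern, max_mismatches, s, s + chunk_size)
                 for s in range(0, n_windows, chunk_size)]
        cluster = set()
        for task in tasks:
            cluster |= task.result()
    return sorted(cluster)

def generalise(cluster):
    """Naively collapse equal-length sub-sequences into one pattern with character classes."""
    parts = []
    for column in zip(*cluster):
        symbols = sorted(set(column))
        parts.append(symbols[0] if len(symbols) == 1 else "[" + "".join(symbols) + "]")
    return "".join(parts)
```

Tasks are defined by window start positions, so no window is split across chunks and no overlap handling is needed. In CPython, threads mainly illustrate the decomposition; real speed-up on a multi-core cluster would need processes or separate nodes.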
Pub Date: 2012-11-01 · DOI: 10.1504/IJKWI.2012.050286
Akira Hara, Haruko Tanaka, T. Ichimura, T. Takahama
Rule extraction from databases by soft computing methods is important for knowledge acquisition. For example, knowledge extracted from web pages can be useful for information retrieval. When genetic programming (GP) is applied to rule extraction from a database, the data attributes are often used as terminal symbols. However, real databases have a large number of attributes, so the terminal set becomes large and the search space becomes vast. To improve search performance, we propose new methods for handling a large-scale terminal set. In these methods, the terminal symbols are clustered based on the similarity of the attributes. In the early stage of the search, clusters are used as terminals instead of the original attributes, which reduces the number of terminal symbols and therefore the search space. In the later stage of the search, the original attributes are used as terminal symbols to perform a local search. We applied the proposed methods to two many-attribute datasets: the classification of molecules, as a benchmark problem, and page rank learning for information retrieval. Compared with conventional GP, the proposed methods evolved faster and extracted more accurate rules.
Title: Knowledge acquisition from many-attribute data by genetic programming with clustered terminal symbols · Journal: Int. J. Knowl. Web Intell.
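The two-stage terminal set described above can be sketched as follows. The sketch makes four assumptions that the abstract does not confirm:

- Attribute similarity is Pearson correlation.
- Clustering is a greedy single pass with a threshold.
- A cluster terminal evaluates to the mean of its member attributes.
- The switch between stages is a simple `"early"`/`"late"` flag.

Names such as `C0` are hypothetical terminal labels.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def cluster_attributes(columns, threshold=0.9):
    """Greedy clustering: an attribute joins the first cluster whose first member it
    correlates with at r >= threshold; otherwise it starts a new cluster."""
    clusters = []
    for name, values in columns.items():
        for cluster in clusters:
            if pearson(columns[cluster[0]], values) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def terminal_set(clusters, stage):
    """Early search: one terminal per cluster; later search: the original attributes."""
    if stage == "early":
        return ["C%d" % i for i in range(len(clusters))]
    return [name for cluster in clusters for name in cluster]

def cluster_value(columns, cluster, row):
    """Value of a cluster terminal on one record: the mean of its member attributes."""
    return sum(columns[a][row] for a in cluster) / len(cluster)
```

Only positively correlated attributes are grouped here, so averaging the members of a cluster does not cancel their signals. With four attributes that fall into three clusters, the early-stage GP searches over three terminals instead of four.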
Pub Date: 2012-11-01 · DOI: 10.1504/IJKWI.2012.050284
S. Matsumoto, T. Kashima, T. Matsui, Kazuaki Nakamura, T. Matsutomi
The authors have participated in 'Machi-POS', a town-development research project that aims to develop a service providing comprehensive management support to small independent restaurants within a given area. As part of the Machi-POS effort, this study first develops an inventory management system that cooperates with a POS system and a web-based automatic menu ordering feature. The system is expected to help reduce opportunity loss and to make it possible to improve service quality and sales at lower labour cost. It also provides POS services cheaply and efficiently by using common devices. This study makes the system available to two or more restaurants in a restaurant district and adds functions that support food safety and security. The multi-user capability enables restaurants to share inventory information with each other. Disposal loss can therefore be reduced further, because overstock and shortage can be balanced between restaurants that use Machi-POS.
Title: Machi-POS - point of sales system for restaurant district · Journal: Int. J. Knowl. Web Intell.
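The abstract says that overstock and shortage can be adjusted between restaurants once inventory is shared. A minimal sketch of one way to plan such adjustments for a single item, assuming a greedy rule that moves the largest surplus to the largest shortage first, is shown below. The function name and the stock/need dictionaries are hypothetical; the paper does not describe its allocation rule.

```python
def plan_transfers(stock, need):
    """Greedy transfer plan for one item.

    stock: {restaurant: units on hand}; need: {restaurant: units expected to be used}.
    Returns a list of (from_restaurant, to_restaurant, units).
    """
    surplus = {r: stock[r] - need.get(r, 0) for r in stock}
    donors = sorted((r for r in surplus if surplus[r] > 0), key=lambda r: -surplus[r])
    receivers = sorted((r for r in surplus if surplus[r] < 0), key=lambda r: surplus[r])
    transfers = []
    i = j = 0
    while i < len(donors) and j < len(receivers):
        d, r = donors[i], receivers[j]
        units = min(surplus[d], -surplus[r])
        transfers.append((d, r, units))
        surplus[d] -= units
        surplus[r] += units
        if surplus[d] == 0:
            i += 1  # donor has nothing left to give
        if surplus[r] == 0:
            j += 1  # receiver's shortage is covered
    return transfers
```

Restaurants that appear only in `need` are ignored in this sketch. A real system would also weigh distance, perishability and food-safety constraints before moving stock.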
Pub Date: 2012-11-01 · DOI: 10.1504/IJKWI.2012.050285
Taichi Uraji, Kenichi Takahashi
This paper discusses group control of elevators in a web monitoring system, with the aims of improving efficiency and saving energy, and proposes an efficient control method for multi-car elevators using reinforcement learning. In this method, the control agent selects the best of three strategies, namely distance-strategy, passenger-strategy, and zone-strategy, according to the traffic flow. The control agent takes into account the total number of passengers and the distance from the departure floor to the destination floor of a call. Experiments evaluate the proposed method by comparing its average service time with the average service times obtained when car assignment is made by each of the three strategies alone.
Title: Assignment strategy selection for multi-car elevator group control using reinforcement learning · Journal: Int. J. Knowl. Web Intell.
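The strategy-selection idea can be sketched in a simplified one-step form of reinforcement learning (a contextual bandit). The state is built from the two quantities the abstract names: the total number of passengers and the trip distance. The action is one of the three strategies, and the reward is the negative service time. The thresholds in `traffic_state` and the service times in `SIMULATED_SERVICE_TIME` are hypothetical placeholders, not measurements from the paper.

```python
import random

STRATEGIES = ("distance", "passenger", "zone")

def traffic_state(total_passengers, mean_trip_floors):
    """Discretise passenger count and trip distance into a small state (hypothetical thresholds)."""
    load = "high" if total_passengers >= 20 else "low"
    trip = "long" if mean_trip_floors >= 5 else "short"
    return (load, trip)

class StrategySelector:
    """Epsilon-greedy selection with sample-average value estimates per (state, strategy)."""

    def __init__(self, epsilon=0.1, seed=0):
        self.q, self.n = {}, {}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(STRATEGIES)  # explore
        # Untried strategies default to 0.0, above any observed (negative) reward,
        # so greedy steps try each strategy at least once.
        return max(STRATEGIES, key=lambda s: self.q.get((state, s), 0.0))

    def update(self, state, strategy, service_time):
        key = (state, strategy)
        self.n[key] = self.n.get(key, 0) + 1
        old = self.q.get(key, 0.0)
        self.q[key] = old + (-service_time - old) / self.n[key]  # reward = -service time

    def best(self, state):
        return max(STRATEGIES, key=lambda s: self.q.get((state, s), float("-inf")))

# Hypothetical average service times (seconds); a real controller would observe these.
SIMULATED_SERVICE_TIME = {
    ("high", "long"): {"distance": 41.0, "passenger": 33.0, "zone": 37.0},
    ("high", "short"): {"distance": 30.0, "passenger": 34.0, "zone": 26.0},
    ("low", "long"): {"distance": 22.0, "passenger": 25.0, "zone": 28.0},
    ("low", "short"): {"distance": 15.0, "passenger": 18.0, "zone": 16.0},
}

def train(selector, episodes_per_state=50):
    for state, times in SIMULATED_SERVICE_TIME.items():
        for _ in range(episodes_per_state):
            strategy = selector.choose(state)
            selector.update(state, strategy, times[strategy])
```

After `train(StrategySelector())`, calling `best(state)` returns the strategy with the lowest simulated service time for that traffic state. A full multi-step Q-learning formulation would also account for how today's assignment changes future traffic.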
Pub Date: 2012-07-01 · DOI: 10.1504/IJKWI.2012.048164
Soichi Murai, Taketoshi Ushiama
The electronic book (e-book) market has recently been growing rapidly, and people can now choose the e-books they would like to read from a very large collection. Techniques are therefore needed for efficiently finding books worth reading among vast numbers of e-books. Many techniques for searching and recommending books have been proposed to help users select books. However, the user still has to decide whether each candidate book is worth reading. In this paper, we introduce a method that helps a user decide effectively whether or not to read an e-book novel. Our method recommends to the user sentences in an e-book novel that are likely to attract or interest him or her. First, the attractiveness of each term in a novel is calculated from reviews of the novel on the web. Then, the attractiveness of each sentence in the novel is calculated from the attractiveness of its terms. This paper presents experimental results for our method and discusses its effectiveness.
Title: Review-based recommendation of attractive sentences in a novel for effective browsing · Journal: Int. J. Knowl. Web Intell.
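The two-step scoring described above (terms from reviews, then sentences from terms) can be sketched as follows. The scoring formulas are assumptions, not the paper's:

- A term's attractiveness is the share of reviews that mention it.
- A sentence's score is the mean attractiveness of its non-stopword terms.

The tokeniser, the sentence splitter and the tiny stopword list are simplifications for English text.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "was", "in", "it", "he", "she", "i", "this"}

def tokens(text):
    """Lower-case word tokens with a small stopword list removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def term_attractiveness(reviews):
    """Share of reviews that mention each term (document frequency over reviews)."""
    counts = Counter(t for review in reviews for t in set(tokens(review)))
    return {t: c / len(reviews) for t, c in counts.items()}

def rank_sentences(novel_text, reviews, top_k=3):
    """Return the top_k sentences by mean attractiveness of their terms."""
    attract = term_attractiveness(reviews)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", novel_text) if s.strip()]

    def score(sentence):
        ts = tokens(sentence)
        return sum(attract.get(t, 0.0) for t in ts) / len(ts) if ts else 0.0

    return sorted(sentences, key=score, reverse=True)[:top_k]
```

For example, if two of three reviews mention "detective", then a sentence containing "detective" and other reviewed terms will rank above sentences whose words never appear in the reviews. Averaging, rather than summing, keeps long sentences from winning on length alone.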
Pub Date: 2012-07-01 · DOI: 10.1504/IJKWI.2012.048163
Ryo Okamoto, A. Kashihara
The purpose of a presentation rehearsal is to make presenters aware of gaps or incompleteness in their knowledge and to help them refine that knowledge. In our study, we have proposed a framework for a presentation rehearsal support system that helps peers review the presentation during the rehearsal, and we have developed a prototype system. During the rehearsal, the system accumulates many review comments from the peers and represents and organises them so that the presenter can use them to revise the presentation contents. In this paper, we propose a 'back-review' method for using the review comments to revise the presentation slides.
Title: Back-review support system for presentation rehearsal review · Journal: Int. J. Knowl. Web Intell.
Pub Date: 2012-07-01 · DOI: 10.1504/IJKWI.2012.048165
Yuki Nonaka, M. Hasegawa
In cognitive radio networks, spectrum resources are dynamically shared among users to optimise radio resource usage. However, depending on the distributed decision-making algorithms used in the spectrum sharing procedure, undesirable chaotic switching may occur in the network. Chaos control theory has been proposed for chaotic non-linear systems and applied to a wide variety of such systems to stabilise them. In this paper, the delayed feedback method of chaos control is applied to the chaotic phenomena in cognitive radio networks. We show that our method can stabilise the chaotic oscillations in user-centric cognitive radio networks. Through comparisons with conventional parameter tuning methods, we confirm that our control approach stabilises the cognitive radio systems more efficiently and more quickly.
Title: Chaotic oscillations in user-centric cognitive radio networks and their control · Journal: Int. J. Knowl. Web Intell.
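The delayed feedback method can be illustrated on a toy system rather than the paper's cognitive radio model. The example uses the chaotic logistic map x_{n+1} = r x_n (1 - x_n) with r = 3.9, plus a control term gain * (x_n - x_{n-1}). This term uses only the one-step-delayed state and vanishes on the unstable fixed point x* = 1 - 1/r. The control therefore stabilises an orbit the system already has instead of imposing a new one.

The gain of 0.7 is an assumption chosen for this map. Linearising around x* gives the characteristic equation λ² + 1.2λ + 0.7 = 0, whose roots have |λ| ≈ 0.84 < 1. Like most delayed feedback schemes, the control is local, so the sketch starts near x*.

```python
def logistic(x, r=3.9):
    """Logistic map; chaotic for r = 3.9."""
    return r * x * (1.0 - x)

def simulate(steps, gain=0.0, r=3.9, offset=1e-3):
    """Iterate x_{n+1} = f(x_n) + gain * (x_n - x_{n-1}) from a point near the fixed point."""
    x_star = 1.0 - 1.0 / r              # unstable fixed point of the uncontrolled map
    x_prev = x = x_star + offset
    trajectory = [x]
    for _ in range(steps):
        feedback = gain * (x - x_prev)  # delayed feedback: zero once x stays at x_star
        x_prev, x = x, logistic(x, r) + feedback
        trajectory.append(x)
    return x_star, trajectory
```

With `gain=0.0`, the small initial offset grows by a factor of about 1.9 per step and the trajectory wanders chaotically. With `gain=0.7`, it spirals into x* and stays there, while the feedback decays to zero.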
Pub Date: 2012-07-01 · DOI: 10.1504/IJKWI.2012.048162
Xicen Zhang, Yuki Hayashi, T. Kojiri, Toyohide Watanabe
When Chinese students study Japanese, they sometimes find it difficult to understand Japanese grammar correctly, because Chinese and Japanese use the same Chinese characters in different ways. Our research focuses on students' exercises in translating Chinese sentences into Japanese, and enables students to acquire correct knowledge from the sentences they translate. For this purpose, we divide the Chinese-Japanese translation process into three steps: translation of words, ordering of words, and addition of functional expressions to complete the sentence. The system compares the sentences translated by students with the correct answer preset in the system and identifies the step in which a student may have made a mistake. The system then evaluates the translated sentence and explains the incorrect translation. Since the usage of Japanese functional expressions is the most difficult part of translation for Chinese students, our system supports students in learning Japanese functional expressions by considering the factors behind mistaken functional expressions. Based on the mistake information, the system selects another question from the question database that has the same factor as the mistaken functional expression. Experimental results showed that our system was effective in detecting mistakes, especially mistakes in words and in Japanese functional expressions.
Title: Mistake detection method in Chinese-Japanese translation for Japanese learning support · Journal: Int. J. Knowl. Web Intell.
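The three-step check described above (words, then word order, then functional expressions) can be sketched as follows. The sketch assumes that both the student's sentence and the preset answer are already segmented and tagged as content words ("word") or functional expressions ("func"); the paper's segmentation is not shown.

Follow-up questions are keyed directly by the expected functional expression. This is a hypothetical simplification of the paper's "factor" of a mistaken expression.

```python
def detect_mistake_step(student, answer):
    """Return the first translation step where the student diverges, or None if correct.

    Both arguments are lists of (surface, kind) pairs, kind being "word" or "func".
    """
    s_words = [t for t, kind in student if kind == "word"]
    a_words = [t for t, kind in answer if kind == "word"]
    if sorted(s_words) != sorted(a_words):
        return "word translation"       # step 1: wrong or missing content words
    if s_words != a_words:
        return "word order"             # step 2: right words, wrong order
    if student != answer:
        return "functional expression"  # step 3: particles or endings differ
    return None

def mistaken_functions(student, answer):
    """(expected, written) pairs of functional expressions that differ, position by position."""
    s_funcs = [t for t, kind in student if kind == "func"]
    a_funcs = [t for t, kind in answer if kind == "func"]
    return [(a, s) for a, s in zip(a_funcs, s_funcs) if a != s]

def next_question(question_db, student, answer):
    """Pick a follow-up question for the first mistaken expression (db keyed by expression)."""
    mistakes = mistaken_functions(student, answer)
    if not mistakes:
        return None
    candidates = question_db.get(mistakes[0][0], [])
    return candidates[0] if candidates else None
```

For instance, a student who writes 私が本を読みます for the answer 私は本を読みます passes the word and order checks. The student is flagged at the functional-expression step, with the pair (は, が) available for choosing the next question.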
Pub Date: 2011-12-01 · DOI: 10.1504/IJKWI.2011.044122
Y. S. Mudhasir, J. Deepika, S. Sendhilkumar
Existing search engines suffer from redundancy in their search results. Detecting and eliminating such redundancy (near-duplicates) is an area of research pursued widely by search engine researchers. Provenance-based factors can improve web search by providing users with higher-quality content. Many factors that affect personalisation may also prove useful for determining the quality and trustworthiness of web documents. In addition, provenance information helps filter near-duplicates out of search results based on 6W factors. This paper therefore develops a web search system that uses a provenance-based technique for near-duplicate detection and elimination. The system incorporates a personalised crawler (a focused crawler) that computes author credentials, which contribute to the trustworthiness of a web document. Finally, the results of the proposed system are compared with those of existing algorithms on a test bed of web documents.
Title: An evaluation of provenance-based near-duplicates detection · Journal: Int. J. Knowl. Web Intell.
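A rough sketch of near-duplicate elimination with a trust-based tie-breaker is given below. In place of the paper's provenance-based technique, it uses a standard technique: word 3-gram shingles compared by Jaccard similarity. The `credential` field is a hypothetical stand-in for the author credentials the focused crawler would compute. The 6W provenance factors are not modelled.

```python
import re

def shingles(text, k=3):
    """Set of word k-grams (shingles) from a lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def deduplicate(docs, threshold=0.8, k=3):
    """Keep the most credible document from each group of near-duplicates.

    docs: list of dicts with "text" and a hypothetical numeric "credential" score.
    """
    kept = []
    for doc in sorted(docs, key=lambda d: d["credential"], reverse=True):
        sh = shingles(doc["text"], k)
        if all(jaccard(sh, shingles(other["text"], k)) < threshold for other in kept):
            kept.append(doc)
    return kept
```

Consider two pages that differ only by a trailing word: they share 7 of 8 shingles (Jaccard 0.875), so only the page with the higher credential survives. A production system would typically use MinHash or SimHash signatures instead of comparing every pair of full shingle sets.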