Named entity disambiguation on an ontology enriched by Wikipedia
Pub Date: 2008-07-13 | DOI: 10.1109/RIVF.2008.4586363
Hien T. Nguyen, T. Cao
Currently, the shortage of training data is a problem for named entity disambiguation. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. The corpus is then enriched with new and informative features extracted from Wikipedia data. Moreover, rather than pursuing rule-based methods as in the literature, we employ a machine learning model to not only disambiguate but also identify named entities. In addition, our method explores in detail the use of a range of features extracted from texts, a given ontology, and Wikipedia data for disambiguation. This paper also systematically analyzes the impact of these features on disambiguation accuracy by varying their combinations for representing named entities. Empirical evaluation shows that, while the ontology provides basic features of named entities, Wikipedia is a fertile source of additional features for constructing accurate and robust named entity disambiguation systems.
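As an illustration of how such features might be combined, the sketch below builds one feature dictionary per mention from textual context, an ontology class, and Wikipedia anchor texts, and feeds it to a generic classifier. The feature names, the toy data, and the choice of logistic regression are assumptions made here for illustration, not the paper's actual model.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def mention_features(context_words, onto_class, wiki_anchor_texts):
    """Combine text, ontology, and Wikipedia evidence into one feature dict."""
    feats = {"onto_class=" + onto_class: 1.0}       # basic feature from the ontology
    for w in context_words:
        feats["ctx=" + w.lower()] = 1.0             # local textual context
    for a in wiki_anchor_texts:
        feats["wiki_anchor=" + a.lower()] = 1.0     # extra evidence mined from Wikipedia
    return feats

# toy training data: the surface form "Paris" linked to two different entities
X = [mention_features(["capital", "France"], "City", ["paris", "ville lumiere"]),
     mention_features(["actress", "hotel"], "Person", ["paris hilton"])]
y = ["Paris_(city)", "Paris_Hilton"]

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)
print(clf.predict(vec.transform([mention_features(["France"], "City", [])])))
```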
{"title":"Named entity disambiguation on an ontology enriched by Wikipedia","authors":"Hien T. Nguyen, T. Cao","doi":"10.1109/RIVF.2008.4586363","DOIUrl":"https://doi.org/10.1109/RIVF.2008.4586363","url":null,"abstract":"Currently, for named entity disambiguation, the short-age of training data is a problem. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. Then the corpus was enriched with new and informative features extracted from Wikipedia data. Moreover, rather than pursuing rule-based methods as in literature, we employ a machine learning model to not only disambiguate but also identify named entities. In addition, our method explores in details the use of a range of features extracted from texts, a given ontology, and Wikipedia data for disambiguation. This paper also systematically analyzes impacts of the features on disambiguation accuracy by varying their combinations for representing named entities. Empirical evaluation shows that, while the ontology provides basic features of named entities, Wikipedia is a fertile source for additional features to construct accurate and robust named entity disambiguation systems.","PeriodicalId":233667,"journal":{"name":"2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115145886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A dependency-based word reordering approach for Statistical Machine Translation
Pub Date: 2008-07-13 | DOI: 10.1109/RIVF.2008.4586343
Cong Duy Vu Hoang, Mai Ngo, Dinh Dien
Reordering is of crucial importance for machine translation; solving the reordering problem can lead to remarkable improvements in translation performance. In this paper, we propose a novel approach to the word reordering problem in statistical machine translation. We rely on dependency relations retrieved from a statistical parser, combined with hand-crafted linguistic rules, to create the transformations. These dependency-based transformations can handle word movement at both the phrase and the word level, which is difficult for parse-tree-based approaches. The transformations are applied as a preprocessing step to the English side, in both training and decoding, to obtain an underlying word order closer to that of Vietnamese. The hand-crafted rules are extracted from the syntactic differences in word order between English and Vietnamese. The approach is simple and easy to implement with a small rule set, and does not lead to rule explosion. We describe experiments with our model on the VCLEVC corpus [18] for English-to-Vietnamese translation, showing significant improvements of about 2-4% BLEU over the MOSES phrase-based baseline system [19].
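To make the idea of dependency-based pre-reordering concrete, here is a minimal sketch that assumes a toy dependency parse and a single illustrative rule: an adjectival modifier is moved to the position right after its noun head, mirroring Vietnamese noun-adjective order. The rule and the data format are hypothetical, not the authors' rule set.

```python
def reorder_amod(tokens, heads, rels):
    """tokens: list[str]; heads: list[int] (index of head, -1 for root);
    rels: list[str] of dependency relations. Returns a reordered token list."""
    order = list(range(len(tokens)))
    for i, rel in enumerate(rels):
        if rel == "amod" and i < heads[i]:              # adjective precedes its noun head
            order.remove(i)
            order.insert(order.index(heads[i]) + 1, i)  # place it right after the noun
    return [tokens[j] for j in order]

# "the red car" -> "the car red", mirroring Vietnamese noun-adjective order
print(reorder_amod(["the", "red", "car"], [2, 2, -1], ["det", "amod", "root"]))
```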
{"title":"A dependency-based word reordering approach for Statistical Machine Translation","authors":"Cong Duy Vu Hoang, Mai Ngo, Dinh Dien","doi":"10.1109/RIVF.2008.4586343","DOIUrl":"https://doi.org/10.1109/RIVF.2008.4586343","url":null,"abstract":"Reordering is of crucial importance for machine translation. Solving the reordering problem can lead to remarkable improvements in translation performance. In this paper, we propose a novel approach to solve the word reordering problem in statistical machine translation. We rely on the dependency relations retrieved from a statistical parser incorporating with linguistic hand-crafted rules to create the transformations. These dependency-based transformations can produce the problem of word movement on both phrase and word reordering which is a difficult problem on parse tree based approaches. Such transformations are then applied as a preprocessor to English language both in training and decoding process to obtain an underlying word order closer to the Vietnamese language. About the hand-crafted rules, we extract from the syntactic differences of word order between English and Vietnamese language. This approach is simple and easy to implement with a small rule set, not lead to the rule explosion. We describe the experiments using our model on VCLEVC corpus [18] and consider the translation from English to Vietnamese, showing significant improvements about 2-4% BLEU score in comparison with the MOSES phrase-based baseline system [19].","PeriodicalId":233667,"journal":{"name":"2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115689832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance improvement of speech recognition system using microphone array
Pub Date: 2008-07-13 | DOI: 10.1109/RIVF.2008.4586338
D. C. Nguyen, Guanghu Shen, Ho-Youl Jung, Hyun-Yeol Chung
In this paper, we present methods to improve the performance of a microphone array speech recognition system based on the Limabeam algorithm. To improve recognition accuracy, we propose a weighted Mahalanobis distance (WMD) that modifies the traditional distance measure in a Gaussian classifier by weighting each feature according to its distance after variance normalization. Experimental results show that Limabeam with the weighted Mahalanobis distance measure (WMD-Limabeam) improves recognition performance significantly over the original Limabeam. In comparative experiments with other extended versions of the Limabeam algorithm, such as subband Limabeam and an N-best parallel model for unsupervised Limabeam, WMD-Limabeam shows higher recognition accuracy. With WMD, we obtained correct word recognition rates of 89.4% for calibrated Limabeam and 84.6% for unsupervised Limabeam, which are 3.0% and 5.0% higher than the original Limabeam, respectively. This rate is also 9.0% higher than that of the delay-and-sum algorithm.
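A minimal sketch of a per-feature weighted Mahalanobis distance with diagonal covariance, applied after variance normalization, is shown below; the exact weighting scheme used in the paper may differ, so the weights here are placeholders.

```python
import numpy as np

def weighted_mahalanobis(x, mean, var, weights):
    """Diagonal-covariance Mahalanobis distance with per-feature weights applied
    after variance normalization: d = sqrt(sum_i w_i * (x_i - mu_i)^2 / var_i)."""
    z = (x - mean) ** 2 / var          # variance-normalized squared deviations
    return np.sqrt(np.sum(weights * z))

x = np.array([1.0, 2.5, 0.3])
mean = np.array([0.8, 2.0, 0.5])
var = np.array([0.1, 0.4, 0.05])
w = np.array([1.0, 0.5, 2.0])          # larger weight = feature trusted more
print(weighted_mahalanobis(x, mean, var, w))
```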
{"title":"Performance improvement of speech recognition system using microphone array","authors":"D. C. Nguyen, Guanghu Shen, Ho-Youl Jung, Hyun-Yeol Chung","doi":"10.1109/RIVF.2008.4586338","DOIUrl":"https://doi.org/10.1109/RIVF.2008.4586338","url":null,"abstract":"In this paper, we present some methods to improve the performance of microphone array speech recognition system based on Limabeam algorithm. For improving recognition accuracy, we proposed weighted Mahalanobis distance (WMD) based on traditional distance measure in a Gaussian classifier and is a modified method to give weights for different features in it according to their distances after the variance normalization. Experimental results showed that Limabeam adopted weighted Mahalanobis distance measure (WMD-Limabeam) improves recognition performance significantly than those by original Limabeam. In compared experiments with some other extended versions of Limabeam algorithm such as subband Limabeam and N-best parallel model for unsupervised Limabeam, we could see that the WMD-Limabeam show higher recognition accuracy. In cases of the system that adopted WMD, we obtained correct word recognition rate of 89.4% for calibrate Limabeam and 84.6% for unsupervised Limabeam, 3.0% and 5.0% higher than original Limabeam respectively. This rate also results in 9.0% higher than delay and sum algorithm.","PeriodicalId":233667,"journal":{"name":"2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128502406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hybrid approach of AdaBoost and Artificial Neural Network for detecting human faces
Pub Date: 2008-07-13 | DOI: 10.1109/RIVF.2008.4586336
T. Le, L. Bui
Human face recognition is one of the prominent problems at present. Recognizing human faces correctly aids fields such as national defense and person verification, and one of the most vital steps in recognizing face images is detecting the human faces in them. Several approaches have been used to detect human faces, but they still have limitations. In this paper, we review some popular methods for detecting human faces, such as AdaBoost and artificial neural networks (ANN). We then propose a hybrid model combining AdaBoost and an artificial neural network to perform detection efficiently. A system built from the proposed model was evaluated on the CalTech database, with detection accuracy above 96%, showing the feasibility of the proposed model.
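The sketch below shows one plausible shape of such a hybrid detector: an AdaBoost-style classifier proposes candidate windows cheaply, and an ANN verifier filters out its false positives. `adaboost_score` and `ann_verify` are hypothetical stand-ins for trained models; only the control flow is illustrated, not the authors' system.

```python
def detect_faces(image, windows, adaboost_score, ann_verify,
                 boost_threshold=0.0, ann_threshold=0.5):
    """Return the windows accepted by both stages of the hybrid detector."""
    # stage 1: fast boosted classifier keeps promising candidate windows
    candidates = [w for w in windows if adaboost_score(image, w) > boost_threshold]
    # stage 2: slower ANN verifier removes remaining false positives
    return [w for w in candidates if ann_verify(image, w) > ann_threshold]

# usage with dummy scorers, just to show the control flow
boxes = [(0, 0, 24, 24), (10, 10, 24, 24)]
picked = detect_faces("img", boxes,
                      adaboost_score=lambda img, w: 1.0,
                      ann_verify=lambda img, w: 0.9 if w[0] == 0 else 0.1)
print(picked)   # [(0, 0, 24, 24)]
```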
{"title":"A hybrid approach of AdaBoost and Artificial Neural Network for detecting human faces","authors":"T. Le, L. Bui","doi":"10.1109/RIVF.2008.4586336","DOIUrl":"https://doi.org/10.1109/RIVF.2008.4586336","url":null,"abstract":"The human face image recognition is one of the prominent problems at present. Recognizing human faces correctly will aid some fields such as national defense and person verification. One of the most vital processing of recognizing face images is to detect human faces in the images. Some approaches have been used to detect human faces. However, they still have some limitations. In the paper, we will consider some popular methods, AdaBoost, artificial neural network (ANN) etc., for detecting human faces. Then we will propose a hybrid model of combining AdaBoost and artificial neural network to solve the process efficiently. The system which was build from the proposed model has been conducted on database CalTech. The recognition correctness is more than 96%. It shows the feasibility of the proposed model.","PeriodicalId":233667,"journal":{"name":"2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126011051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rules discovery: Transfer and generalization
Pub Date: 2008-07-13 | DOI: 10.1109/RIVF.2008.4586326
Anh Nguyen-Xuan, C. Tijus
This paper presents a study of human transfer of learning between isomorphs and of the conditions of rule discovery that could be of help for machine learning. When faced with a new problem, the human learner uses the knowledge s/he already possesses in order to mentally represent and manipulate the objects s/he has to deal with in the process of problem solving. We propose that familiar domain knowledge provides concepts that serve as useful biases for discovering general rules when solving isomorphic problems, as well as problems that entail a larger problem space. Results of two experiments using isomorphic versions of the "rule discovery" Nim game, and versions that entail a larger problem space, suggest that participants use the "even" concept from the familiar domain according to external representations that can either favor or impair learning and transfer in a foreseeable way.
{"title":"Rules discovery: Transfer and generalization","authors":"Anh Nguyen-Xuan, C. Tijus","doi":"10.1109/RIVF.2008.4586326","DOIUrl":"https://doi.org/10.1109/RIVF.2008.4586326","url":null,"abstract":"This paper presents a study about human transfer of learning between isomorphs and on the conditions of rule discovery that could be of help for machine learning. When faced with a new problem, the human learner uses the knowledge s/he already possesses in order to mentally represent and manipulate the objects s/he has to deal within the process of problem solving. We propose that familiar domain knowledge provide concepts as useful biases for discovering general rules when solving isomorphic problems as well as problems which entail a larger problem space. Results of two experiments using isomorphic versions of the ldquorule discoveryrdquo Nim game, and versions that entail a larger problem space, suggest that participants use the even concept in the familiar domain according to external representations that can either predictably favor or impair learning and transfer in a foreseeable way.","PeriodicalId":233667,"journal":{"name":"2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124428195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhance exploring temporal correlation for data collection in WSNs
Pub Date: 2008-07-13 | DOI: 10.1109/RIVF.2008.4586356
N. D. Pham, Trong Duc Le, Hyunseung Choo
Continuous data collection applications in wireless sensor networks require sensor nodes to continuously sample the surrounding physical phenomenon and then return the data to a processing center. Battery-operated sensors have to avoid heavy use of their wireless radio by compressing the sensed time series instead of transmitting it in raw form. One of the most commonly used compaction methods is piecewise linear approximation. Previously, Liu et al. proposed the greedy PLAMLiS algorithm, which approximates the time series with a number of line segments in Θ(n² log n) time; this is too expensive for processing on the sensors. We therefore propose an alternative algorithm that obtains the same result in a shorter running time. Theoretical analysis and comprehensive simulations show that the proposed algorithm has a competitive computational cost of Θ(n log n) while also reducing the number of line segments, so it can decrease the overall radio transmission load and save energy at the sensor nodes.
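For reference, a simple greedy sliding-window piecewise linear approximation looks like the sketch below; it illustrates the general PLA idea under an error bound eps, not the paper's improved segmentation algorithm.

```python
def pla_segments(series, eps):
    """Greedy sliding-window piecewise linear approximation: extend the current
    segment while every point stays within `eps` of the line joining its endpoints."""
    segments, start = [], 0
    for end in range(2, len(series) + 1):
        x0, x1 = start, end - 1
        y0, y1 = series[x0], series[x1]
        slope = (y1 - y0) / (x1 - x0)
        ok = all(abs(series[i] - (y0 + slope * (i - x0))) <= eps
                 for i in range(start, end))
        if not ok:
            segments.append((start, end - 2))   # close the last feasible segment
            start = end - 2                     # new segment starts at its endpoint
    segments.append((start, len(series) - 1))
    return segments

print(pla_segments([0.0, 1.0, 2.1, 3.0, 10.0, 11.0, 12.2], eps=0.5))
```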
{"title":"Enhance exploring temporal correlation for data collection in WSNs","authors":"N. D. Pham, Trong Duc Le, Hyunseung Choo","doi":"10.1109/RIVF.2008.4586356","DOIUrl":"https://doi.org/10.1109/RIVF.2008.4586356","url":null,"abstract":"Continuous data collection applications in wireless sensor networks require sensor nodes to continuously sample the surrounding physical phenomenon and then return the data to a processing center. Battery-operated sensors have to avoid heavy use of their wireless radio by compressing the time series sensed data instead of transmitting it in raw form. One of the most commonly used compacting methods is piecewise linear approximation. Previously, Liu et al. proposed a greedy PLAMLiS algorithm to approximate the time series into a number of line segments running in Theta(n2logn) time, however this is not appropriate for processing in the sensors. Therefore, based on our study we propose an alternative algorithm which obtains the same result but needs a shorter running time. Based on theoretical analysis and comprehensive simulations, it is shown that the new proposed algorithm has a competitive computational cost of Theta(nlogn) as well as reducing the number of line segments and so it can decrease the overall radio transmission load in order to save energy of the sensor nodes.","PeriodicalId":233667,"journal":{"name":"2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128753380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the bounded integer programming
Pub Date: 2008-07-13 | DOI: 10.1109/RIVF.2008.4586328
T. Khoát
The best upper time bound known so far for solving bounded integer programming (BIP) is poly(φ) · n^(2n+o(n)), where n and φ are the dimension and the input size of the problem, respectively. In this paper, we show that BIP is solvable in deterministic time poly(φ) · n^(n+o(n)). Moreover, we show that under some reasonable assumptions, BIP is solvable in probabilistic time 2^(O(n)).
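Restating the bounds from the abstract in standard notation (with n the dimension and φ the input size, as stated):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Previous best deterministic bound:
\[ \mathrm{poly}(\varphi)\cdot n^{2n+o(n)}, \]
this paper's deterministic bound:
\[ \mathrm{poly}(\varphi)\cdot n^{n+o(n)}, \]
and, under additional assumptions, a probabilistic bound of
\[ 2^{O(n)}. \]
\end{document}
```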
{"title":"On the bounded integer programming","authors":"T. Khoát","doi":"10.1109/RIVF.2008.4586328","DOIUrl":"https://doi.org/10.1109/RIVF.2008.4586328","url":null,"abstract":"The best upper time bound for solving the bounded integer programming (BIP) up to now is poly(phi) ldr n2n+o(n), where n and phi are the dimension and the input size of the problem respectively. In this paper, we show that BIP is solvable in deterministic time poly(phi) ldr nn+o(n). Moreover we also show that under some reasonable assumptions, BIP is solvable in probabilistic time 2O(n).","PeriodicalId":233667,"journal":{"name":"2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129564413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust overlay network with Self-Adaptive topology: Protocol description
Pub Date: 2008-07-13 | DOI: 10.1109/RIVF.2008.4586348
L. Baud, N. Pham, P. Bellot
We introduce a new overlay network named ROSA. Overlay networks offer a way to bypass the routing constraints of the underlying network, and ROSA uses this property to offer resilient routing to critical applications. Unlike other overlay networks dealing with routing resilience, our research is oriented towards building a robust overlay network topology rather than a robust routing function: we try to maintain a path between any pair of nodes in the network. Routing resilience is obtained by having nodes choose and dynamically modify their neighbor sets according to the ROSA protocol. Moreover, ROSA is highly scalable.
{"title":"Robust overlay network with Self-Adaptive topology: Protocol description","authors":"L. Baud, N. Pham, P. Bellot","doi":"10.1109/RIVF.2008.4586348","DOIUrl":"https://doi.org/10.1109/RIVF.2008.4586348","url":null,"abstract":"We introduce a new overlay network named ROSA1. Overlay networks offer a way to bypass the routing constraints of the underlying network. ROSA used this overlay network property to offer a resilient routing to critical applications. Unlike other overlay networks dealing with the routing resilience issue, we oriented our research towards building a robust overlay network topology instead of a robust routing function. We tried to maintain a path between any pairs of nodes of the network. The routing resilience is obtained by forcing nodes to choose and modify dynamically their neighbors set according to the ROSA protocol. Moreover, ROSA is highly scalable.","PeriodicalId":233667,"journal":{"name":"2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128932809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Factorial Correspondence Analysis for image retrieval
Pub Date: 2008-07-13 | DOI: 10.1109/RIVF.2008.4586366
Nguyen-Khang Pham, A. Morin, P. Gros, Quyet-Thang Le
We are concerned with the use of factorial correspondence analysis (FCA) for image retrieval. FCA is designed for analyzing contingency tables; in textual data analysis (TDA), it analyzes a contingency table crossing terms/words and documents. To adapt FCA to images, we first define "visual words", computed from scale-invariant feature transform (SIFT) descriptors, and use them for image quantization. At this step, we can build a contingency table crossing "visual words", as terms/words, and images, as documents. The method was tested on the Caltech4 and Stewenius-Nister datasets, on which it provides better results (in both quality and execution time) than classical methods such as tf*idf and probabilistic latent semantic analysis (PLSA). To scale up and improve retrieval quality, we propose a new retrieval scheme using inverted files based on the relevance indicators of correspondence analysis (the representation quality of images on the axes and the contribution of images to the inertia of the axes). Numerical experiments show that our algorithm runs faster than the exhaustive method without losing precision.
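A generic bag-of-visual-words construction along these lines might look like the following sketch: local descriptors are quantized with k-means into visual words and counted per image to form the contingency table. The parameters and the use of scikit-learn's KMeans are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def visual_word_table(descriptors_per_image, n_words=200, seed=0):
    """Quantize local descriptors (e.g., SIFT) into `n_words` visual words with
    k-means, then build an images x visual-words contingency table of counts."""
    all_desc = np.vstack(descriptors_per_image)
    km = KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(all_desc)
    table = np.zeros((len(descriptors_per_image), n_words), dtype=int)
    for i, desc in enumerate(descriptors_per_image):
        for w in km.predict(desc):
            table[i, w] += 1
    return table

# toy example: 3 "images", each with random 128-d descriptors standing in for SIFT
rng = np.random.default_rng(0)
imgs = [rng.normal(size=(50, 128)) for _ in range(3)]
print(visual_word_table(imgs, n_words=10).shape)   # (3, 10)
```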
{"title":"Factorial Correspondence Analysis for image retrieval","authors":"Nguyen-Khang Pham, A. Morin, P. Gros, Quyet-Thang Le","doi":"10.1109/RIVF.2008.4586366","DOIUrl":"https://doi.org/10.1109/RIVF.2008.4586366","url":null,"abstract":"We are concerned by the use of factorial correspondence analysis (FCA) for image retrieval. FCA is designed for analyzing contingency tables. In textual data analysis (TDA), FCA analyzes a contingency table crossing terms/words and documents. To adapt FCA on images, we first define \"visual words\" computed from scalable invariant feature transform (SIFT) descriptors in images and use them for image quantization. At this step, we can build a contingency table crossing \"visual words\" as terms/words and images as documents. The method was tested on the Caltech4 and Stewenius and Nister datasets on which it provides better results (quality of results and execution time) than classical methods as tf * idf and probabilistic latent semantic analysis (PLSA). To scale up and improve the retrieval quality, we propose a new retrieval schema using inverted files based on the relevant indicators of correspondence analysis (representation quality of images on axes and contribution of images to the inertia of the axes). The numerical experiments show that our algorithm performs faster than the exhaustive method without losing precision.","PeriodicalId":233667,"journal":{"name":"2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121591377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speeding up subset seed algorithm for intensive protein sequence comparison
Pub Date: 2008-07-13 | DOI: 10.1109/RIVF.2008.4586333
Van Hoa Nguyen, D. Lavenier
Sequence similarity search is a common and repeated task in molecular biology, and the rapid growth of genomic databases makes it necessary to speed this task up. In this paper, we present a subset seed algorithm for intensive protein sequence comparison. We accelerate this algorithm with an indexing technique and the fine-grained parallelism of GPU and SIMD instructions. We have implemented two programs, iBLASTP and iTBLASTN. The GPU (SIMD) implementations achieve speed-ups ranging from 5.5 to 10 (4 to 5.6) compared with the BLASTP and TBLASTN programs of the BLAST family, with comparable sensitivity.
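As a rough illustration of seed-based indexing, the sketch below indexes a sequence by spaced seeds in which '#' positions must match exactly and '-' positions are ignored. Real subset seeds additionally allow groups of amino acids at the relaxed positions; this toy is not the authors' implementation, and the pattern used is arbitrary.

```python
from collections import defaultdict

def build_seed_index(sequence, pattern="##-#"):
    """Map each spaced-seed key to the list of positions where it occurs."""
    index = defaultdict(list)
    span = len(pattern)
    for i in range(len(sequence) - span + 1):
        # keep only the '#' positions of the window as the lookup key
        key = "".join(sequence[i + j] for j, c in enumerate(pattern) if c == "#")
        index[key].append(i)
    return index

idx = build_seed_index("MKVLAAGIVKVLA")
print(idx["MKL"])   # positions whose window reads M, K, (any), L
```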
{"title":"Speeding up subset seed algorithm for intensive protein sequence comparison","authors":"Van Hoa Nguyen, D. Lavenier","doi":"10.1109/RIVF.2008.4586333","DOIUrl":"https://doi.org/10.1109/RIVF.2008.4586333","url":null,"abstract":"Sequence similarity search is a common and repeated task in molecular biology. The rapid growth of genomic databases leads to the need of speeding up the treatment of this task. In this paper, we present a subset seed algorithm for intensive protein sequence comparison. We have accelerated this algorithm by using indexing technique and fine grained parallelism of GPU and SIMD instructions. We have implemented two programs: iBLASTP, iTBLASTN. The GPU (SIMD) implementation of the two programs achieves a speed up ranging from 5.5 to 10 (4 to 5.6) compared to the BLASTP and TBLASTN of the BLAST program family, with comparable sensitivity.","PeriodicalId":233667,"journal":{"name":"2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125693645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}