Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings: latest papers

Detecting experimental noises in protein-protein interactions with iterative sampling and model-based clustering
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188977
Hiroshi Mamitsuka
One of the most important issues in current molecular biology is building exact networks of protein-protein interactions. Recently developed high-throughput experimental techniques have accumulated a vast amount of protein-protein interaction data, but it is well known that the reliability of these data has not reached a satisfactory level. In this paper we attempt to computationally detect experimental errors, or noise, presumably contained in protein-protein interaction data, using an iterative sampling method that employs the learning of a stochastic model as its subroutine. The method alternates between two steps: selecting examples that can be regarded as non-noise, and training the component algorithm on the selected examples. Noise candidates are the examples with the smallest average likelihoods computed by the previously obtained stochastic models. We empirically compared the method with two other methods on both synthetic and real data sets. We examined the effect of noise and data size using medium- and large-sized synthetic data sets containing intentionally added noise. The results on the medium-sized synthetic data sets show that the statistical significance of the performance difference between the method and the two other methods becomes more pronounced at higher noise ratios. Further experiments show that this finding also holds for a large-scale data set. The performance advantage of the method was further confirmed by experiments on a real protein-protein interaction data set.
Citations: 1
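The alternating procedure described in the abstract above (train on the examples currently regarded as non-noise, then re-rank all examples by their average likelihood under the models obtained so far) can be sketched as follows. This is only a minimal illustration under stated assumptions: the smoothed joint-frequency model, the function names `train_joint_model` and `detect_noise`, and the fixed number of rounds are hypothetical stand-ins, not the stochastic model or settings used in the paper.

```python
from collections import Counter


def train_joint_model(pairs, alphabet_x, alphabet_y, alpha=1.0):
    """Fit a smoothed joint distribution P(x, y) from observed pairs.

    Assumption: this simple frequency model stands in for the paper's
    stochastic model, which the abstract does not specify.
    """
    counts = Counter(pairs)
    total = len(pairs) + alpha * len(alphabet_x) * len(alphabet_y)
    return lambda x, y: (counts[(x, y)] + alpha) / total


def detect_noise(pairs, n_noise, n_rounds=5):
    """Return the indices of the n_noise examples with the lowest average likelihood."""
    xs = sorted({x for x, _ in pairs})
    ys = sorted({y for _, y in pairs})
    likelihood_sums = [0.0] * len(pairs)
    selected = list(range(len(pairs)))  # first round: train on every example
    ranked = selected
    for round_no in range(1, n_rounds + 1):
        # Step 1: train the component model on the examples regarded as non-noise.
        model = train_joint_model([pairs[i] for i in selected], xs, ys)
        # Step 2: score every example and rank by average likelihood so far.
        for i, (x, y) in enumerate(pairs):
            likelihood_sums[i] += model(x, y)
        ranked = sorted(range(len(pairs)),
                        key=lambda i: likelihood_sums[i] / round_no)
        # The lowest-ranked examples are excluded from the next training set.
        selected = ranked[n_noise:]
    return sorted(ranked[:n_noise])
```

Calling `detect_noise(pairs, n_noise=1)` on a list of labelled interaction pairs returns the index of the single least likely pair; how many candidates to flag is left to the caller in this sketch.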
A data mining method to predict transcriptional regulatory sites based on differentially expressed genes in human genome
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188966
Hsien-Da Huang, Huei-Lin Chang, T. Tsou, Baw-Jhiune Liu, Jorng-Tzong Horng
Very large-scale gene expression resources, such as UniGene and dbEST, make it possible to find genes with significantly differential expression in specific tissues. The differentially expressed genes in a specific tissue are potentially regulated concurrently by a combination of transcription factors. This study attempts to mine putative binding sites by examining how combinations of known regulatory site homologs and over-represented repetitive elements are distributed in the promoter regions of the considered groups of differentially expressed genes. We propose a data mining approach to statistically discover significantly tissue-specific combinations of known site homologs and over-represented repetitive sequences, which are distributed in the promoter regions of differential gene groups. The mined association rules would help predict putative regulatory elements and identify genes potentially co-regulated by them.
Citations: 5
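To make the idea of mining association rules over site combinations concrete, here is a small sketch that scores rules of the form "promoter contains this combination of sites, therefore the gene belongs to the tissue-specific group". It assumes the textbook support and confidence measures; the abstract does not say which statistics the authors used, and the function name `mine_rules`, the data layout, and the thresholds are hypothetical.

```python
from itertools import combinations


def mine_rules(promoters, in_group, min_support, min_confidence, max_size=2):
    """Find site combinations that are associated with a gene group.

    promoters: dict mapping gene name -> set of site names in its promoter.
    in_group:  set of gene names in the differentially expressed group.
    Returns a list of (combination, support, confidence) tuples.
    """
    n_genes = len(promoters)
    sites = sorted(set().union(*promoters.values()))
    rules = []
    for size in range(1, max_size + 1):
        for combo in combinations(sites, size):
            # Genes whose promoter contains every site in the combination.
            carriers = {g for g, s in promoters.items() if set(combo) <= s}
            if not carriers:
                continue
            both = carriers & in_group
            support = len(both) / n_genes           # fraction of all genes
            confidence = len(both) / len(carriers)  # fraction of carriers in group
            if support >= min_support and confidence >= min_confidence:
                rules.append((combo, support, confidence))
    return rules
```

In this sketch a combination is reported only when it is both frequent enough (support) and specific enough to the group (confidence); a significance test could replace these thresholds.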
Influence of the thermal treatment applied to PAN gel on its length change and generated force
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188964
H. Tamagawa, F. Nogata, Toyotaka Watanabe, A. Abe, S. Popovic
PAN (polyacrylonitrile) gel is known for its strong matrix as well as for its fast length change when the surrounding solution is exchanged between acid and base. In addition, PAN gel is a quite soft material, like a real cell. It has therefore been regarded as one of the most promising materials for artificial muscle. Unfortunately, its matrix strength declines severely in basic solution. The matrix must be improved so that it does not weaken; otherwise the gel cannot serve as an artificial muscle for practical use. We applied a high-temperature thermal treatment and subsequent hydrolysis to PAN gel prepared by a nearly conventional processing method, and investigated the time dependence of its length change ratio and generated force during acid-base solution exchange, as well as its durability. Although its length change and force generation performance were impaired to some extent, we found an improvement in its matrix robustness and durability.
Citations: 0
Analyzing the Escherichia coli gene expression data by a multilayer adjusted tree organizing map
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188965
Ning Wei, L. Gruenwald, T. Conway
Using DNA microarray technology, biologists have thousands of array data available. Discovering the functional relations between genes and their involvement in biological processes depends on the ability to efficiently process and quantitatively analyze large amounts of array data. Clustering algorithms are among the popular tools that can help biologists achieve these goals. Although some existing research projects have applied clustering algorithms to biological data, none of them has examined Escherichia coli (E. coli) gene expression data. This paper proposes a clustering algorithm called Multilayer Adjusted Tree Organizing Map (MATOM) to analyze the E. coli gene expression data. In a semi-supervised manner, MATOM constructs a multilayer map and, at the same time, removes noise data from the previously trained maps in order to improve the training process. The paper then presents the clustering results produced by MATOM and other existing clustering algorithms on the E. coli gene expression data, together with a new evaluation method to assess them. The results show that MATOM performs best in terms of the percentage of genes that are clustered correctly.
Citations: 0
Prediction of contact maps using support vector machines
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188926
Ying Zhao, G. Karypis
Contact map prediction is of great interest for its application in fold recognition and protein 3D structure determination. In this paper we present a contact-map prediction algorithm that employs Support Vector Machines as the machine learning tool and incorporates various features such as sequence profiles and their conservation, correlated mutation analysis (CMA) based on various amino acid physicochemical properties, and secondary structure. In addition, we evaluated the effectiveness of the different features on contact map prediction for different fold classes. On average, our predictor achieved a prediction accuracy of 0.2238, an improvement by a factor of 11.7 over a random predictor, which is better than previously reported studies. Our study showed that predicted secondary structure features play an important role for proteins containing beta structures. Models based on secondary structure features and models based on CMA features produce different sets of predictions. Our study also suggests that models learned separately for different protein fold families may achieve better performance than a unified model.
Citations: 57
Evolving bubbles for prostate surface detection from TRUS images
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188936
Fan Shao, K. Ling, W. Ng
Prostate boundary detection from ultrasound images plays a key role in the diagnosis and treatment of prostate disease. Due to the poor quality of ultrasound images, however, it remains a difficult task. Currently, boundary detection is performed manually, which is arduous and heavily user dependent. This paper presents a new approach, derived from the level set method, to semi-automatically detect the prostate surface from 3D transrectal ultrasound (TRUS) images. In this method, the user simply specifies a few initial bubbles on five particular slices based on the prostate shape. As the bubbles evolve, they expand, shrink, merge, and split, and finally produce the desired prostate surface. To remedy the "boundary leaking" problem caused by gaps or weak boundaries, both region information and statistical intensity distribution are incorporated into the model. We applied the proposed method to eight 3D TRUS images, and the results show its effectiveness.
Citations: 3
Time series analysis of gene expression and location data
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188967
Chen-Hsiang Yeang, T. Jaakkola
We develop a method for integrating time series expression profiles and factor-gene binding data to quantify dynamic aspects of gene regulation. We estimate latencies for transcription activation by explaining time correlations between gene expression profiles through available factor-gene binding information. The resulting aligned expression profiles are subsequently clustered and again combined with binding information to determine groups or subgroups of co-regulated genes. The predictions derived from this approach are consistent with existing results. Our analysis also provides several hypotheses not implicated in previous studies.
Citations: 25
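One simple way to picture latency estimation between a transcription factor and a gene it binds is to shift the gene's expression profile in time and keep the shift that correlates best with the factor's profile. The sketch below does exactly that with Pearson correlation. It is a deliberately simplified stand-in, not the authors' method: the function names `pearson` and `estimate_latency`, the use of plain correlation, and the requirement of at least three overlapping time points are all assumptions.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def estimate_latency(factor_profile, gene_profile, max_lag):
    """Return the lag (in time steps) at which the gene best tracks the factor.

    Both profiles are assumed to be sampled at the same, evenly spaced time points.
    """
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = len(factor_profile) - lag
        if overlap < 3:  # too few points for a meaningful correlation
            break
        corr = pearson(factor_profile[:overlap], gene_profile[lag:lag + overlap])
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```

Profiles aligned by subtracting the estimated lag could then be clustered, which mirrors the order of steps described in the abstract.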
Determination of the minimum sample size in microarray experiments to cluster genes using k-means clustering
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188979
Fang-Xiang Wu, W. Zhang, A. Kusalik
Gene expression profiles obtained from time-series microarray experiments can reveal important information about biological processes. However, conducting such experiments is costly and time-consuming. The cost and time required are linearly proportional to sample size. Therefore, it is worthwhile to provide a way to determine the minimal number of samples or trials required in a microarray experiment. One use of microarray hybridization experiments is to group together genes with similar expression patterns using clustering techniques. In this paper, the k-means clustering technique is used. The basic idea of our approach is an incremental process in which testing, analysis, and evaluation are integrated and iterated. The process terminates when the evaluation of the results of two consecutive experiments shows that they are sufficiently close. Two measures of "closeness" are proposed, and two real microarray datasets are used to validate our approach. The results show that the sample size required to cluster genes in these two datasets can be reduced; that is, the same results can be achieved at lower cost. The approach can be used with other clustering techniques as well.
Citations: 11
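The incremental stopping rule described in the abstract above can be sketched as follows: cluster the genes using the first m samples, add one more sample, cluster again, and stop once two consecutive clusterings are sufficiently close. The abstract does not name its two closeness measures, so this sketch assumes the Rand index as one plausible choice; the deterministic farthest-first initialisation and the names `kmeans`, `rand_index`, and `stopping_sample_size` are likewise hypothetical.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=20):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(
            points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of gene pairs on which two clusterings agree (assumed closeness measure)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def stopping_sample_size(profiles, k, start=2, threshold=1.0):
    """Return the number of samples collected when consecutive clusterings agree.

    profiles: one list of expression values per gene, one value per sample.
    """
    n_samples = len(profiles[0])
    previous = kmeans([p[:start] for p in profiles], k)
    for m in range(start + 1, n_samples + 1):
        current = kmeans([p[:m] for p in profiles], k)
        if rand_index(previous, current) >= threshold:
            return m
        previous = current
    return n_samples
```

A threshold below 1.0 would let the process stop when consecutive clusterings are close rather than identical, which is presumably closer to how such a rule would be used on noisy data.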
Vessel extraction in medical images by 3D wave propagation and traceback
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188944
C. Kirbas, Francis K. H. Quek
This paper presents an approach for the extraction of vasculature from a volume of Magnetic Resonance Angiography (MRA) images using a 3D wave propagation and traceback mechanism. We discuss both the theory and the implementation of the approach. Using a dual-sigmoidal filter, we label each voxel in the MRA volume with the likelihood that it is within a vessel. Representing the reciprocal of this likelihood image as an array of refractive indices, we propagate a digital wave through the volume from the base of the vascular tree. This wave "washes" over the vasculature and extracts the vascular tree, ignoring local noise perturbations. While the approach is inherently SIMD, we present an efficient sequential algorithm for the wave propagation and discuss the traceback algorithm. We demonstrate the effectiveness of our integer image neighborhood-based algorithm and its robustness to image noise.
Citations: 22
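The propagate-then-trace-back idea can be illustrated on a small 2D grid. In the sketch below the time to enter a pixel is the reciprocal of its vessel likelihood (playing the role of the refractive index), the wavefront is advanced with a Dijkstra-style priority queue, and each pixel remembers which neighbour the wave arrived from so that a path can be traced back to the seed. The use of Dijkstra's algorithm, the 2D 4-neighbourhood, and the names `propagate_wave` and `traceback` are assumptions made for illustration; the paper describes its own integer neighbourhood-based algorithm in 3D.

```python
import heapq


def propagate_wave(likelihood, seed):
    """Advance a wavefront from seed; return arrival times and predecessors.

    likelihood: 2D list of vessel likelihoods in (0, 1]; entries <= 0 are impassable.
    """
    rows, cols = len(likelihood), len(likelihood[0])
    arrival = {seed: 0.0}
    parent = {seed: None}
    frontier = [(0.0, seed)]
    while frontier:
        t, (r, c) = heapq.heappop(frontier)
        if t > arrival[(r, c)]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and likelihood[nr][nc] > 0:
                # Low likelihood means a high refractive index, so the wave is slow.
                nt = t + 1.0 / likelihood[nr][nc]
                if nt < arrival.get((nr, nc), float("inf")):
                    arrival[(nr, nc)] = nt
                    parent[(nr, nc)] = (r, c)
                    heapq.heappush(frontier, (nt, (nr, nc)))
    return arrival, parent


def traceback(parent, end):
    """Follow predecessor links from end back to the seed; return the path seed-first."""
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]
```

Because the wave travels fastest through high-likelihood pixels, the traced path tends to follow the vessel even when a geometrically shorter route crosses background.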
An assessment of a metric space database index to support sequence homology
Pub Date : 2003-03-10 DOI: 10.1109/BIBE.2003.1188976
Rui Mao, Weijia Xu, Neha Singh, Daniel P. Miranker
Hierarchical metric-space clustering methods have been commonly used to organize proteomes into taxonomies. Consequently, it is often anticipated that hierarchical clustering can be leveraged as a basis for scalable database index structures capable of managing the hyper-exponential growth of sequence data. The M-tree is one such data structure, specialized for managing large data sets on disk. We explore the application of M-trees to the storage and retrieval of peptide sequence data. Exploiting a technique first suggested by Myers (1994), we organize the database as records of fixed-length substrings. Empirical results are promising. However, metric-space indexes are subject to "the curse of dimensionality", and the ultimate performance of an index is sensitive to the quality of its initial construction. We introduce a new hierarchical bulk-load algorithm that alternates between top-down and bottom-up clustering to initialize the index. On the yeast proteome, the bi-directional bulk load produces a more effective index than the existing M-tree initialization algorithms.
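The two ingredients mentioned in the abstract above, records of fixed-length substrings and a metric that allows pruning, can be sketched in a few lines. The example below cuts each sequence into overlapping windows, stores each window with its distance to a pivot string, and uses the triangle inequality to skip windows that cannot lie within the query radius. It is an illustration only: Hamming distance, the single pivot, the flat list in place of an M-tree, and the names `build_index` and `range_query` are all assumptions, not the structure evaluated in the paper.

```python
def windows(sequence, q):
    """All overlapping substrings of length q, with their start offsets."""
    return [(i, sequence[i:i + q]) for i in range(len(sequence) - q + 1)]


def hamming(a, b):
    """Number of mismatching positions; a metric on equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_index(sequences, q, pivot):
    """Store every window together with its precomputed distance to the pivot.

    sequences: dict mapping sequence name -> sequence string.
    pivot:     any string of length q.
    """
    records = []
    for name, seq in sequences.items():
        for offset, window in windows(seq, q):
            records.append((hamming(window, pivot), name, offset, window))
    return records


def range_query(records, query, radius, pivot):
    """Return all windows within the given radius of the query."""
    d_query_pivot = hamming(query, pivot)
    hits = []
    for d_window_pivot, name, offset, window in records:
        # Triangle inequality: |d(q, p) - d(w, p)| <= d(q, w), so a window
        # whose lower bound already exceeds the radius cannot be a hit.
        if abs(d_query_pivot - d_window_pivot) > radius:
            continue
        if hamming(query, window) <= radius:
            hits.append((name, offset, window))
    return hits
```

A tree-structured index applies the same pruning test at every internal node rather than once per record, which is where the quality of the initial clustering starts to matter.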
Citations: 30
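The abstract above describes storing peptide sequences as records of fixed-length substrings and searching them through a metric-space index. The following is a minimal sketch of that general idea, not the authors' method: the window length, the use of Hamming distance, and the single-pivot pruning scheme are all assumptions chosen for brevity, and `windows`, `hamming`, and `PivotIndex` are hypothetical names. An M-tree applies the same triangle-inequality pruning hierarchically over disk pages, which this flat example does not attempt.

```python
from typing import List, Tuple


def windows(seq: str, q: int) -> List[Tuple[int, str]]:
    """Return every overlapping length-q substring of seq with its offset."""
    return [(i, seq[i:i + q]) for i in range(len(seq) - q + 1)]


def hamming(a: str, b: str) -> int:
    """Hamming distance, a true metric on equal-length strings (assumed here)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


class PivotIndex:
    """Flat single-pivot index (hypothetical); prunes via the triangle inequality."""

    def __init__(self, records: List[Tuple[int, str]]):
        if not records:
            raise ValueError("need at least one record")
        # Assumption: the first record serves as the pivot.
        self.pivot = records[0][1]
        # Precompute each record's distance to the pivot once, at build time.
        self.entries = [(hamming(self.pivot, s), off, s) for off, s in records]

    def range_query(self, query: str, radius: int) -> List[Tuple[int, str]]:
        """Return all (offset, substring) records within radius of query."""
        d_qp = hamming(query, self.pivot)
        hits = []
        for d_sp, off, s in self.entries:
            # |d(q,p) - d(s,p)| <= d(q,s), so a gap above radius rules s out
            # without computing d(q,s).
            if abs(d_qp - d_sp) > radius:
                continue
            if hamming(query, s) <= radius:
                hits.append((off, s))
        return hits
```

A typical use would build the index with `PivotIndex(windows(sequence, q))` and then call `range_query(fragment, radius)` with a query fragment of the same length `q`. The pruning step only skips distance computations; it never changes the result set, because it relies solely on the metric properties of the distance function.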