International Journal of Data Mining and Bioinformatics: Latest Articles


Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.

IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.069710

Heba Khaled, Hossam El Deen Mostafa Faheem, Rania El Gohary

This paper presents a novel hybrid model for the multiple pair-wise sequence alignment problem that combines the Message Passing Interface (MPI) with CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphics Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster's working nodes and then aggregates the results. Each WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation of the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in running time as the number of working GPU nodes increases. The proposed model achieved a performance of about 12 Giga cell updates per second (GCUPS) when tested against the SWISS-PROT protein knowledge base running on four nodes.
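To make the row-wise idea concrete, here is a minimal CPU sketch in Python. It is not the paper's GPU kernel. It assumes a linear gap penalty and simple match/mismatch scores (the abstract does not state the scoring scheme), and the function names are illustrative. It fills the Smith-Waterman matrix one row at a time while keeping only the previous row in memory, and it shows how a cell-updates-per-second figure is usually derived.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, computed row by row.

    Assumed scoring: linear gap penalty and fixed match/mismatch scores.
    Only the previous row is kept, so memory is O(len(target)).
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second: matrix cells filled / time / 1e9."""
    return query_len * db_residues / seconds / 1e9


print(smith_waterman_score("ACGT", "ACGT"))  # 8: four matches
print(smith_waterman_score("ACGT", "ACT"))   # 5: three matches, one gap
```

Each row depends only on the row above it and on the cell to its left, which is why a row-wise layout needs so little state per alignment.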

Volume 12 (3), pp. 313-27.

Citations: 6

Two stages weighted sampling strategy for detecting the relation between gene expression and disease.

IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.069417

Chih-Chung Yang, Wen-Shin Lin, Chien-Pang Lee, Yungho Leu

Most microarray data analysis methods focus on selecting relevant genes and computing the classification accuracy obtained with the selected genes. This paper instead aims to detect the relation between gene expression levels and the classes of a cancer (or a disease) to assist researchers in initial diagnosis. The proposed method is called the Two Stages Weighted Sampling strategy (TSWS strategy). According to the results, the TSWS strategy outperforms other existing methods in terms of classification accuracy and the number of selected relevant genes. Furthermore, the TSWS strategy can also be used to understand and detect the relation between gene expression levels and the classes of a cancer (or a disease).

Citations: 2

Named entity recognition and classification in biomedical text using classifier ensemble.

IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.067954

Sriparna Saha, Asif Ekbal, Utpal Kumar Sikdar

Named Entity Recognition and Classification (NERC) is an important task in information extraction for the biomedical domain. Biomedical Named Entities include mentions of proteins, genes, DNA, RNA, etc., which in general have complex structures and are difficult to recognise. In this paper, we propose a Single Objective Optimisation based classifier ensemble technique that uses the search capability of a Genetic Algorithm (GA) for NERC in biomedical texts. Here, the GA is used to quantify the amount of voting for each class in each classifier. We use diverse classification methods such as Conditional Random Field and Support Vector Machine to build a number of models that depend on the various representations of the set of features and/or feature templates. The proposed technique is evaluated on two benchmark datasets, namely JNLPBA 2004 and GENETAG. Experiments yield overall F-measure values of 75.97% and 95.90%, respectively. Comparisons with existing systems show that our proposed system achieves state-of-the-art performance.
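The combination step of such an ensemble can be sketched as follows. This is only an illustration of per-classifier, per-class weighted voting. The weights are hand-picked placeholders standing in for values a genetic algorithm would search for, the GA itself is omitted, and `weighted_vote` is a hypothetical name rather than anything taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one token's labels from several classifiers.

    predictions[i] is the label output by classifier i.
    weights[i][label] is how much classifier i's vote for `label` counts
    (placeholder values here; in the paper's setting a GA would tune them).
    Ties are broken alphabetically so the result is deterministic.
    """
    totals = defaultdict(float)
    for i, label in enumerate(predictions):
        totals[label] += weights[i].get(label, 0.0)
    return max(sorted(totals), key=totals.get)


# Hypothetical example: three classifiers label one token.
predictions = ["B-protein", "O", "B-protein"]
weights = [
    {"B-protein": 0.2, "O": 0.9},
    {"B-protein": 0.1, "O": 0.9},
    {"B-protein": 0.3, "O": 0.5},
]
print(weighted_vote(predictions, weights))  # O (0.9 beats 0.2 + 0.3)
```

Note that the weighted result can differ from a plain majority vote: two classifiers say `B-protein`, but their votes for that class carry little weight.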

Volume 11 (4), pp. 365-91.

Citations: 10

LMDS-based approach for efficient top-k local ligand-binding site search.

IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.070066

Sungchul Kim, Lee Sael, Hwanjo Yu

In this work, we propose an LMDS-based binding-site search to improve the search speed of the Patch-Surfer method. Although Patch-Surfer is efficient at recognising protein-ligand binding partners, further speedup is necessary to support multiple-user access. This speedup is realised by exploiting Landmark Multi-Dimensional Scaling (LMDS), which computes embedding coordinates for data points based on their distances from landmark points. To select the landmark points, we adopt two approaches: random and greedy selection. Our method retrieves approximate top-k results, and the accuracy increases as more landmark points are used. Although the two landmark selection approaches show comparable results, greedy selection performs best when the number of landmark points is large. With our method, the search time is reduced by up to 99% while almost 80% of the exact top-k results are retrieved. Additionally, LMDS-based binding-site search+ improves the retrieval accuracy from 80% to 95% at the cost of lowering the speedup ratio from 99% to 90% relative to Patch-Surfer.
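The abstract does not say which greedy criterion is used to pick landmarks. A common choice for landmark MDS is MaxMin (farthest-point) selection, so the sketch below assumes that rule; it also assumes Euclidean distances, and both function names are illustrative. The second function computes the fraction of exact top-k results that an approximate search recovers, which is one plausible reading of the accuracy figure quoted above.

```python
import math


def greedy_landmarks(points, k, first=0):
    """MaxMin landmark selection (an assumed criterion, not the paper's).

    Starting from points[first], repeatedly add the point whose distance
    to its nearest already-chosen landmark is largest. Requires
    1 <= k <= len(points). Returns the chosen indices in selection order.
    """
    chosen = [first]
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def topk_overlap(exact, approx):
    """Fraction of the exact top-k items that the approximate list contains."""
    return len(set(exact) & set(approx)) / len(exact)


pts = [(0, 0), (1, 0), (10, 0), (5, 0)]
print(greedy_landmarks(pts, 3))                         # [0, 2, 3]
print(topk_overlap([1, 2, 3, 4, 5], [1, 2, 3, 4, 9]))   # 0.8
```

Spreading landmarks out this way tends to cover the data more evenly than random sampling, which is consistent with the abstract's remark that greedy selection helps most when many landmarks are used.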

Volume 12 (4), pp. 417-33.

Citations: 0

Gene microarray data analysis using parallel point-symmetry-based clustering.

IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.067320

Anasua Sarkar, Ujjwal Maulik

Identification of co-expressed genes is the central goal in microarray gene expression analysis. Point-symmetry-based clustering is an important unsupervised learning technique for recognising symmetrical convex- or non-convex-shaped clusters. To enable fast clustering of large microarray data, we propose a distributed, time-efficient, scalable approach for the point-symmetry-based K-Means algorithm. A natural basis for analysing gene expression data with a symmetry-based algorithm is to group together genes with similar symmetrical expression patterns. This new parallel implementation also achieves linear speedup in timing without sacrificing the quality of the clustering solution on large microarray data sets. The parallel point-symmetry-based K-Means algorithm is compared with another new parallel symmetry-based K-Means and an existing parallel K-Means over eight artificial and benchmark microarray data sets to demonstrate its superiority in both timing and validity. Statistical analysis is also performed to establish the significance of this message-passing-interface based point-symmetry K-Means implementation. We also analysed the biological relevance of the clustering solutions.

Volume 11 (3), pp. 277-300.

Citations: 5

Predicting malignancy from mammography findings and image-guided core biopsies.

IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.067319

Pedro Ferreira, Nuno A Fonseca, Inês Dutra, Ryan Woods, Elizabeth Burnside

The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a dataset consisting of 348 consecutive breast masses that underwent image-guided core biopsy between October 2005 and December 2007 in 328 female subjects. We applied various algorithms with parameter variation to learn from the data. The tasks were to predict mass density and to predict malignancy. The best classifier for mass density is based on a support vector machine and has an accuracy of 81.3%. The expert correctly annotated 70% of the mass densities. The best classifier for malignancy is also based on a support vector machine and has an accuracy of 85.6%, with a positive predictive value of 85%. One important contribution of this work is that our model can predict malignancy in the absence of the mass density attribute, since we can fill in this attribute using our mass density predictor.
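For readers less familiar with the two metrics quoted above, the short sketch below shows how accuracy and positive predictive value follow from a binary confusion matrix. The counts are made up purely for illustration and are not the study's data.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def positive_predictive_value(tp, fp):
    """Share of cases predicted malignant that really are malignant."""
    return tp / (tp + fp)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
tp, fp, tn, fn = 17, 3, 70, 10
print(accuracy(tp, fp, tn, fn))            # 0.87
print(positive_predictive_value(tp, fp))   # 0.85
```

The two numbers answer different questions: accuracy covers every case, whereas the positive predictive value only looks at the cases flagged as malignant.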

Volume 11 (3), pp. 257-76.

Citations: 15

Fuzzy watershed segmentation algorithm: an enhanced algorithm for 2D gel electrophoresis image segmentation.

IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.069659

Shaheera Rashwan, Amany Sarhan, Muhamed Talaat Faheem, Bayumy A Youssef

Detection and quantification of protein spots is an important issue in the analysis of two-dimensional gel electrophoresis (2DGE) images. The main challenge in segmenting 2DGE images is to separate overlapping protein spots correctly and to find the weak protein spots. In this paper, we describe a new robust technique to segment and model the different spots present in the gels. The watershed segmentation algorithm is modified to handle the problem of over-segmentation by initially partitioning the image into mosaic regions using the composition of fuzzy relations. The experimental results show that the proposed algorithm effectively overcomes the over-segmentation problem associated with the available algorithm. We also use a wavelet denoising function to enhance the quality of the segmented image. The results of applying a denoising function before the proposed fuzzy watershed segmentation algorithm are promising, as they are better than those obtained without denoising.

Volume 12 (3), pp. 275-93.

Citations: 7

Mining literatures to discover novel multiple biological associations in a disease context.

IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.069419

Alberto Faro, Daniela Giordano, Francesco Maiorana

Text mining methods proposed to discover associations between pairs of biological entities by mining a scientific literature often extract associations that already exist in that literature, whereas their extensions over-supervise the discovery process with heuristics and ontologies that limit the search space. On the other hand, methods that search for novel associations by applying text mining to two literatures do not avoid the risk of discovering syllogisms based on faulty premises. For this reason, the paper proposes a method that helps users discover associations among biological entities by mining the literature with an unsupervised clustering approach. The discovered multiple associations are derived from binary associations to limit the computational load without compromising the accuracy of the methodology. A case study demonstrates how the tool derived from the methodology works in practice. A comparison between this tool and other tools available in the literature highlights the effectiveness of the methodology.

Volume 12 (2), pp. 224-56.

Citations: 1

Patient-specific early classification of multivariate observations.

IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.067955

Mohamed F Ghalwash, Dušan Ramljak, Zoran Obradović

Early classification of time series has been receiving a lot of attention recently. In this paper we present a model, which we call the Early Classification Model (ECM), that allows for early, accurate and patient-specific classification of multivariate observations. ECM integrates the widely used Hidden Markov Model (HMM) and Support Vector Machine (SVM) models. It attained very promising results on the datasets we tested it on. In one set of experiments, based on a published dataset of response to drug therapy in Multiple Sclerosis patients, ECM used on average only 40% of a time series and was able to outperform some of the baseline models, which needed the full time series for classification. In a second set of experiments, on a sepsis therapy dataset, ECM was able to surpass the standard threshold-based method and the state-of-the-art method for early classification of multivariate time series.

Citations: 11

ACC-FMD: ant colony clustering for functional module detection in protein-protein interaction networks.

IF 0.3 Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics

Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.067323

Junzhong Ji, Hongxin Liu, Aidong Zhang, Zhijun Liu, Chunnian Liu

Mining functional modules in Protein-Protein Interaction (PPI) networks is an important research topic for revealing the structure-functionality relationships in biological processes. Recently, some swarm intelligence algorithms have been successfully applied in the field. This paper presents a new nature-inspired approach, ACC-FMD, which is based on ant colony clustering to detect functional modules. First, proteins with the highest clustering coefficients are selected as ant seed nodes. Then, picking and dropping operations based on ant probabilistic models are developed and employed to assign proteins to the corresponding clusters represented by the seeds. Finally, the best clustering result in each generation is used to transmit information by updating the similarity function. Experimental results on some benchmark datasets show that ACC-FMD outperforms the CFinder and MCODE algorithms and has comparable performance to the MINE, COACH, DPClus and Core algorithms in terms of the general evaluation metrics.
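The seed-selection step can be illustrated with the standard local clustering coefficient of an undirected graph. The sketch below covers only that first step and assumes seeds are simply the top-ranked nodes; the abstract does not give the exact selection rule, the ant picking and dropping operations are omitted, and the tiny network and function names are invented for illustration.

```python
def clustering_coefficient(adj, v):
    """Local clustering coefficient of node v in an undirected graph.

    adj maps each node to the set of its neighbours. The coefficient is
    the number of links among v's neighbours divided by the number of
    possible links, k * (k - 1) / 2, where k is the degree of v.
    """
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))


def pick_seeds(adj, n_seeds):
    """Rank nodes by clustering coefficient (ties by name), keep the top ones."""
    ranked = sorted(adj, key=lambda v: (-clustering_coefficient(adj, v), v))
    return ranked[:n_seeds]


# Invented toy network: a triangle A-B-C with a pendant node D attached to C.
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(pick_seeds(ppi, 2))  # ['A', 'B']
```

Nodes whose neighbours are densely interconnected score highest, which is why they make natural starting points for growing candidate modules.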

Volume 11 (3), pp. 331-63.

Citations: 13
