2013 8th International Symposium on Health Informatics and Bioinformatics: Latest Publications

Classification of cohesin family using class specific motifs

2013 8th International Symposium on Health Informatics and Bioinformatics

Pub Date : 2013-11-14 DOI: 10.1109/HIBIT.2013.6661687

Ercument M. Eser, B. Arslan, U. Sezerman

Motif extraction from protein sequences has been a challenging task for bioinformaticians. Class-specific motifs, which occur frequently in one class but only at a low ratio in other classes, can be used for highly accurate classification of protein sequences. In this study, we present a new scoring-based method for class-specific n-gram motif selection using reduced amino acid alphabets. Cohesin protein sequences, which interact with Dockerin modules to construct the most common and abundant organic polymer Cellulosome, are used for class-specific motif selection, and the selected motifs are then given to the J48 and SVM algorithms as features. Classification results are examined across various n-gram sizes, reduced amino acid alphabets, and numbers of features. The best result, with a training accuracy of 98.61% and a test accuracy of 94.54%, was obtained using the Gbmr14 alphabet, 5 features per family, 4-gram motifs, and the J48 algorithm. The proposed technique can be generalized to other protein families.
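The selection step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the four-group reduced alphabet is hypothetical (it is not Gbmr14), and the score (in-class presence fraction minus out-of-class presence fraction) is an assumed stand-in for the paper's scoring function, which the abstract does not specify. The function names are likewise made up for this example.

```python
from collections import Counter

# Hypothetical 4-letter reduced alphabet, for illustration only.
# It is NOT the Gbmr14 alphabet used in the paper.
REDUCED = {}
for group, symbol in [("AVLIMCFWY", "h"), ("STNQGP", "p"), ("DE", "n"), ("KRH", "c")]:
    for aa in group:
        REDUCED[aa] = symbol


def reduce_sequence(seq):
    """Map each residue to its reduced-alphabet symbol ('x' if unknown)."""
    return "".join(REDUCED.get(aa, "x") for aa in seq.upper())


def ngrams(seq, n):
    """Return the set of distinct n-grams occurring in a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def class_specific_motifs(classes, n=4, top_k=5):
    """Pick the top_k n-grams per class.

    Assumed score: fraction of in-class sequences containing the n-gram
    minus the fraction of out-of-class sequences containing it.
    """
    # class label -> Counter(n-gram -> number of sequences containing it)
    presence = {}
    for label, seqs in classes.items():
        counter = Counter()
        for s in seqs:
            counter.update(ngrams(reduce_sequence(s), n))
        presence[label] = counter

    selected = {}
    for label, seqs in classes.items():
        n_in = len(seqs)
        n_out = sum(len(v) for k, v in classes.items() if k != label)
        scores = {}
        for gram, count in presence[label].items():
            out_count = sum(presence[k][gram] for k in classes if k != label)
            out_frac = out_count / n_out if n_out else 0.0
            scores[gram] = count / n_in - out_frac
        # Sort by descending score, then alphabetically for a stable order.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        selected[label] = ranked[:top_k]
    return selected


# Toy sequences (made up), two per class.
toy = {
    "A": ["AVLDEKR", "AVLDESR"],
    "B": ["KRHSTNQ", "KRHSTGP"],
}
motifs = class_specific_motifs(toy, n=4, top_k=2)
```

In a full pipeline, the presence or absence of each selected motif would become one feature per sequence before training a classifier such as a decision tree or SVM.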

Citations: 2

Sampling bias in microarray data analysis: A demonstration in the field of reproductive biology

2013 8th International Symposium on Health Informatics and Bioinformatics

Pub Date : 2013-11-14 DOI: 10.1109/HIBIT.2013.6661684

S. Manafi, A. Uyar, A. Bener

The actual benefit from high-throughput microarray experiments relies strongly on eliminating all possible sources of bias during both the experimental procedure and the data analysis process. Within the context of reproductive biology, microarray-based transcriptomic analysis of the oocyte and surrounding cumulus/granulosa cells poses significant challenges due to the limited number of samples and/or potential contamination from adjacent cells. In this study, we investigated the effect of sampling bias on the consistency of microarray differential expression analysis in the field of reproduction. Experiments were conducted on five datasets obtained from publicly available microarray repositories. For each dataset, probe-level expression values were extracted, and background adjustment, inter-array quantile normalization, and probe set summarization were performed according to the Robust Multi-Chip Average algorithm. Genes with a false discovery rate-corrected p value of < 0.05 and |fold change| > 2 were considered differentially expressed. Results demonstrate that both the number of replicates and the inclusion of different subsets of the available samples in the analysis alter the number of differentially expressed genes. We suggest that assessing inter-sample variance prior to differential expression analysis is an important step in microarray experiments, and that proper handling of that variance may require alternative normalization and/or statistical test methods.
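The selection rule quoted in this abstract (FDR-corrected p value below 0.05 and fold change above 2) can be sketched as below. Two assumptions are made that the abstract does not confirm: the correction is taken to be Benjamini-Hochberg, and fold changes are taken to be on the log2 scale, so a twofold change corresponds to an absolute log2 value of 1. The function names and toy numbers are illustrative only.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, pvalues, log2_fold_changes,
                             alpha=0.05, min_fold=2.0):
    """Select genes with adjusted p < alpha and |fold change| > min_fold.

    Assumes fold changes are given on the log2 scale.
    """
    adjusted = benjamini_hochberg(pvalues)
    threshold = math.log2(min_fold)
    return [
        gene
        for gene, q, lfc in zip(genes, adjusted, log2_fold_changes)
        if q < alpha and abs(lfc) > threshold
    ]


# Toy values (made up): g2 fails the fold-change cut, g4 fails the FDR cut.
selected = differentially_expressed(
    ["g1", "g2", "g3", "g4"],
    [0.001, 0.01, 0.03, 0.5],
    [2.0, 0.5, -1.5, 3.0],
)
```

Because the adjusted p-values depend on how many tests are run and on the variance estimated from the included samples, rerunning this filter on different sample subsets can change the selected gene list, which is the effect the study examines.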

Citations: 1

Ranking tandem mass spectra: And the impact of database size and scoring function on peptide spectrum matches

2013 8th International Symposium on Health Informatics and Bioinformatics

Pub Date : 2013-11-14 DOI: 10.1109/HIBIT.2013.6661686

Canan Has, Cemal Ulas Kundakci, Aybuge Altay, J. Allmer

Proteomics is currently driven by mass spectrometry. Many computational algorithms have been proposed for the analysis of tandem mass spectra. There are two approaches: one assigns a peptide sequence to a tandem mass spectrum directly, and the other employs a sequence database to look up possible solutions. The former method needs high-quality spectra, while the latter can tolerate lower-quality spectra. Since both methods are computationally expensive, it is sensible to establish spectral quality using an independent, fast algorithm. In this study, we first establish proper settings for database search algorithms for the analysis of spectra in our gold benchmark dataset and then analyze the performance of ScanRanker, an algorithm for quality assessment of tandem MS spectra, on this ground truth data. We found that OMSSA and MSGFDB have limitations in their scoring functions, but we were able to form a proper consensus prediction using majority vote for our benchmark data. Unfortunately, ScanRanker's results do not correlate well with the consensus, and ScanRanker is also too slow to be used in the capacity for which it is intended.
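A majority-vote consensus over peptide-spectrum matches, as mentioned in this abstract, might look like the sketch below. It assumes each spectrum has a list of peptide assignments coming from several searches (different engines or settings) and that a strict majority is required; the abstract does not describe how the votes were actually collected, so the data layout and function name are hypothetical.

```python
from collections import Counter


def consensus_peptide(assignments):
    """Return the peptide proposed by a strict majority of searches, else None.

    `assignments` holds one entry per search; None means "no match reported".
    """
    votes = Counter(p for p in assignments if p is not None)
    if not votes:
        return None
    peptide, count = votes.most_common(1)[0]
    return peptide if count > len(assignments) / 2 else None


# Toy input (made up): three searches per spectrum.
search_results = {
    "scan_1": ["PEPTIDEK", "PEPTIDEK", "PEPTLDEK"],  # 2 of 3 agree
    "scan_2": ["AAAK", "CCCK", None],                # no majority
    "scan_3": [None, None, None],                    # nothing identified
}
consensus = {scan: consensus_peptide(a) for scan, a in search_results.items()}
```

Spectra with a consensus peptide can then serve as ground truth against which a quality-ranking tool's scores are compared.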

Citations: 1

Determination of the exact value of the regularization parameter in smoothness priors method with respect to the corresponding cut-off frequencies for designing filters

2013 8th International Symposium on Health Informatics and Bioinformatics

Pub Date : 2013-11-14 DOI: 10.1109/HIBIT.2013.6661683

Y. Isler

A filter is an electrical network or software that alters the amplitude and/or phase characteristics of a signal with respect to frequency. Recently, a new detrending method has been presented to remove slow nonstationary trends from biomedical signals; it is equivalent to a high-pass filter that removes very low frequency components from the given signal. Although many recently published papers on the analysis of biomedical signals, such as the heart rate variability signal, have used the smoothness priors detrending method, no exact relationship has been given between the regularization parameter and the cut-off frequency of the corresponding high-pass filter. In this study, we present this relationship as an empirical formula that allows researchers to calculate the parameter from the desired frequency response, not only for a high-pass filter but also for other filter types.
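For orientation, the smoothness priors detrending operation is commonly written as below in the heart rate variability literature. The abstract itself does not state it, so the notation is an assumption: z is the length-N signal, D_2 is the (N-2) x N second-order difference matrix whose rows contain the pattern (1, -2, 1), and lambda is the regularization parameter. The paper's empirical formula linking lambda to the cut-off frequency is not given in the abstract and is not reproduced here.

```latex
\hat{z}_{\mathrm{trend}} = \left( I + \lambda^{2} D_{2}^{\mathsf{T}} D_{2} \right)^{-1} z,
\qquad
\hat{z}_{\mathrm{stat}} = z - \hat{z}_{\mathrm{trend}}
  = \left( I - \left( I + \lambda^{2} D_{2}^{\mathsf{T}} D_{2} \right)^{-1} \right) z
```

A larger lambda yields a smoother estimated trend, so the detrended signal keeps more of the low-frequency content; in filter terms, the cut-off frequency of the equivalent high-pass filter decreases as lambda grows.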

Citations: 1

Period-doubling route to chaos in shunting inhibitory cellular neural networks

2013 8th International Symposium on Health Informatics and Bioinformatics

Pub Date : 2013-11-14 DOI: 10.1109/HIBIT.2013.6661682

M. Akhmet, M. O. Fen

In this study, we investigate the dynamics of shunting inhibitory cellular neural networks with external inputs in the form of relay functions. The presence of chaos through a period-doubling cascade is proved theoretically. An example that confirms the theoretical results is presented.

Citations: 0

A genetic algorithm approach to active subnetwork search applied to GWAS data

2013 8th International Symposium on Health Informatics and Bioinformatics

Pub Date : 2013-11-14 DOI: 10.1109/HIBIT.2013.6661681

Ozan Ozisik, Burcu Bakir-Gungor, B. Diri, O. U. Sezerman

An active subnetwork is a group of interconnected genes that show condition-specific differences. It has been observed that gene products with alterations associated with a disease of interest tend to be part of subnetworks within the overall interaction network. Hence, integrating the interaction data with the genotypic data underlying disease states facilitates the separation of the subnetworks perturbed in a given disorder from the rest of the network. In the literature, active subnetwork search is used to discover disease-related regulatory pathways, dysregulated genes, functional modules, and cancer markers, to classify diseases, and to predict response to treatment. In this study, a genetic algorithm-based method is developed for active subnetwork search and applied to the WTCCC Rheumatoid Arthritis genome-wide association study dataset. The relevance of the identified subnetworks to the disease is compared in terms of biological pathways. Our results show that the proposed method works well in detecting the significant RA-associated subnetworks and that it is also applicable to recognizing subnetworks of other complex diseases.
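A genetic algorithm for this kind of search needs a fitness function over candidate gene sets. The sketch below shows one plausible form, under assumptions the abstract does not confirm: each candidate is a bitstring over the gene list, each gene carries a z-score derived from its association p-value, the aggregate score is the sum of z-scores divided by the square root of the subnetwork size (a widely used active-module score), and disconnected candidates are rejected. All names and numbers are hypothetical.

```python
import math
from collections import deque


def is_connected(nodes, edges):
    """Check whether `nodes` induce a connected subgraph of the interaction network."""
    nodes = set(nodes)
    if not nodes:
        return False
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    # Breadth-first search from an arbitrary member.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return seen == nodes


def subnetwork_fitness(bitstring, genes, z_scores, edges):
    """Score a candidate: aggregate z-score if connected, -inf otherwise."""
    chosen = [g for g, bit in zip(genes, bitstring) if bit]
    if not is_connected(chosen, edges):
        return float("-inf")
    return sum(z_scores[g] for g in chosen) / math.sqrt(len(chosen))


# Toy network (made up): a simple chain G1 - G2 - G3 - G4.
genes = ["G1", "G2", "G3", "G4"]
z_scores = {"G1": 2.0, "G2": 2.0, "G3": -1.0, "G4": 3.0}
edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]

connected_score = subnetwork_fitness([1, 1, 0, 0], genes, z_scores, edges)
disconnected_score = subnetwork_fitness([1, 0, 0, 1], genes, z_scores, edges)
```

A genetic algorithm would then evolve a population of such bitstrings through selection, crossover, and mutation, keeping the candidates with the highest fitness.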

Citations: 1

Development of a social support intervention with a network of caregivers to find wandering Alzheimer's patients as soon as possible: A social computing application in healthcare

2013 8th International Symposium on Health Informatics and Bioinformatics

Pub Date : 2013-11-14 DOI: 10.1109/HIBIT.2013.6661678

Y. Yuce, K. H. Gulkesen

Locating and securing an Alzheimer's patient who is outdoors and in a wandering state is crucial to the patient's safety. Although advances in geotracking and mobile technology have made it possible to locate patients instantly, reaching them while they are wandering may still take time. However, a social network of caregivers may help shorten the time that it takes to reach and secure a wandering Alzheimer's disease (AD) patient. This study proposes a social computing application in healthcare, designed to form and direct a social support network of caregivers for locating and securing wandering AD patients as soon as possible. The proposed system consists of three major components: a tracking device, a middleware, and a mobile application. The tracking device has a Subscriber Identity Module for the Global System for Mobile Communications (GSM) network installed on it and is responsible for communication between an AD patient and the system (e.g. transmission of location updates at varying intervals). The middleware employs a supervision mechanism to detect potentially wandering patients, a tracking mechanism to locate a wandering patient, and a coordination mechanism to communicate with caregivers and direct them to the wandering patient. The mobile application mediates the interaction between a caregiver and the system during a wandering patient search session (e.g. the communication steps necessary to get involved in the search for a wandering patient). The communication backbone of the system involves the Internet and a GSM network. The major system component, i.e. the middleware, is being implemented in Java. Family caregivers will be interviewed before and after using the system. To determine the impact of the system in terms of depression, anxiety, and burden, the Center for Epidemiologic Studies Depression Scale, the Patient Health Questionnaire, and the Zarit Burden Interview, respectively, will be administered during these interviews.
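The supervision and coordination mechanisms described in this abstract could be approximated as in the sketch below. It is purely illustrative: the abstract does not say how wandering is detected, so a simple geofence around a home position is assumed, with a hypothetical 500 m safe radius, and caregivers are ranked by great-circle distance to the patient. The paper's middleware is written in Java; Python is used here only for brevity, and every name is invented for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_potentially_wandering(home, update, safe_radius_m=500.0):
    """Assumed supervision rule: flag a location update outside the safe radius."""
    return haversine_m(home[0], home[1], update[0], update[1]) > safe_radius_m


def nearest_caregivers(patient_position, caregivers, limit=3):
    """Assumed coordination rule: return the names of the closest caregivers."""
    ranked = sorted(
        caregivers.items(),
        key=lambda item: haversine_m(*patient_position, *item[1]),
    )
    return [name for name, _ in ranked[:limit]]


# Toy positions (made up), given as (latitude, longitude) in degrees.
home = (0.0, 0.0)
latest_update = (0.0, 0.01)  # roughly 1.1 km east of home
alert = is_potentially_wandering(home, latest_update)
to_notify = nearest_caregivers(
    latest_update,
    {"caregiver_a": (0.0, 0.02), "caregiver_b": (0.0, 0.011), "caregiver_c": (0.0, 0.05)},
    limit=2,
)
```

A production system would also need to handle missing or noisy location updates and time-of-day patterns, which this sketch ignores.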

Citations: 5

A dynamic Bayesian framework to learn temporal gene interactions using external knowledge

2013 8th International Symposium on Health Informatics and Bioinformatics

Pub Date : 2013-11-14 DOI: 10.1109/HIBIT.2013.6661680

U. Agyuz, S. Isci, C. Ozturk, A. Ademoglu, H. Otu

One of the main problems in systems biology is learning gene interaction networks from experimental data. This turns out to be a challenging task, as the experimental data are sparse and noisy and network learning algorithms are computationally intensive. Bayesian Networks (BNs) have become a popular choice for learning such networks, as BNs avoid overfitting and are robust to noise. In this paper, we build on our established framework, Bayesian Network Prior, in which we incorporate existing biological knowledge when learning gene interaction networks. However, biological phenomena are time-dependent, and there is a need to extend the static structure of learning approaches to a temporal level. Here, we present a Dynamic BN (DBN) framework, which learns interaction networks between different time points in time-series data. Both intra and inter networks are learnt and compared to standard DBN learning algorithms. Our results, based on synthetic and simulated gene expression data, suggest that the proposed method outperforms existing approaches in identifying the underlying network structure. The proposed framework is robust to errors in the incorporated knowledge and can combine various experimental data types with existing knowledge when learning networks.

Citations: 0

Use of open linked data in bioinformatics space: A case study

2013 8th International Symposium on Health Informatics and Bioinformatics

Pub Date : 2013-11-14 DOI: 10.1109/HIBIT.2013.6661679

R. Çelebi, Ozgur Gumus, Yeşim AYDIN SON

In the life sciences, the semantic web can support many aspects of bio- and health informatics, with exciting applications appearing in areas ranging from plant genetics to drug discovery. Using semantic technologies with open linked data provides two kinds of advantages: the ability to search multiple datasets through a single framework and the ability to search relationships, and paths of relationships, that span different datasets. The Bio2RDF project creates a network of coherently linked data across biological databases. As part of the Bio2RDF project, an integrated bioinformatics warehouse on the semantic web has been built. In this paper, we define a use case with a query over multiple distant data sources that are semantically available through Bio2RDF. The results are validated with traditional search techniques, and future directions are discussed.

Citations: 3

Data mining for microRNA gene prediction: On the impact of class imbalance and feature number for microRNA gene prediction

2013 8th International Symposium on Health Informatics and Bioinformatics

Pub Date : 2013-09-01 DOI: 10.1109/HIBIT.2013.6661685

Muserref Duygu Saçar, J. Allmer

MicroRNAs (miRNAs) are small, non-coding RNAs that are involved in the posttranscriptional modulation of gene expression. Their short (18-24 nucleotide), single-stranded mature sequences are involved in targeting specific genes. Experimental methods are limited, and it is difficult, if not impossible, to establish all miRNAs and their targets experimentally. Therefore, many tools for the prediction of miRNA genes and miRNA targets have been proposed. Most of these tools are based on machine learning methods, and within that area mostly two-class classification is employed. Unfortunately, truly negative data are impossible to attain, and only approximations of negative data are currently available. Also, we recently showed that the available positive data are not flawless. Here, we investigate the impact of class imbalance on learner accuracy and find a difference of up to 50% between the best and worst precision and recall values. In addition, we examined an increasing number of features and found a curve that peaks at 0.97 recall and 0.91 precision, with performance decaying quickly once more than 100 features are included.
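Why class imbalance moves precision so strongly can be seen with a small calculation. The sketch below is not taken from the paper: it assumes a classifier with a fixed true positive rate and false positive rate (the 0.9 and 0.05 values are made up) and shows how precision changes when only the number of negative examples grows.

```python
def precision_recall(tp, fp, fn):
    """Compute (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def expected_metrics(n_pos, n_neg, tpr, fpr):
    """Expected precision and recall for a classifier with fixed TPR and FPR."""
    tp = tpr * n_pos
    fn = n_pos - tp
    fp = fpr * n_neg
    return precision_recall(tp, fp, fn)


# Same hypothetical classifier, evaluated at 1:1 and 1:10 positive:negative ratios.
balanced = expected_metrics(n_pos=1000, n_neg=1000, tpr=0.9, fpr=0.05)
imbalanced = expected_metrics(n_pos=1000, n_neg=10000, tpr=0.9, fpr=0.05)
```

In this toy setting recall stays at 0.9 in both cases, while precision drops from about 0.95 to about 0.64, because false positives scale with the number of negatives. This is one reason the ratio of positive to negative examples has to be reported alongside such metrics.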

Citations: 23
