Latest papers: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Measuring phenotype semantic similarity using Human Phenotype Ontology

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822617

Jiajie Peng, Hansheng Xue, Y. Shao, Xuequn Shang, Yadong Wang, Jin Chen

Making the right disease diagnosis from complex clinical characteristics and a heterogeneous genetic background is critical but remains challenging. Recently, phenotype similarity based on the Human Phenotype Ontology (HPO) has been widely used to aid disease diagnosis. However, existing measures are adapted from Gene Ontology-based term similarity models, which are not optimized for the human phenotype ontology. We propose a new similarity measure called PhenoSim. Our model includes a noise reduction component that models noisy patient phenotype data and a path-constrained, Information Content-based method for measuring phenotype semantic similarity. Evaluation tests showed that PhenoSim could improve the performance of HPO-based phenotype similarity measurement.
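
PhenoSim's path-constrained measure builds on Information Content (IC). As a point of reference only, the sketch below computes the classic Resnik similarity on a tiny hypothetical ontology. Resnik similarity is the IC of the most informative common ancestor of two terms. The term names and annotation counts are invented. Neither PhenoSim's path constraint nor its noise-reduction component is modeled.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (a DAG, as in HPO).
PARENTS = {
    "root": [],
    "abnormal_eye": ["root"],
    "abnormal_ear": ["root"],
    "myopia": ["abnormal_eye"],
    "cataract": ["abnormal_eye"],
    "hearing_loss": ["abnormal_ear"],
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotation_counts):
    """IC(t) = -log p(t); p(t) counts annotations to t or any of its descendants."""
    propagated = {t: 0 for t in PARENTS}
    for term, n in annotation_counts.items():
        for a in ancestors(term):
            propagated[a] += n
    total = propagated["root"]
    return {t: -math.log(c / total) for t, c in propagated.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[a] for a in common if a in ic)
```

Consider annotation counts of `{"myopia": 2, "cataract": 1, "hearing_loss": 1}`:

- `myopia` and `cataract` share `abnormal_eye`, so their similarity evaluates to log(4/3).
- `myopia` and `hearing_loss` share only the root, so their similarity is zero.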

Citations: 5

Ontological features of Electronic Health Records reveal distinct association patterns in liver cancer

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822667

L. Chan, S. Wong, W. H. Chiu

An Electronic Health Record (EHR) system aims not only to provide patient records in a digital, structured form but also to support clinical decisions, patient care and patient advice. The EHR database is still an under-explored big data resource that holds a large number of cases with complete recovery, good prognosis, reliable diagnostic tests and effective treatments. A set of 112 abdominal computed tomography imaging examination reports was collected from four hospitals in Hong Kong. It consisted of 59 cases of hepatocellular carcinoma (HCC) or liver metastases (called the HCC group for simplicity) and 53 cases with no abnormality detected (NAD group). We extracted terms related to liver cancer from the reports and mapped them to ontological features using the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). Each feature value was further weighted using a systematic PubMed search method. Association levels between every pair of features in the HCC and NAD groups were quantified using Pearson's correlation coefficient. The distribution of association levels in the HCC group was then compared with that in the NAD group. The HCC group reveals a distinct association pattern that signifies liver cancer and provides clinical decision support for suspected cases.
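
The pairwise association step can be sketched as a Pearson correlation computed for every pair of weighted features within one group. This is a minimal illustration under stated assumptions:

- Each report is a dict of feature weights.
- A feature missing from a report counts as 0.0.
- A constant feature is given r = 0.0.
- The feature names are hypothetical.

The PubMed-based weighting itself is not reproduced.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant feature; treated as "no association" (assumption)
    return sxy / math.sqrt(sxx * syy)


def association_levels(cases, features):
    """Pearson r for every pair of features across the cases of one group.

    cases: one dict per report, mapping feature name -> weighted feature value;
    a feature missing from a report is treated as 0.0 (assumption).
    """
    levels = {}
    for f1, f2 in itertools.combinations(features, 2):
        x = [case.get(f1, 0.0) for case in cases]
        y = [case.get(f2, 0.0) for case in cases]
        levels[(f1, f2)] = pearson(x, y)
    return levels
```

Calling `association_levels` once on the HCC reports and once on the NAD reports yields two collections of r values. Their distributions can then be compared, for example with histograms.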

Citations: 6

aWGRS: Automates paired-end whole genome re-sequencing data analysis framework

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822646

Xiujuan Sun, Fa Zhang, Xiaohua Wan, Jinzhi Zhang

To spare users cumbersome command-line operations and repeated parameter adjustments, this paper presents aWGRS. aWGRS is a framework that automates paired-end whole genome re-sequencing data processing with pre-installed dependencies, which are used to map reads to a reference and realign variations. It takes paired-end reads and a reference genome as input and returns re-sequencing information. The tool is built on the observation that re-sequencing requires several steps:

- alignment to the reference;
- single nucleotide polymorphism (SNP) calling;
- insertion/deletion (InDel) calling;
- structural variant (SV) calling;
- annotation.

By introducing and adjusting a new parameter called the recall rate, coverage and accuracy requirements can be met at the same time. Within the range of the recall rate, each variation is evaluated by two criteria: its quality value and the number of reads that support it. The candidates with higher quality values and greater read support are finally selected. Genome-wide genetic variations between precocious trifoliate orange and its wild type were identified in [1] by the Beijing Genomics Institute (BGI). Compared with those results, aWGRS greatly reduces the number of reported variations and greatly improves accuracy. Overall, the adjustable parameters in aWGRS affect the experimental results, and the default filtering strategy based on the mutation recall rate can also attain good results automatically.
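
The selection criterion described above (quality value plus supporting-read count) can be sketched as a per-site filter. This is a hypothetical illustration:

- The abstract does not define the recall rate precisely, so it is not modeled here.
- The `Candidate` record and the cut-offs `min_quality=30.0` and `min_support=3` are assumptions, not aWGRS parameters.
- Among candidates at the same site that pass both cut-offs, the one that is best by (quality, support) is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    chrom: str
    pos: int
    alt: str
    quality: float  # variant quality value reported by the caller
    support: int    # number of reads supporting the variant


def select_variants(candidates, min_quality=30.0, min_support=3):
    """Keep one candidate per site: the best by (quality, read support) among those passing both cut-offs."""
    best = {}
    for c in candidates:
        if c.quality < min_quality or c.support < min_support:
            continue
        site = (c.chrom, c.pos)
        current = best.get(site)
        if current is None or (c.quality, c.support) > (current.quality, current.support):
            best[site] = c
    return sorted(best.values(), key=lambda c: (c.chrom, c.pos))
```

Comparing tuples makes quality the primary criterion and read support the tie-breaker. A weighted score would be an equally plausible reading of the abstract.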

Citations: 0

Unsupervised single-cell analysis in triple-negative breast cancer: A case study

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822581

A. Athreya, Alan J. Gaglio, Z. Kalbarczyk, R. Iyer, J. Cairns, Krishna R. Kalari, R. Weinshilboum, Liewei Wang

This paper demonstrates an unsupervised learning approach to identify genes with significant differential expression across single-cell subpopulations induced by therapeutic treatment. Identifying this set of genes makes it possible to use well-established bioinformatics approaches, such as pathway analysis, to establish their biological relevance. A biologist can then use prior knowledge to investigate in the laboratory a few particular candidates among the subset of genes that overlap with relevant pathways. The human genome is large, and cost and skilled resources are limited. Biologists therefore benefit from analytical methods combined with pathway analysis, which let them design laboratory experiments focused on only a few significant genes. As an example, we show how model-based unsupervised methods can identify a small set of genes (1% of the genome) with two properties: significant differential expression in single cells, and high correlation (p-value < 1E−7) with pathways whose anticancer effects are driven by the antidiabetic drug metformin. Further analysis of genes on these relevant pathways reveals three candidate genes previously implicated in several anticancer mechanisms in other cancers, not driven by metformin. Identifying these genes can help biologists and clinicians design laboratory experiments to establish the molecular mechanisms of metformin in triple-negative breast cancer. In a domain with no prior knowledge of which small subsets of data are biologically significant, we demonstrate that careful data-driven methods can infer such significant small data to explain biological mechanisms.
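
The overall flow (split cells into subpopulations without labels, then rank genes by how differently they are expressed between the groups) can be sketched as follows. This is a simplified stand-in, not the paper's model-based method:

- k-means with k = 2 is swapped in for the unsupervised step. It is seeded with the first cell and the cell farthest from it.
- Welch's t statistic is swapped in for the differential-expression test.
- The expression values and gene names are invented.

```python
import math


def dist2(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(cells, iters=20):
    """Split cells (lists of expression values) into two clusters with k-means (k=2)."""
    c0 = cells[0]
    c1 = max(cells, key=lambda v: dist2(v, c0))  # farthest cell as second seed
    centroids = [list(c0), list(c1)]
    labels = [0] * len(cells)
    for _ in range(iters):
        labels = [0 if dist2(v, centroids[0]) <= dist2(v, centroids[1]) else 1 for v in cells]
        for k in (0, 1):
            members = [v for v, lab in zip(cells, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def welch_t(x, y):
    """Welch's t statistic for two samples with possibly unequal variances."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((a - mx) ** 2 for a in x) / (len(x) - 1)
    vy = sum((b - my) ** 2 for b in y) / (len(y) - 1)
    se = math.sqrt(vx / len(x) + vy / len(y))
    return (mx - my) / se if se > 0 else 0.0


def rank_genes(cells, gene_names):
    """Cluster cells, then rank genes by |t| between the two clusters (largest first)."""
    labels = two_means(cells)
    scores = {}
    for g, name in enumerate(gene_names):
        x = [v[g] for v, lab in zip(cells, labels) if lab == 0]
        y = [v[g] for v, lab in zip(cells, labels) if lab == 1]
        scores[name] = abs(welch_t(x, y)) if len(x) > 1 and len(y) > 1 else 0.0
    return sorted(gene_names, key=lambda n: scores[n], reverse=True)
```

The top-ranked genes would then go to pathway analysis. No p-value or multiple-testing correction is computed here.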

Citations: 4

An improved ensemble learning method with SMOTE for protein interaction hot spots prediction

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822756

Qianqian Huang, Xiaolong Zhang

In protein-protein interactions, only a small subset of residues, the hot spots, contributes significantly to the binding free energy. There is therefore an imbalance between the numbers of hot spots and non-hot spots. Predicting hot spot residues is very important for understanding protein-protein interactions. This paper presents an improved ensemble learning method that combines AdaBoost with SMOTE. It handles the imbalanced data and predicts protein hot spots in the latest database, SKEMPI. First, amino acid information, such as hydrophobicity, and protein structural features are extracted. Then the mRMR algorithm is used to select features. Finally, SMOTE is applied to the protein data to handle the class imbalance, and protein hot spots are predicted with the AdaBoost ensemble learning method. Experimental results show that the proposed method improves prediction accuracy.
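
SMOTE balances the classes by synthesizing new minority-class points (here, hot-spot residues) rather than duplicating existing ones. Each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbours. The sketch below is a minimal, standard-library rendition of that idea. The feature vectors, `k` and `seed` are illustrative, and mRMR selection and AdaBoost are not shown.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling of the minority class (e.g. hot-spot residues).

    Each synthetic sample lies on the segment between a random minority sample
    and one of its k nearest minority neighbours (squared Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic
```

In practice, `imblearn.over_sampling.SMOTE` and scikit-learn's `AdaBoostClassifier` are the usual library routes. Oversampling should be applied only to the training folds, so that synthetic points do not leak into the evaluation data.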

Citations: 11

DeepSplice: Deep classification of novel splice junctions revealed by RNA-seq

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822541

Yi Zhang, Xinan Liu, J. MacLeod, Jinze Liu

Alternative splicing (AS) is a regulated process that enables the production of multiple mRNA transcripts from a single multi-exon gene. The availability of large-scale RNA-seq datasets has made it possible to predict splice junctions and splice sites through spliced alignment to the reference genome. This greatly enhances the capability to decipher gene structures and explore the diversity of splicing variants. However, existing ab initio aligners are vulnerable to false positive spliced alignments caused by sequencing errors and random sequence matches. These spurious alignments can produce a substantial number of false positive splice junction predictions. Such false positives confound downstream analyses such as splice variant detection and abundance estimation. In this work, we show that splice junction sequence characteristics can be learned from experimental data with deep learning techniques. We employ deep convolutional neural networks in a novel splice junction classification tool named DeepSplice that (i) outperforms state-of-the-art methods for predicting splice sites, (ii) is computationally efficient and (iii) can be trained by users on their own training data.
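
A convolutional splice-junction classifier typically works in two steps. It first one-hot encodes the sequence around a candidate junction, then slides filters over that encoding. The sketch below shows one such filter with ReLU and global max pooling. Unlike in a trained network, the filter weights are hand-set here to match the GT dinucleotide, the canonical donor motif at the start of most introns. DeepSplice's actual architecture, window size and learned weights are not reproduced.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 one-hot values (A, C, G, T); other symbols map to all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d_max(x, kernel, bias=0.0):
    """Valid 1D convolution of a one-hot matrix with one filter, then ReLU and global max pooling."""
    width = len(kernel)
    best = 0.0
    for start in range(len(x) - width + 1):
        s = bias + sum(
            x[start + i][c] * kernel[i][c] for i in range(width) for c in range(4)
        )
        best = max(best, max(0.0, s))
    return best
```

With the filter `one_hot("GT")`, a window containing GT scores 2.0 and sequences without it score lower. In a real model, many such filters are learned from labelled junctions, and their pooled outputs feed dense layers that produce the true/false junction decision.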

Citations: 27

Risk feature assessment of readmission for diabetes

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822578

Qian Zhu, Anirudh Akkati, Pornpoh Hongwattanakul

About 382 million people had diabetes in 2013, and the International Diabetes Federation estimated that 4.9 million people died from diabetes in 2014. Diabetes remains a chronic disease plagued by frequent hospital readmissions. To better understand the risk features that affect readmission, with a view to future prevention and management, we programmatically analyzed a large clinical dataset. It contains more than 100,000 clinical records of diabetes patients from 130 US hospitals. Specifically, we developed three machine learning approaches to identify and prioritize the most significant risk features: logistic regression, random forest and a manipulated random forest. Comparing the results of the three methods, the manipulated random forest generates a more complete and concrete list of readmission-related risk features. The method is generalizable and can be applied in other disease-oriented studies.
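
Of the three approaches named above, logistic regression is the simplest to sketch. Fit the model, then rank features by the magnitude of their coefficients. This is a minimal batch-gradient-descent version, not the manipulated random forest. It assumes the features are already standardized, so that coefficient magnitudes are comparable. The toy data and feature names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def rank_risk_features(X, y, names):
    """Return (feature, coefficient) pairs sorted by |coefficient|, largest first."""
    w, _ = fit_logistic(X, y)
    return sorted(zip(names, w), key=lambda nw: abs(nw[1]), reverse=True)
```

On real data, scikit-learn's `LogisticRegression` (`coef_`) and `RandomForestClassifier` (`feature_importances_`) give the corresponding rankings for the first two methods.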

Citations: 1

LCM-DS: A novel approach of predicting drug-drug interactions for new drugs via Dempster-Shafer theory of evidence

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822571

Jianyu Shi, Ke Gao, Xuequn Shang, S. Yiu

There is an urgent need to discover or predict drug-drug interactions (DDIs), which can cause serious adverse drug reactions. However, preclinical detection of DDIs is costly. Similarity-based computational approaches can assist experimental approaches: using pre-market drug similarities, they can predict DDIs on a large scale. However, these approaches have several drawbacks:

- Some neglect the topological structure among DDIs and non-DDIs and suffer from slow training and large memory requirements.
- Others carry a bias: pairs between a newly given drug and drugs that have many DDIs tend to obtain high ranks.
- More importantly, they lack an effective way to combine multiple predictions.

To address these issues, we develop a local classification-based model (LCM), which trains faster, requires less memory and is free of that bias. We further design a novel supervised fusion algorithm based on the Dempster-Shafer (DS) theory of evidence to combine multiple predictions. Finally, experiments demonstrate that LCM-DS is significantly superior to three state-of-the-art approaches and outperforms both individual LCMs and classical fusion algorithms.
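
The core operation in Dempster-Shafer fusion is Dempster's rule of combination. Mass assigned to intersecting hypotheses is multiplied and accumulated, mass falling on empty intersections is counted as conflict, and the result is renormalized by 1 minus the conflict. The sketch below applies the rule on the frame {DDI, non-DDI}, with mass on the whole frame expressing uncertainty. The abstract does not describe how LCM-DS derives masses from its classifiers or what makes its fusion supervised, so the input masses here are purely illustrative.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset focal elements."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


DDI = frozenset({"ddi"})
NON = frozenset({"non_ddi"})
THETA = DDI | NON  # whole frame: "don't know"
```

Take the sources `{DDI: 0.6, NON: 0.1, THETA: 0.3}` and `{DDI: 0.5, NON: 0.2, THETA: 0.3}`. Their conflict is 0.17, and after normalization the combined belief in DDI is 0.63 / 0.83, about 0.76. That is higher than either source alone, because both lean the same way.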

Citations: 8

SeqMaker: A next generation sequencing simulator with variations, sequencing errors and amplification bias integrated

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822634

Shifu Chen, Yue Han, Lanting Guo, Jing-Shan Hu, Jia Gu

Tuning bioinformatics pipelines and training software parameters requires sequencing data with known ground truth, which is difficult to obtain from real sequencing data. This is especially true for applications that detect low-frequency variations (such as ctDNA sequencing). There, it is hard to tell whether a called variation is a true positive or a false positive caused by errors in sequencing or other processes. In these cases, simulated data with configured variations can be used to troubleshoot and validate bioinformatics programs. Many next generation sequencing (NGS) simulators have already been developed. However, most cannot simulate many practical features, such as target capture sequencing, copy number variations, gene fusions, amplification bias and sequencing errors. In this paper, we present SeqMaker, a modern NGS simulator that can simulate different kinds of variations, with amplification bias and sequencing errors integrated. Target capture sequencing is supported simply through a capture panel description file. Other characteristics, such as sequencing error rate, average duplication level, DNA template length distribution and quality distribution, can be easily configured in a simple JSON profile file. By integrating sequencing errors and amplification bias, SeqMaker can simulate more realistic NGS data. The configurable variants and capture regions make SeqMaker very useful for generating data to train bioinformatics pipelines for applications such as somatic mutation calling.
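
The idea of a JSON-profile-driven simulator can be illustrated with a very small read generator. It draws reads from a template and injects substitution errors at a configured rate. This is a hypothetical sketch:

- The profile keys (`read_length`, `error_rate`, `seed`) are invented for illustration and are not SeqMaker's actual schema.
- Only uniform read placement and substitution errors are modeled.
- Variants, capture panels, amplification bias, duplication and quality strings are not.

```python
import json
import random

# Hypothetical profile: these keys are illustrative, not SeqMaker's real schema.
PROFILE_JSON = '{"read_length": 50, "error_rate": 0.01, "seed": 42}'


def simulate_read(template, start, profile, rng):
    """Copy one read from the template, then inject substitution errors at the configured rate."""
    read = list(template[start:start + profile["read_length"]])
    for i, base in enumerate(read):
        if rng.random() < profile["error_rate"]:
            read[i] = rng.choice([b for b in "ACGT" if b != base])
    return "".join(read)


def simulate_reads(template, n_reads, profile):
    """Draw n_reads uniformly placed reads from the template (seeded for reproducibility)."""
    rng = random.Random(profile["seed"])
    max_start = len(template) - profile["read_length"]
    if max_start < 0:
        raise ValueError("template shorter than read length")
    return [
        simulate_read(template, rng.randint(0, max_start), profile, rng)
        for _ in range(n_reads)
    ]
```

Usage: `simulate_reads("ACGT" * 100, 20, json.loads(PROFILE_JSON))`. Because every injected error is known, the output has exactly the kind of ground truth the abstract says real data lacks.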

Citations: 8

Innovative microRNA-lncRNA-mRNA co-expression analysis to understand the pathogenesis and progression of diabetic kidney disease

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822601

Lihua Zhang, Rong Li, Qiuping Yang, Yanan Wu, Jingshan Huang, Bin Wu

Diabetic kidney disease (DKD) is a serious disease and a major health problem worldwide. There is an urgent need for novel biomarkers that facilitate early diagnosis and effective treatment of DKD patients, so as to prevent them from developing end-stage renal disease (ESRD). However, most regulation mechanisms of DKD at the genetic level remain unclear. In this work-in-progress paper, we describe innovative methodologies that integrate biological, statistical and computational approaches. We use them to investigate the important roles played in DKD by regulation among microRNAs (miRs), long non-coding RNAs (lncRNAs) and messenger RNAs (mRNAs). We conducted a series of experiments and identified a list of miRs and lncRNAs as potential novel biomarkers, along with the set of target genes regulated by the discovered miRs. Our initial analysis results are promising for better understanding how miRs and lncRNAs regulate the pathogenesis and progression of DKD.
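
A common heuristic in miRNA-mRNA co-expression analysis is that a miRNA and its target genes tend to be negatively correlated across samples, because miRNAs usually repress their targets. The sketch below applies that heuristic under stated assumptions:

- The abstract does not state the paper's actual selection criteria.
- The threshold of -0.8 and the expression profiles are hypothetical.
- lncRNAs are not handled. The same `pearson` helper could be reused for them with a criterion of your choosing.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either profile is constant (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def anticorrelated_pairs(mirnas, mrnas, threshold=-0.8):
    """(miRNA, mRNA, r) for pairs whose expression across samples is strongly anti-correlated."""
    pairs = []
    for mir, mir_expr in mirnas.items():
        for gene, gene_expr in mrnas.items():
            r = pearson(mir_expr, gene_expr)
            if r <= threshold:
                pairs.append((mir, gene, r))
    return sorted(pairs, key=lambda p: p[2])
```

Correlation alone does not establish targeting. Pairs found this way are usually cross-checked against sequence-based target predictions or experimentally validated interaction databases.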

Citations: 0
