ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine: latest articles

Chromatin and Genomic determinants of alternative splicing.

Pub Date : 2015-09-01 DOI: 10.1145/2808719.2808755

Kun Wang, Kan Cao, Sridhar Hannenhalli

Alternative splicing contributes significantly to proteomic diversity, and mis-regulation of splicing can cause disease in humans. Although both genomic and chromatin features have been shown to associate with splicing, the mechanisms by which various chromatin marks influence splicing are largely unclear. Moreover, it is not known whether the influence of specific genomic features on splicing is modulated by the chromatin context. Here we report a deep neural network (DNN) model for predicting exon inclusion based on comprehensive genomic and chromatin features. Our analysis in three cell lines shows that, while both genomic and chromatin features can predict splicing to varying degrees, genomic features are the primary drivers of splicing, and the predictive power of chromatin features can largely be explained by their correlation with genomic features; chromatin features do not make a substantial independent contribution to splicing predictability. However, our model identified specific interactions between chromatin and genomic features, suggesting that the effect of genomic elements may be modulated by chromatin context.

Volume 2015, pages 345-354

Citations: 4

icuARM-II: improving the reliability of personalized risk prediction in pediatric intensive care units.

Pub Date : 2014-09-01 DOI: 10.1145/2649387.2649440

Chih-Wen Cheng, Nikhil Chanani, Kevin Maher, Wang

Clinicians in intensive care units (ICUs) rely on standardized scores as risk prediction models to predict a patient's vulnerability to life-threatening events. Conventional scales calculate scores from a fixed set of conditions collected within a specific time window. However, modern monitoring technologies generate complex, temporal, and multimodal patient data that conventional scales cannot fully utilize. Thus, a more sophisticated model is needed to tailor predictions to individual characteristics and incorporate multiple temporal modalities for personalized risk prediction. Furthermore, most scales focus on adult patients. To address this deficiency, we propose a newly designed ICU risk prediction system, called icuARM-II, using a large-scale pediatric ICU database from Children's Healthcare of Atlanta. This novel database contains clinical data collected in 5,739 ICU visits from 4,975 patients. We propose a temporal association rule mining framework that gives clinicians the potential to predict risks based on all available patient conditions without being restricted by a fixed observation window. We also develop a new metric that can rigorously assess the reliability of all generated association rules. In addition, icuARM-II features an interactive user interface. Using icuARM-II, we demonstrate a use case of short-term mortality prediction using lab test results, which suggests a potential new solution for reliable ICU risk prediction using personalized clinical data in a previously neglected population.
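
To make the idea of a temporal association rule concrete, the following is a minimal sketch that scores one rule with the standard support and confidence measures, counting a visit only when every antecedent condition is observed before the outcome. The abstract does not define the reliability metric used by icuARM-II, so this is not that metric; the function name `rule_stats` and the condition labels are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# One ICU visit: (hours since admission, condition label) observations.
Visit = List[Tuple[float, str]]


def rule_stats(visits: List[Visit], antecedent: FrozenSet[str],
               outcome: str) -> Tuple[float, float]:
    """Return (support, confidence) for the rule antecedent -> outcome.

    A visit supports the rule only if the outcome is first observed after
    all antecedent conditions have been observed (assumed temporal
    constraint, not taken from the paper).
    """
    n_antecedent = 0
    n_both = 0
    for visit in visits:
        # Earliest time at which each condition appears in this visit.
        first_seen = {}
        for hour, cond in visit:
            first_seen[cond] = min(hour, first_seen.get(cond, hour))
        if not antecedent.issubset(first_seen):
            continue
        n_antecedent += 1
        ready = max(first_seen[c] for c in antecedent)
        if outcome in first_seen and first_seen[outcome] > ready:
            n_both += 1
    support = n_both / len(visits) if visits else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy data: four visits with made-up condition labels.
example_visits = [
    [(1, "low_pH"), (2, "high_lactate"), (10, "death")],
    [(1, "low_pH"), (3, "high_lactate")],
    [(1, "low_pH")],
    [(5, "death"), (6, "low_pH"), (7, "high_lactate")],
]
example_support, example_confidence = rule_stats(
    example_visits, frozenset({"low_pH", "high_lactate"}), "death")
```

Because the rule is evaluated over whatever observations a visit contains, no fixed observation window is needed; only the ordering of events matters in this sketch.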

Pages 211-219

Citations: 0

omniClassifier: a Desktop Grid Computing System for Big Data Prediction Modeling.

Pub Date : 2014-09-01 DOI: 10.1145/2649387.2649439

John H Phan, Sonal Kothari, May D Wang

Robust prediction models are important for numerous science, engineering, and biomedical applications. However, best-practice procedures for optimizing prediction models can be computationally complex, especially when choosing models from among hundreds or thousands of parameter choices. Computational complexity has further increased with the growth of data in these fields, concurrent with the era of "Big Data". Grid computing is a potential solution to the computational challenges of Big Data. Desktop grid computing, which uses idle CPU cycles of commodity desktop machines, coupled with commercial cloud computing resources, can give research labs easier and more cost-effective access to vast computing resources. We have developed omniClassifier, a multi-purpose prediction modeling application that provides researchers with a tool for conducting machine learning research within the guidelines of recommended best practices. omniClassifier is implemented as a desktop grid computing system using the Berkeley Open Infrastructure for Network Computing (BOINC) middleware. In addition to describing implementation details, we use various gene expression datasets to demonstrate the potential scalability of omniClassifier for efficient and robust Big Data prediction modeling. A prototype of omniClassifier can be accessed at http://omniclassifier.bme.gatech.edu/.

Volume 2014, pages 514-523

Citations: 0

Are We There Yet? Feasibility of Continuous Stress Assessment via Wireless Physiological Sensors.

Pub Date : 2014-01-01 DOI: 10.1145/2649387.2649433

Mahbubur Rahman, Rummana Bari, Amin Ahsan Ali, Moushumi Sharmin, Andrew Raij, Karen Hovsepian, Syed Monowar Hossain, Emre Ertin, Ashley Kennedy, David H Epstein, Kenzie L Preston, Michelle Jobes, J Gayle Beck, Satish Kedia, Kenneth D Ward, Mustafa al'Absi, Santosh Kumar

Stress can lead to headaches and fatigue, precipitate addictive behaviors (e.g., smoking, alcohol and drug use), and lead to cardiovascular diseases and cancer. Continuous assessment of stress from sensors can be used for timely delivery of a variety of interventions to reduce or avoid stress. We investigate the feasibility of continuous stress measurement via two field studies using wireless physiological sensors: a four-week study with illicit drug users (n = 40), and a one-week study with daily smokers and social drinkers (n = 30). We find that 11+ hours/day of usable data can be obtained in a four-week study. A significant learning effect is observed after the first week, and data yield continues to increase over time, even in the fourth week. We propose a framework to analyze sensor data yield and find that losses in the wireless channel are negligible; the main hurdle in further improving data yield is the attachment constraint. We show the feasibility of measuring stress in the minutes preceding events of interest and observe that sensor-derived stress rises prior to self-reported stress and smoking events.

Volume 2014, pages 479-488

Citations: 0

SimConcept: A Hybrid Approach for Simplifying Composite Named Entities in Biomedicine.

Pub Date : 2014-01-01 DOI: 10.1145/2649387.2649420

Chih-Hsuan Wei, Robert Leaman, Zhiyong Lu

Many text-mining studies have focused on the issue of named entity recognition and normalization, especially in the field of biomedical natural language processing. However, entity recognition is a complicated and difficult task in biomedical text. One particular challenge is to identify and resolve composite named entities, where a single span refers to more than one concept (e.g., BRCA1/2). Most bioconcept recognition and normalization studies have either ignored this issue, used simple ad-hoc rules, or only handled coordination ellipsis, which is only one of the many types of composite mentions studied in this work. No systematic methods for simplifying composite mentions have been previously reported, so a robust approach is greatly needed. To this end, we propose a hybrid approach that integrates a machine learning model with a pattern identification strategy to identify the antecedent and conjunct regions of a concept mention, and then reassembles the composite mention using those identified regions. Our method, which we have named SimConcept, is the first method to systematically handle most types of composite mentions. Our method achieves high performance in identifying and resolving composite mentions for three fundamental biological entities: genes (89.29% in F-measure), diseases (85.52% in F-measure) and chemicals (84.04% in F-measure). Furthermore, our results show that using SimConcept can subsequently help improve the performance of gene and disease concept recognition and normalization.
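
As a small illustration of what resolving a composite mention means, the sketch below expands one narrow pattern, a shared stem followed by slash-separated numeric suffixes such as BRCA1/2, into its individual mentions. It is a single hand-written rule, not the SimConcept method, which combines a machine learning model with pattern identification; the function name `simplify_mention` and the regular expression are assumptions made for this example.

```python
import re
from typing import List

# Shared stem ending in a letter, a first number, then one or more "/number".
_SLASH_PATTERN = re.compile(
    r"(?P<stem>.*?[A-Za-z])(?P<first>\d+)(?P<rest>(?:/\d+)+)")


def simplify_mention(mention: str) -> List[str]:
    """Expand 'STEM1/2/...' into ['STEM1', 'STEM2', ...].

    Mentions that do not fit this one pattern are returned unchanged.
    """
    match = _SLASH_PATTERN.fullmatch(mention)
    if match is None:
        return [mention]
    stem = match.group("stem")            # antecedent region, e.g. "BRCA"
    numbers = [match.group("first")]      # conjunct regions, e.g. "1", "2"
    numbers += match.group("rest").strip("/").split("/")
    return [stem + number for number in numbers]
```

In this sketch the stem plays the role of the antecedent and each number plays the role of a conjunct; reassembly is simply their concatenation. Other composite types mentioned in the abstract, such as coordination ellipsis, would need different handling.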

Volume 2014, pages 138-146

Citations: 0

Integrated miRNA and mRNA Analysis of Time Series Microarray Data.

Pub Date : 2014-01-01 DOI: 10.1145/2649387.2649411

Julian Dymacek, Nancy Lan Guo

The dynamic temporal regulatory effects of microRNA are not well known. We introduce a technique for integrating miRNA and mRNA time series microarray data with known disease pathology. The integrated analysis includes identifying both mRNA and miRNA that are significantly similar to the quantitative pathology. Potential regulatory miRNA/mRNA target pairs are identified through databases of both predicted and validated pairs. Finally, potential target pairs are filtered by examining the second derivatives of the fold changes over time. Our system was used on genome-wide microarray expression data of mouse lungs (n = 160) following aspiration of multi-walled carbon nanotubes. This system shows promise for readily identifying miRNA for further study as potential biomarkers.
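
The second derivative of a fold-change time series can be estimated with finite differences even when the sampling times are unevenly spaced. The sketch below uses the standard three-point formula for a non-uniform grid; the abstract does not state how the derivatives are estimated or which filtering criterion is applied to them, so the function name `second_derivatives` and the example time points are assumptions.

```python
from typing import List, Sequence


def second_derivatives(times: Sequence[float],
                       values: Sequence[float]) -> List[float]:
    """Estimate the second derivative at each interior time point.

    Uses the three-point finite-difference formula for unevenly spaced
    samples, which is exact for quadratic curves.
    """
    if len(times) != len(values) or len(times) < 3:
        raise ValueError("need at least three matching time/value samples")
    result = []
    for i in range(1, len(times) - 1):
        # Slopes of the segments to the left and right of point i.
        left = (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        right = (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        result.append(2.0 * (right - left) / (times[i + 1] - times[i - 1]))
    return result


# Hypothetical fold changes sampled at uneven time points.
example_times = [0.0, 1.0, 3.0, 7.0]
example_fold_changes = [t * t for t in example_times]
example_curvature = second_derivatives(example_times, example_fold_changes)
```

A pair of miRNA and mRNA curvature profiles computed this way could then be compared under whatever criterion the analysis calls for; that comparison is deliberately left out here because the source does not specify it.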

Citations: 6

Systematic Assessment of RNA-Seq Quantification Tools Using Simulated Sequence Data.

Pub Date : 2013-09-01 DOI: 10.1145/2506583.2506648

Raghu Chandramohan, Po-Yen Wu, John H Phan, May D Wang

RNA-sequencing (RNA-seq) technology has emerged as the preferred method for quantification of gene and isoform expression. Numerous RNA-seq quantification tools have been proposed and developed, bringing us closer to developing expression-based diagnostic tests based on this technology. However, because of the rapidly evolving technologies and algorithms, it is essential to establish a systematic method for evaluating the quality of RNA-seq quantification. We investigate how different RNA-seq experimental designs (i.e., variations in sequencing depth and read length) affect various quantification algorithms (i.e., HTSeq, Cufflinks, and MISO). Using simulated data, we evaluate the quantification tools based on four metrics, namely: (1) total number of usable fragments for quantification, (2) detection of genes and isoforms, (3) correlation, and (4) accuracy of expression quantification with respect to the ground truth. Results show that Cufflinks is able to use the largest number of fragments for quantification, leading to better detection of genes and isoforms. However, HTSeq produces more accurate expression estimates. Moreover, each quantification algorithm is affected differently by varying sequencing depth and read length, suggesting that the selection of quantification algorithms should be application-dependent.
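
With simulated data the true expression values are known, so detection, correlation, and accuracy can each be computed directly against the ground truth. The sketch below shows one plausible form of these three metrics; the abstract does not give the exact definitions used in the study, so the function names, the zero threshold for detection, and the use of mean absolute error for accuracy are all assumptions.

```python
import math
from typing import Sequence


def detection_rate(truth: Sequence[float], estimate: Sequence[float]) -> float:
    """Fraction of truly expressed features that receive a non-zero estimate."""
    expressed = [i for i, t in enumerate(truth) if t > 0]
    detected = [i for i in expressed if estimate[i] > 0]
    return len(detected) / len(expressed)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_abs_error(truth: Sequence[float], estimate: Sequence[float]) -> float:
    """Average absolute difference between estimates and the ground truth."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


# Hypothetical simulated truth and one tool's estimates for four genes.
example_truth = [10.0, 0.0, 5.0, 20.0]
example_estimate = [12.0, 0.0, 0.0, 18.0]
```

Keeping the metrics separate matters for the comparison described above: a tool can detect more features while still being less accurate on the ones it quantifies.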

Volume 2013

Citations: 3

Biological Interpretation of Morphological Patterns in Histopathological Whole-Slide Images.

Pub Date : 2012-10-01 DOI: 10.1145/2382936.2382964

Sonal Kothari, John H Phan, Adeboye O Osunkoya, May D Wang

We propose a framework for studying visual morphological patterns across histopathological whole-slide images (WSIs). Image representation is an important component of computer-aided decision support systems for histopathological cancer diagnosis. Such systems extract hundreds of quantitative image features from digitized tissue biopsy slides and produce models for prediction. The performance of these models depends on identifying informative features, both for selecting appropriate regions of interest (ROIs) from heterogeneous WSIs and for developing models. However, identification of informative features is hindered by the semantic gap between human interpretation of visual morphological patterns and quantitative image features. We address this challenge by using data mining and information visualization tools to study spatial patterns formed by features extracted from sub-sections of WSIs. Using ovarian serous cystadenocarcinoma (OvCa) WSIs provided by The Cancer Genome Atlas (TCGA), we show that (1) individual and (2) multivariate image features correspond to biologically relevant ROIs, and (3) supervised image feature selection can map histopathology domain knowledge to quantitative image features.

Volume 2012, pages 218-225

Citations: 0

A Collective Ranking Method for Genome-wide Association Studies.

ACM-BCB: ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2012-10-01 DOI: 10.1145/2382936.2382976

Jie Liu, Humberto Vidaillet, Elizabeth Burnside, David Page

Genome-wide association studies (GWAS) analyze genetic variation (SNPs) across the entire human genome, searching for SNPs that are associated with certain phenotypes, most often diseases, such as breast cancer. In GWAS, we seek a ranking of SNPs in terms of their relevance to the given phenotype. However, because certain SNPs are known to be highly correlated with one another across individuals, it can be beneficial to take into account these correlations when ranking. If a SNP appears associated with the phenotype, and we question whether this association is real, the extent to which its neighbors (correlated SNPs) also appear associated can be informative. Therefore, we propose CollectRank, a ranking approach which allows SNPs to reinforce one another via the correlation structure. CollectRank is loosely analogous to the well-known PageRank algorithm. We first evaluate CollectRank on synthetic data generated from a variety of genetic models under different settings. The numerical results suggest CollectRank can significantly outperform common GWAS methods at the cost of a small amount of extra computation. We further evaluate CollectRank on two real-world GWAS on breast cancer and atrial fibrillation/flutter, and CollectRank performs well in both studies. We finally provide a theoretical analysis that also suggests CollectRank's advantages.
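The idea of letting correlated SNPs reinforce one another can be illustrated with a PageRank-style propagation. The sketch below is not the authors' CollectRank algorithm, whose details the abstract does not give; it is a generic personalized-PageRank iteration, and the function name `collective_rank`, the `damping` parameter, and the toy scores and correlations are all assumptions made for illustration.

```python
def collective_rank(scores, corr, damping=0.5, iterations=50):
    """Blend each SNP's own association score with those of correlated SNPs.

    scores:  non-negative per-SNP association scores (hypothetical input,
             e.g. -log10 p-values).
    corr:    square matrix (list of lists) of pairwise SNP correlations.
    damping: weight given to neighbours versus the SNP's own evidence
             (hypothetical parameter, not taken from the paper).
    """
    n = len(scores)
    total_score = sum(scores)
    if total_score <= 0:
        raise ValueError("scores must have a positive sum")

    # Row-normalise absolute correlations (ignoring self-correlation)
    # so each SNP distributes its rank among its correlated neighbours.
    weights = []
    for i in range(n):
        row = [abs(corr[i][j]) if i != j else 0.0 for j in range(n)]
        total = sum(row)
        weights.append([w / total if total > 0 else 0.0 for w in row])

    base = [s / total_score for s in scores]
    rank = base[:]
    for _ in range(iterations):
        # Personalized-PageRank update: own evidence plus rank flowing
        # in from correlated SNPs.
        rank = [
            (1 - damping) * base[i]
            + damping * sum(weights[j][i] * rank[j] for j in range(n))
            for i in range(n)
        ]
    return rank


# Toy data: SNPs 0 and 1 are strongly correlated; SNP 2 has the highest
# individual score but no correlated neighbour supporting it.
scores = [3.0, 3.0, 3.5, 0.5]
corr = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
ranks = collective_rank(scores, corr)
```

In this toy example the mutually supporting SNPs 0 and 1 end up ranked above the isolated SNP 2, even though SNP 2 has the largest individual score. An isolated SNP keeps only the undamped share of its own score, which is one simple design choice among several possible ones.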



Citations: 0

Ranking Docked Models of Protein-Protein Complexes Using Predicted Partner-Specific Protein-Protein Interfaces: A Preliminary Study.

ACM-BCB: ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pub Date : 2011-08-01 DOI: 10.1145/2147805.2147866

Li C Xue, Rafael A Jordan, Yasser El-Manzalawy, Drena Dobbs, Vasant Honavar

Computational protein-protein docking is a valuable tool for determining the conformation of complexes formed by interacting proteins. Selecting near-native conformations from the large number of possible models generated by docking software presents a significant challenge in practice. We introduce a novel method for ranking docked conformations based on the degree of overlap between the interface residues of a docked conformation formed by a pair of proteins and the set of predicted interface residues between them. Our approach relies on a method, called PS-HomPPI, for reliably predicting protein-protein interface residues by taking into account information derived from both interacting proteins. PS-HomPPI infers the residues of a query protein that are likely to interact with a partner protein based on known interface residues of the homo-interologs of the query-partner protein pair, i.e., pairs of interacting proteins that are homologous to the query protein and partner protein. Our results on Docking Benchmark 3.0 show that the quality of the ranking of docked conformations using our method is consistently superior to that produced using ClusPro cluster-size-based and energy-based criteria for 61 out of the 64 docking complexes for which PS-HomPPI produces interface predictions. An implementation of our method for ranking docked models is freely available at: http://einstein.cs.iastate.edu/DockRank/.
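Ranking docked models by interface overlap can be sketched in a few lines. The abstract does not say which overlap measure DockRank uses, so the example below assumes a Jaccard index; the function names, the residue identifiers, and the three toy models are hypothetical, and the predicted interface stands in for the output of a predictor such as PS-HomPPI.

```python
def interface_overlap(model_interface, predicted_interface):
    """Jaccard overlap between two sets of interface residue identifiers.

    The Jaccard index is an assumed similarity measure, chosen only
    for illustration.
    """
    model_interface = set(model_interface)
    predicted_interface = set(predicted_interface)
    union = model_interface | predicted_interface
    if not union:
        return 0.0
    return len(model_interface & predicted_interface) / len(union)


def rank_docked_models(models, predicted_interface):
    """Return model names sorted from highest to lowest interface overlap."""
    scored = {
        name: interface_overlap(residues, predicted_interface)
        for name, residues in models.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical predicted interface, written as chain:residue-number labels.
predicted = {"A:10", "A:11", "A:42", "B:7"}

# Hypothetical interface residues observed in three docked models.
models = {
    "model_1": {"A:10", "A:11", "B:7"},
    "model_2": {"A:90", "B:7"},
    "model_3": {"A:10", "A:11", "A:42", "B:7", "B:8"},
}

ranking = rank_docked_models(models, predicted)
```

Here `model_3` shares four of five residues in the union with the prediction, `model_1` three of four, and `model_2` one of five, so the models would be ranked in that order.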



Citations: 0