
Latest publications: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Facial expression recognition based on LLENet
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822814
Dan Meng, Guitao Cao, Zhihai He, W. Cao
Facial expression recognition plays an important role in lie detection and computer-aided diagnosis. Many deep learning methods for facial expression feature extraction achieve substantially better recognition accuracy and robustness than traditional feature extraction methods. However, most current deep learning methods require special parameter tuning and ad hoc fine-tuning tricks. This paper proposes a novel feature extraction model called the Locally Linear Embedding Network (LLENet) for facial expression recognition. The proposed LLENet first reconstructs image sets for the cropped images. Unlike previous deep convolutional neural networks that initialize convolutional kernels randomly, we learn multi-stage kernels directly from the reconstructed image sets in a supervised way. We also create an improved LLE to select kernels, from which we obtain the most representative feature maps. Furthermore, to better measure the contribution of these kernels, a new distance based on the kernel Euclidean distance is proposed. After multi-scale feature analysis, the feature representations are finally fed into a linear classifier. Experimental results on the CK+ facial expression dataset show that the proposed model captures the most representative features and thus improves on previous results.
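The paper's improved LLE for kernel selection is not detailed in the abstract. As a reference point, the classical LLE reconstruction-weight step (solving for weights that reconstruct each sample from its k nearest neighbors, constrained to sum to one) can be sketched in NumPy; this is a generic illustration, not the authors' LLENet:

```python
import numpy as np

def lle_weights(X, k=5, reg=1e-3):
    """Solve for locally linear reconstruction weights W such that
    x_i ~ sum_j W[i, j] * x_j over the k nearest neighbors of x_i,
    with the weights for each point summing to one."""
    n = X.shape[0]
    W = np.zeros((n, n))
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]    # skip the point itself
        Z = X[nbrs] - X[i]                   # center neighbors on x_i
        C = Z @ Z.T                          # local Gram matrix
        C += reg * np.trace(C) * np.eye(k)   # regularize for stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()             # enforce sum-to-one
    return W
```

Each row of the returned matrix is a sparse, sum-to-one reconstruction of one sample from its neighborhood; the regularization term keeps the local Gram matrix invertible when neighbors are nearly coplanar.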
Citations: 1
A modified rough-fuzzy clustering algorithm with spatial information for HEp-2 cell image segmentation
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822549
Shaswati Roy, P. Maji
Indirect immunofluorescence (IIF) analysis is the most effective test for antinuclear autoantibody (ANA) analysis, used to reveal the occurrence of autoimmune diseases such as connective tissue disorders. In ANA tests, human epithelial type 2 (HEp-2) cells are most commonly used as the substrate. However, recognizing the staining pattern of ANAs in an IIF image requires proper detection of the region of interest. Automatic segmentation of IIF images is therefore an essential prerequisite, as manual segmentation is labor intensive, time consuming, and subjective. Recently, rough-fuzzy clustering has been shown to provide significant results for image segmentation by handling the different uncertainties present in images. However, the existing robust rough-fuzzy clustering algorithm does not consider the spatial distribution of the image, which is useful when the image is distorted by noise and other artifacts. This paper therefore proposes a segmentation algorithm that incorporates a spatial constraint into robust rough-fuzzy clustering. In the proposed method, the class label of a pixel is influenced by its neighboring pixels according to their spatial distance, so that a larger number of neighboring pixels can be incorporated into the calculation of a pixel's feature. The performance of the proposed method is evaluated on several HEp-2 cell images and compared with existing algorithms through both qualitative and quantitative results.
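The spatial constraint described above (a pixel's class membership influenced by its neighbors) can be illustrated with a plain fuzzy c-means loop whose memberships are smoothed over each pixel's 4-neighborhood. This is a generic sketch of the idea, not the authors' rough-fuzzy formulation; `alpha` is an assumed blending weight:

```python
import numpy as np

def spatial_fcm(img, c=2, m=2.0, alpha=0.5, n_iter=30, seed=0):
    """Fuzzy c-means on pixel intensities, where each pixel's membership
    is blended with the mean membership of its 4-neighborhood."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    x = img.ravel().astype(float)
    u = rng.random((c, x.size))
    u /= u.sum(0)                                  # memberships sum to 1
    for _ in range(n_iter):
        centers = (u ** m @ x) / (u ** m).sum(1)   # weighted class means
        d = np.abs(x[None, :] - centers[:, None]) + 1e-9
        u = 1.0 / d ** (2.0 / (m - 1.0))           # standard FCM update
        u /= u.sum(0)
        # spatial smoothing: sum memberships over the 4-neighborhood
        um = u.reshape(c, h, w)
        nb = np.zeros_like(um)
        nb[:, 1:, :] += um[:, :-1, :]
        nb[:, :-1, :] += um[:, 1:, :]
        nb[:, :, 1:] += um[:, :, :-1]
        nb[:, :, :-1] += um[:, :, 1:]
        u = ((1 - alpha) * um + alpha * nb / 4).reshape(c, -1)
        u /= u.sum(0)
    return u.argmax(0).reshape(h, w)               # hard labels
```

The smoothing step pulls isolated noisy pixels toward the label of their neighborhood, which is the practical benefit a spatial constraint adds over intensity-only clustering.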
Citations: 8
Visual orchestration and autonomous execution of distributed and heterogeneous computational biology pipelines
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822615
Xin Mou, H. Jamil, R. Rinker
Data integration continues to baffle researchers even though substantial progress has been made. Although the emergence of technologies such as XML, web services, the semantic web, and cloud computing has helped, a system in which biologists are comfortable articulating new applications and developing them without technical assistance from a computing expert is yet to be realized. The distance between a friendly graphical interface that does little and a “traditional” system that is clunky yet powerful is, more often than not, deemed too great. The question that remains unanswered is whether a user can state a query involving a set of complex, heterogeneous, and distributed life-sciences resources in an easy-to-use language and execute it without further help from a computer-savvy programmer. In this paper, we present a declarative meta-language, called VisFlow, for requirement specification, and a translator for mapping requirements into executable queries in a variant of SQL augmented with integration artifacts.
Citations: 4
Combinational logic network for digitally coded gene expression of gastric cancer
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822788
Sungjin Park, S. Nam
In general, Boolean networks have been applied to time-series datasets. However, in the recent field of next-generation sequencing-based cancer genomics, cross-sectional datasets covering enormous numbers of patients have accumulated. Here, we deal with the representation of cross-sectional datasets using Boolean networks, specifically a combinational logic network approach. We then apply the approach to a real cancer patient dataset, demonstrating the feasibility of using Boolean networks for the graphical representation of cross-sectional datasets.
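A minimal sketch of the "digitally coded gene expression" idea: binarize each gene against its per-gene median and evaluate a Boolean rule over the binary values. The two-gene AND-NOT rule below is purely hypothetical, for illustration only:

```python
import numpy as np

def binarize(expr, thresh=None):
    """Digitally code expression: 1 if a sample's value exceeds the
    per-gene threshold (the per-gene median by default)."""
    t = np.median(expr, axis=0) if thresh is None else thresh
    return (expr > t).astype(int)

def logic_rule(a, b):
    """Hypothetical combinational rule over two coded genes:
    output = geneA AND (NOT geneB)."""
    return a & (1 - b)
```

With expression coded as 0/1 per gene, any combinational logic network over the samples reduces to element-wise Boolean operations on these columns.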
Citations: 0
Some comparisons of gene expression classifiers
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822783
Shinuk Kim, M. Kon, Hyowon Lee
Numerous computational studies related to cancer have been published, but increasing the prediction accuracy on molecular datasets remains a challenge. Here we present a comparison of predictions based on feature selection combined with machine learning, for microRNA-Seq (miRNA-Seq) and mRNA-Seq data. We tested three different classifiers (support vector machine, decision tree, and k-nearest neighbors) under two different feature selection methods: Fisher feature selection and infinite feature selection.
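The filter-then-classify pipeline can be sketched in a self-contained way: Fisher scores rank features by between-class over within-class variance, then a k-nearest-neighbor classifier runs on the top-ranked features. This is a generic NumPy illustration, not the authors' exact setup:

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score per feature: weighted between-class variance of the
    class means over the pooled within-class variance."""
    classes = np.unique(y)
    overall = X.mean(0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(0) - overall) ** 2
        den += len(Xc) * Xc.var(0)
    return num / (den + 1e-12)

def knn_predict(Xtr, ytr, Xte, k=3):
    """k-nearest-neighbor majority vote for binary 0/1 labels."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d, axis=1)[:, :k]
    votes = ytr[idx]
    return (votes.mean(1) > 0.5).astype(int)
```

Selecting the top-scored features before classification is the essence of a filter method: the score is computed once from the training data, independently of the classifier that follows.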
Citations: 1
Android application for therapeutic feed and fluid calculation in neonatal care - a way to fast, accurate and safe health-care delivery
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822650
A. Biswas, Romil Roy, Sourya Bhattacharyya, Deepak Khaneja, S. D. Bhattacharya, J. Mukhopadhyay
Delivering medical care to newborn babies in their early days of life involves complex mathematical calculation of feeding, intravenous fluid, and electrolyte requirements. Manual calculation is time consuming and a potential source of medical error. This work proposes a standalone Android application for the newborn care unit, which can run on any handheld Android device such as a mobile phone and helps health-care professionals calculate parameters of the feed and intravenous fluid to be given to a newborn baby: total fluid intake, glucose infusion rate, energy, protein, lipid amount, electrolytes, and so on. Its logic is based on medical guidelines for feed and fluid management of newborn babies. It maintains consistency across a large set of inter-related variables using an existential abstraction approach, excluding the possibility of wrong proportions of dextrose, protein, lipid, or fluid volume by showing error and warning messages wherever needed, which acts as a safety measure against medication errors. The objective of the work is to make the medical calculation process faster, safer, and more accurate. A prototype of the application is being tested in a Sick Newborn Care Unit (SNCU) in Kolkata, India for evaluation.
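One calculation such an app must implement is the glucose infusion rate (GIR). The standard formula is GIR (mg/kg/min) = infusion rate (mL/hr) x dextrose concentration (%) / (6 x weight (kg)), where the factor 6 folds together the unit conversions (1000 mg/g, 100 mL/dL, 60 min/hr). A minimal sketch, not the authors' implementation:

```python
def glucose_infusion_rate(rate_ml_hr, dextrose_pct, weight_kg):
    """GIR in mg/kg/min = (rate mL/hr * dextrose %) / (6 * weight kg),
    i.e. (rate * dextrose g/dL * 1000 mg/g / 100 mL/dL) / (kg * 60 min/hr)."""
    return rate_ml_hr * dextrose_pct / (6.0 * weight_kg)

def total_fluid_ml(weight_kg, ml_per_kg_day):
    """Total daily fluid volume from a prescribed mL/kg/day target."""
    return weight_kg * ml_per_kg_day
```

For example, 10% dextrose running at 10 mL/hr for a 2 kg neonate gives a GIR of 10 * 10 / (6 * 2) = 8.33 mg/kg/min; a consistency-checking app would flag values outside the clinically acceptable range.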
Citations: 3
Factorial analysis of error correction performance using simulated next-generation sequencing data
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822685
Isaac Akogwu, Nan Wang, Chaoyang Zhang, Hwanseok Choi, H. Hong, P. Gong
Error correction is a critical initial step in next-generation sequencing (NGS) data analysis. Although more than 60 tools have been developed, there is no systematic, evidence-based comparison of their strengths and weaknesses, especially in terms of correction accuracy. Here we report a full factorial simulation study examining how NGS dataset characteristics (genome size, coverage depth, and read length in particular) affect error correction performance (precision and F-score), and comparing the sensitivity/resistance of six k-mer spectrum-based methods to variations in dataset characteristics. Multi-way ANOVA tests indicate that both the choice of correction method and the dataset characteristics had significant effects on the performance metrics. Overall, BFC, Bless, Bloocoo, and Musket performed better than Lighter and Trowel on 27 synthetic datasets. For each method, read length and coverage depth showed a more pronounced impact on performance than genome size. This study sheds insight into the behavior of error correction methods in response to the common variables one would encounter in real-world NGS datasets. It also warrants further studies of wet-lab-generated experimental NGS data to validate the findings obtained from this simulation study.
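Per-base precision, recall, and F-score of a corrector can be computed by comparing true, noisy, and corrected reads: an erroneous base restored to truth counts as a true positive, a correct base that the tool changed as a false positive, and an erroneous base left uncorrected as a false negative. A minimal sketch of this standard accounting, not tied to any of the six tools:

```python
def correction_metrics(true_seq, noisy_seq, corrected_seq):
    """Per-base precision/recall/F-score of an error corrector.
    TP: real errors fixed; FP: correct bases wrongly changed;
    FN: real errors left uncorrected."""
    tp = fp = fn = 0
    for t, n, c in zip(true_seq, noisy_seq, corrected_seq):
        if n != t:            # this base was a sequencing error
            if c == t:
                tp += 1       # corrector restored the truth
            else:
                fn += 1       # error survived (or was mis-corrected)
        elif c != t:
            fp += 1           # corrector broke a correct base
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f
```

In a simulation study the true reads are known by construction, which is exactly what makes this per-base accounting, and hence a factorial comparison of F-scores, possible.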
Citations: 0
A method of removing Ocular Artifacts from EEG using Discrete Wavelet Transform and Kalman Filtering
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822742
Yan Chen, Qinglin Zhao, Bin Hu, Jianpeng Li, Hua Jiang, Wenhua Lin, Yang Li, Shuangshuang Zhou, Hong Peng
Electroencephalography (EEG) is a noninvasive method of recording the electrical activity of the brain, and it has been used extensively in research on brain function due to its high temporal resolution. However, raw EEG is a mixture of signals containing noise, such as Ocular Artifacts (OAs), that is irrelevant to the cognitive function of the brain. Many methods have been proposed to remove OAs from EEG, such as Independent Component Analysis (ICA), the Discrete Wavelet Transform (DWT), Adaptive Noise Cancellation (ANC), and the Wavelet Packet Transform (WPT). In this paper, we present a novel hybrid de-noising method that uses the Discrete Wavelet Transform (DWT) and Kalman filtering to remove OAs from EEG. First, we applied the method to simulated data. The Mean Squared Error (MSE) of the DWT-Kalman method was 0.0017, significantly lower than the results using WPT-ICA and DWT-ANC, which were 0.0468 and 0.0052, respectively. Meanwhile, the Mean Absolute Error (MAE) of DWT-Kalman averaged 0.0052, again better than WPT-ICA and DWT-ANC at 0.0218 and 0.0115, respectively. We then applied the proposed approach to raw data collected by our prototype three-channel EEG collector and a 64-channel Braincap from BRAIN PRODUCTS. On both datasets, our method achieved satisfying results. The method does not rely on any particular electrode or number of electrodes, so it is recommended for ubiquitous applications.
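The hybrid DWT-plus-Kalman idea can be sketched with a one-level Haar transform whose approximation band is smoothed by a scalar random-walk Kalman filter. The detail-band suppression and the noise parameters `q`/`r` below are illustrative assumptions, not the authors' design:

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: (approximation, detail) coefficients."""
    x = np.asarray(x, float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def kalman_smooth(z, q=1e-4, r=1e-1):
    """Scalar random-walk Kalman filter over a coefficient series.
    q: process noise variance, r: measurement noise variance."""
    xhat, p = z[0], 1.0
    out = np.empty_like(z)
    for i, zi in enumerate(z):
        p = p + q                       # predict
        k = p / (p + r)                 # Kalman gain
        xhat = xhat + k * (zi - xhat)   # update with measurement zi
        p = (1 - k) * p
        out[i] = xhat
    return out

def denoise(x):
    """Sketch of the hybrid scheme: transform, Kalman-smooth the
    approximation band, suppress the detail band, then invert."""
    a, d = haar_dwt(x)
    a = kalman_smooth(a)
    d = np.zeros_like(d)                # crude detail/artifact suppression
    y = np.empty(2 * a.size)
    y[0::2] = (a + d) / np.sqrt(2)      # inverse Haar
    y[1::2] = (a - d) / np.sqrt(2)
    return y
```

The design choice mirrored here is that the wavelet transform separates frequency bands so the state-space filter only has to track the slowly varying band, which is where ocular artifacts concentrate.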
Citations: 24
Research on early risk predictive model and discriminative feature selection of cancer based on real-world routine physical examination data
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822746
Guixia Kang, Zhuang Ni
Most cancers at early stages show no obvious symptoms, and curative treatment is often no longer an option by the time cancer is diagnosed. Therefore, making accurate predictions of early cancer risk has become urgently necessary in medicine. In this paper, our purpose is to fully utilize real-world routine physical examination data to analyze the most discriminative features of cancer based on the ReliefF algorithm, and to generate early risk predictive models of cancer using three machine learning (ML) algorithms. We use physical examination data, with a return visit 1 month later, from the CiMing Health Checkup Center. The ReliefF algorithm selects the top 30 features, written as Sub(30), by weight value from our data collection of 34 features and 2300 candidates. A 4-layer (2 hidden layers) deep neural network (DNN) based on the back-propagation (B-P) algorithm, a support vector machine (SVM) with a linear kernel, and a CART decision tree are used to predict cancer risk under 5-fold cross validation. We use predictive accuracy, AUC-ROC, sensitivity, and specificity to assess the discriminative ability of the three proposed methods. The results show that, compared with the other two methods, the SVM obtains a higher AUC and specificity of 0.926 and 95.27%, respectively. The best predictive accuracy (86%) is achieved by the DNN. Moreover, a fuzzy threshold interval for the DNN is proposed; using the revised threshold interval, the sensitivity, specificity, and accuracy of the DNN are 90.20%, 94.22%, and 93.22%, respectively. The research indicates that the application of ML methods together with risk feature selection based on real-world routine physical examination data is meaningful and promising in the area of cancer prediction.
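The ReliefF weighting used for feature selection can be sketched for binary labels: each feature is rewarded for differing between a sample and its nearest misses (other class) and penalized for differing from its nearest hits (same class). A basic NumPy sketch of this scheme, not the authors' exact variant:

```python
import numpy as np

def relieff(X, y, n_neighbors=5, n_samples=None, seed=0):
    """Basic ReliefF for binary labels on min-max scaled features.
    Returns one weight per feature; higher means more discriminative."""
    rng = np.random.default_rng(seed)
    Xs = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)
    n, d = Xs.shape
    idx = rng.choice(n, n_samples or n, replace=False)
    w = np.zeros(d)
    for i in idx:
        dist = np.abs(Xs - Xs[i]).sum(1)   # L1 distance to all samples
        dist[i] = np.inf                   # never pick the sample itself
        same = np.where(y == y[i])[0]
        diff = np.where(y != y[i])[0]
        hits = same[np.argsort(dist[same])][:n_neighbors]
        misses = diff[np.argsort(dist[diff])][:n_neighbors]
        w -= np.abs(Xs[hits] - Xs[i]).mean(0)    # penalize hit differences
        w += np.abs(Xs[misses] - Xs[i]).mean(0)  # reward miss differences
    return w / len(idx)
```

Ranking features by this weight and keeping the top 30 would yield a subset in the spirit of the paper's Sub(30), which the downstream classifiers then consume.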
{"title":"Research on early risk predictive model and discriminative feature selection of cancer based on real-world routine physical examination data","authors":"Guixia Kang, Zhuang Ni","doi":"10.1109/BIBM.2016.7822746","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822746","url":null,"abstract":"most cancers at early stages show no obvious symptoms and curative treatment is not an option any more when cancer is diagnosed. Therefore, making accurate predictions for the risk of early cancer has become urgently necessary in the field of medicine. In this paper, our purpose is to fully utilize real-world routine physical examination data to analyze the most discriminative features of cancer based on ReliefF algorithm and generate early risk predictive model of cancer taking advantage of three machine learning (ML) algorithms. We use physical examination data with a return visit followed 1 month later derived from CiMing Health Checkup Center. The ReliefF algorithm selects the top 30 features written as Sub(30) based on weight value from our data collections consisting of 34 features and 2300 candidates. The 4-layer (2 hidden layers) deep neutral network (DNN) based on B-P algorithm, the support machine vector with the linear kernel and decision tree CART are proposed for predicting the risk of cancer by 5-fold cross validation. We implement these criteria such as predictive accuracy, AUC-ROC, sensitivity and specificity to identify the discriminative ability of three proposed method for cancer. The results show that compared with the other two methods, SVM obtains higher AUC and specificity of 0.926 and 95.27%, respectively. The superior predictive accuracy (86%) is achieved by DNN. Moreover, the fuzzy interval of threshold in DNN is proposed and the sensitivity, specificity and accuracy of DNN is 90.20%, 94.22% and 93.22%, respectively, using the revised threshold interval. 
The research indicates that the application of ML methods together with risk feature selection based on real-world routine physical examination data is meaningful and promising in the area of cancer prediction.","PeriodicalId":345384,"journal":{"name":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133780512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
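The ReliefF ranking step described in the abstract can be sketched in a few lines. The sketch below is a minimal binary-class version of the classic update rule (penalize feature differences to near-hits of the same class, reward differences to near-misses of the other class); it is not the authors' implementation, and the function names, the L1 distance, and the [0, 1] rescaling are illustrative choices. It assumes every class has at least `n_neighbors + 1` samples.

```python
def relieff_weights(rows, labels, n_neighbors=3):
    """Weight each feature by how well it separates near-hits from near-misses.

    rows: list of equal-length numeric feature vectors; labels: class labels.
    Higher weight = more discriminative feature.
    """
    d = len(rows[0])
    n = len(rows)
    # Rescale every feature to [0, 1] so per-feature differences are comparable.
    lo = [min(r[j] for r in rows) for j in range(d)]
    hi = [max(r[j] for r in rows) for j in range(d)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]  # guard constant features
    xs = [[(r[j] - lo[j]) / span[j] for j in range(d)] for r in rows]

    w = [0.0] * d
    for i in range(n):
        def l1(j):  # L1 distance from sample i to sample j
            return sum(abs(a - b) for a, b in zip(xs[i], xs[j]))
        hits = sorted((j for j in range(n) if j != i and labels[j] == labels[i]),
                      key=l1)[:n_neighbors]
        misses = sorted((j for j in range(n) if labels[j] != labels[i]),
                        key=l1)[:n_neighbors]
        for f in range(d):
            w[f] -= sum(abs(xs[j][f] - xs[i][f]) for j in hits) / len(hits)
            w[f] += sum(abs(xs[j][f] - xs[i][f]) for j in misses) / len(misses)
    return [v / n for v in w]

def top_k_features(rows, labels, k):
    """Indices of the k highest-weighted features, in the spirit of Sub(30)."""
    w = relieff_weights(rows, labels)
    return sorted(range(len(w)), key=lambda f: -w[f])[:k]
```

A `top_k_features(rows, labels, 30)` call would then give a Sub(30)-style subset to feed the downstream SVM, DNN, or CART classifiers.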
Mining sequential patterns from uncertain big DNA in the spark framework
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822641
Fan Jiang, C. Leung, O. Sarumi, Christine Y. Zhang
Big data has become ubiquitous: high volumes of a wide variety of valuable data of different veracities (e.g., precise, imprecise, or uncertain data) are made available at high velocity by high-throughput machines and data-gathering and curation techniques in many real-life applications across domains such as bioinformatics, biomedicine, finance, social networking, and weather forecasting. In bioinformatics, terabytes of deoxyribonucleic acid (DNA) sequences can now be generated within a few hours using next-generation sequencing (NGS) technologies such as the Illumina HiSeq X and the Illumina Genome Analyzer. Owing to the nature of these NGS technologies, the generated data usually carry some noise or other forms of error. These uncertain data embed a wealth of information in the form of frequent patterns. Mining frequently occurring patterns (e.g., motifs) from these big uncertain DNA sequences is a challenge in bioinformatics and biomedicine. Many existing algorithms are serial and mine DNA sequence motifs using precise data mining methods. Mining motifs from big DNA sequences is computationally intensive because of the high volume and the associated uncertainty of these sequences. In this paper, we propose a scalable algorithm for high-performance computing in bioinformatics. Specifically, our parallel algorithm uses a fault-tolerant collection of resilient distributed datasets (RDDs) in the Apache Spark computing framework to mine sequence motifs from uncertain big DNA data. Experimental results show that our algorithm extracts accurate motifs within a short time frame.
Citations: 20
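The core uncertain-pattern measure behind this kind of mining can be illustrated without Spark. In the uncertain-data model, each base of a sequence carries an existential probability, and a standard frequentness measure is expected support: every occurrence of a k-mer contributes the product of its bases' probabilities. The sketch below is a single-machine illustration of that measure, not the paper's algorithm; the function names, the independence assumption across bases, and the threshold are our illustrative choices.

```python
from collections import defaultdict

def expected_support(sequences, k):
    """Expected support of every k-mer across uncertain DNA sequences.

    Each sequence is a list of (base, probability) pairs.  Assuming bases are
    independent, an occurrence of a k-mer exists with probability equal to the
    product of its k base probabilities; expected support sums these
    probabilities over all occurrences in all sequences.
    """
    support = defaultdict(float)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            kmer = "".join(base for base, _ in window)
            prob = 1.0
            for _, p in window:
                prob *= p            # existence probability of this occurrence
            support[kmer] += prob
    return dict(support)

def frequent_motifs(sequences, k, minsup):
    """k-mers whose expected support reaches the minsup threshold."""
    return {m: s for m, s in expected_support(sequences, k).items() if s >= minsup}
```

In a Spark setting, the per-sequence inner loop would become the map side of the job, with each RDD partition emitting (k-mer, probability) pairs, and the `support[kmer] += prob` accumulation a reduce-by-key step; the expected-support semantics stay the same.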
Journal
2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)