Pub Date : 2023-11-27 DOI: 10.1186/s13040-023-00348-8
Guohua Huang, Xiaohong Huang, Wei Luo
DNA N6-adenine methylation (N6-methyladenine, 6mA) plays a key regulatory role in cellular processes. Precisely recognizing 6mA sites is important for further exploring its biological functions. Although many computational methods for 6mA site prediction have been developed over the past decades, there is still large room for improvement. We presented a cross validation-based stacking ensemble model for 6mA site prediction, called 6mA-StackingCV. 6mA-StackingCV is a meta-learning algorithm that uses the output of cross validation as the input to the final classifier. 6mA-StackingCV reached state-of-the-art performance on the Rosaceae independent test. Extensive tests demonstrated the stability and flexibility of 6mA-StackingCV. We implemented 6mA-StackingCV as a user-friendly web application that allows users to choose specific representations or learning algorithms. This application is freely available at http://www.biolscience.cn/6mA-stackingCV/ . The source code and experimental data are available at https://github.com/Xiaohong-source/6mA-stackingCV .
{"title":"6mA-StackingCV: an improved stacking ensemble model for predicting DNA N6-methyladenine site.","authors":"Guohua Huang, Xiaohong Huang, Wei Luo","doi":"10.1186/s13040-023-00348-8","DOIUrl":"10.1186/s13040-023-00348-8","url":null,"abstract":"<p><p>DNA N6-adenine methylation (N6-methyladenine, 6mA) plays a key regulating role in the cellular processes. Precisely recognizing 6mA sites is of importance to further explore its biological functions. Although there are many developed computational methods for 6mA site prediction over the past decades, there is a large root left to improve. We presented a cross validation-based stacking ensemble model for 6mA site prediction, called 6mA-StackingCV. The 6mA-StackingCV is a type of meta-learning algorithm, which uses output of cross validation as input to the final classifier. The 6mA-StackingCV reached the state of the art performances in the Rosaceae independent test. Extensive tests demonstrated the stability and the flexibility of the 6mA-StackingCV. We implemented the 6mA-StackingCV as a user-friendly web application, which allows one to restrictively choose representations or learning algorithms. This application is freely available at http://www.biolscience.cn/6mA-stackingCV/ . The source code and experimental data is available at https://github.com/Xiaohong-source/6mA-stackingCV .</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"16 1","pages":"34"},"PeriodicalIF":4.5,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10680251/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138446729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-25 DOI: 10.1186/s13040-023-00350-0
Massimiliano Datres, Elisa Paolazzi, Marco Chierici, Matteo Pozzi, Antonio Colangelo, Marcello Dorian Donzella, Giuseppe Jurman
Background: Discriminating between patients affected by inflammatory bowel diseases (IBD) and healthy controls on the basis of endoscopic imaging is a challenging problem for machine learning models. This task is used here as the testbed for a novel deep learning classification pipeline, powered by a set of solutions that enhance reproducibility, interpretability, reduced computational workload, bias-free modeling, and careful image preprocessing.
Results: First, an automatic preprocessing procedure is devised, aimed at removing artifacts from clinical data; the resulting images are then fed to an aggregated per-patient model that mimics the clinicians' decision process. The predictions are based on multiple snapshots obtained through resampling, reducing the risk of misleading outcomes by removing low-confidence predictions. Each patient's outcome is explained by returning the images the prediction is based upon, supporting clinicians in verifying diagnoses without the need to evaluate the full set of endoscopic images. As a major theoretical contribution, quantization is employed to reduce the complexity and the computational cost of the model, allowing its deployment on low-power devices with an almost negligible 3% performance degradation. This quantization procedure is relevant not only for per-patient models but also for assessing the feasibility of providing real-time support to clinicians even in low-resource environments. The pipeline is demonstrated on a private dataset of endoscopic images of 758 IBD patients and 601 healthy controls, achieving a Matthews Correlation Coefficient of 0.9 as the top performance on the test set.
Conclusion: We highlighted how a comprehensive pre-processing pipeline plays a crucial role in identifying and removing artifacts from data, addressing one of the principal challenges encountered when working with clinical data. Furthermore, we showed how it is possible to emulate the clinicians' decision process and how doing so offers significant advantages, particularly in terms of explainability and trust within the healthcare context. Last but not least, we showed that quantization can be a useful tool to reduce time and resource consumption with an acceptable degradation of model performance. The quantization study proposed in this work points to the potential of real-time quantized algorithms as valuable tools to support clinicians during endoscopy procedures.
{"title":"Endoscopy-based IBD identification by a quantized deep learning pipeline.","authors":"Massimiliano Datres, Elisa Paolazzi, Marco Chierici, Matteo Pozzi, Antonio Colangelo, Marcello Dorian Donzella, Giuseppe Jurman","doi":"10.1186/s13040-023-00350-0","DOIUrl":"10.1186/s13040-023-00350-0","url":null,"abstract":"<p><strong>Background: </strong>Discrimination between patients affected by inflammatory bowel diseases and healthy controls on the basis of endoscopic imaging is an challenging problem for machine learning models. Such task is used here as the testbed for a novel deep learning classification pipeline, powered by a set of solutions enhancing characterising elements such as reproducibility, interpretability, reduced computational workload, bias-free modeling and careful image preprocessing.</p><p><strong>Results: </strong>First, an automatic preprocessing procedure is devised, aimed to remove artifacts from clinical data, feeding then the resulting images to an aggregated per-patient model to mimic the clinicians decision process. The predictions are based on multiple snapshots obtained through resampling, reducing the risk of misleading outcomes by removing the low confidence predictions. Each patient's outcome is explained by returning the images the prediction is based upon, supporting clinicians in verifying diagnoses without the need for evaluating the full set of endoscopic images. As a major theoretical contribution, quantization is employed to reduce the complexity and the computational cost of the model, allowing its deployment on small power devices with an almost negligible 3% performance degradation. Such quantization procedure holds relevance not only in the context of per-patient models but also for assessing its feasibility in providing real-time support to clinicians even in low-resources environments. The pipeline is demonstrated on a private dataset of endoscopic images of 758 IBD patients and 601 healthy controls, achieving Matthews Correlation Coefficient 0.9 as top performance on test set.</p><p><strong>Conclusion: </strong>We highlighted how a comprehensive pre-processing pipeline plays a crucial role in identifying and removing artifacts from data, solving one of the principal challenges encountered when working with clinical data. Furthermore, we constructively showed how it is possible to emulate clinicians decision process and how it offers significant advantages, particularly in terms of explainability and trust within the healthcare context. Last but not least, we proved that quantization can be a useful tool to reduce the time and resources consumption with an acceptable degradation of the model performs. The quantization study proposed in this work points up the potential development of real-time quantized algorithms as valuable tools to support clinicians during endoscopy procedures.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"16 1","pages":"33"},"PeriodicalIF":4.5,"publicationDate":"2023-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10675910/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138435274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-15 DOI: 10.1186/s13040-023-00349-7
Sana Munquad, Asim Bikas Das
Background and objective: The classification of glioma subtypes is essential for precision therapy. Due to the heterogeneity of gliomas, the subtype-specific molecular pattern can be captured by integrating and analyzing high-throughput omics data from different genomic layers. The development of a deep-learning framework enables the integration of multi-omics data to classify glioma subtypes and support clinical diagnosis.
Results: Transcriptome and methylome data of glioma patients were preprocessed, and differentially expressed features from both datasets were identified. Subsequently, a Cox regression analysis determined genes and CpGs associated with survival. Gene set enrichment analysis was carried out to examine the biological significance of the features. Further, we identified CpG-gene pairs by mapping CpGs to the promoter regions of the corresponding genes. The methylation and gene expression levels of these CpGs and genes were embedded in a lower-dimensional space with an autoencoder. Next, an artificial neural network (ANN) and a convolutional neural network (CNN) were used to classify subtypes using the latent features from the embedding space. The CNN performed better than the ANN for subtyping lower-grade gliomas (LGG) and glioblastoma multiforme (GBM). The subtyping accuracy of the CNN was 98.03% (± 0.06) in LGG and 94.07% (± 0.01) in GBM. The precision of the models was 97.67% in LGG and 90.40% in GBM. The model sensitivity was 96.96% in LGG and 91.18% in GBM. Additionally, we observed superior performance of the CNN on external datasets. The CpG-gene pairs used to develop the model showed better performance than random CpG-gene pairs, the preprocessed data, and single-omics data.
Conclusions: The current study showed that a novel feature selection and data integration strategy led to the development of DeepAutoGlioma, an effective framework for diagnosing glioma subtypes.
{"title":"DeepAutoGlioma: a deep learning autoencoder-based multi-omics data integration and classification tools for glioma subtyping.","authors":"Sana Munquad, Asim Bikas Das","doi":"10.1186/s13040-023-00349-7","DOIUrl":"10.1186/s13040-023-00349-7","url":null,"abstract":"<p><strong>Background and objective: </strong>The classification of glioma subtypes is essential for precision therapy. Due to the heterogeneity of gliomas, the subtype-specific molecular pattern can be captured by integrating and analyzing high-throughput omics data from different genomic layers. The development of a deep-learning framework enables the integration of multi-omics data to classify the glioma subtypes to support the clinical diagnosis.</p><p><strong>Results: </strong>Transcriptome and methylome data of glioma patients were preprocessed, and differentially expressed features from both datasets were identified. Subsequently, a Cox regression analysis determined genes and CpGs associated with survival. Gene set enrichment analysis was carried out to examine the biological significance of the features. Further, we identified CpG and gene pairs by mapping them in the promoter region of corresponding genes. The methylation and gene expression levels of these CpGs and genes were embedded in a lower-dimensional space with an autoencoder. Next, ANN and CNN were used to classify subtypes using the latent features from embedding space. CNN performs better than ANN for subtyping lower-grade gliomas (LGG) and glioblastoma multiforme (GBM). The subtyping accuracy of CNN was 98.03% (± 0.06) and 94.07% (± 0.01) in LGG and GBM, respectively. The precision of the models was 97.67% in LGG and 90.40% in GBM. The model sensitivity was 96.96% in LGG and 91.18% in GBM. Additionally, we observed the superior performance of CNN with external datasets. The genes and CpGs pairs used to develop the model showed better performance than the random CpGs-gene pairs, preprocessed data, and single omics data.</p><p><strong>Conclusions: </strong>The current study showed that a novel feature selection and data integration strategy led to the development of DeepAutoGlioma, an effective framework for diagnosing glioma subtypes.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"16 1","pages":"32"},"PeriodicalIF":4.5,"publicationDate":"2023-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10652591/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134650252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-31 DOI: 10.1186/s13040-023-00346-w
Maryam Ramezani, Amirhossein Takian, Ahad Bakhtiari, Hamid R Rabiee, Sadegh Ghazanfari, Saharnaz Sazgarnejad
Background: The governance of health systems is complex in nature due to several intertwined and multi-dimensional factors contributing to it. Recent challenges of health systems reflect the need for innovative approaches that can minimize the adverse consequences of policies. Hence, there is compelling evidence for a distinct outlook on the health ecosystem enabled by artificial intelligence (AI). Therefore, this study aimed to investigate the roles of AI and its applications in health system governance through an interpretive scoping review of current evidence.
Method: This study intended to offer a research agenda and framework for the applications of AI in health systems governance. To include evidence with a greater focus on the application of AI in health governance from different perspectives, we searched the literature published from 2000 to 2023 in the PubMed, Scopus, and Web of Science databases.
Results: Our findings showed that integrating AI capabilities into health systems governance has the potential to influence three cardinal dimensions of health. These include social determinants of health, elements of governance, and health system tasks and goals. AI paves the way for strengthening the health system's governance through various aspects, i.e., intelligence innovations, flexible boundaries, multidimensional analysis, new insights, and cognition modifications to the health ecosystem area.
Conclusion: AI is expected to be seen as a tool with new applications and capabilities, with the potential to change each component of governance in the health ecosystem, which can eventually help achieve health-related goals.
{"title":"Research agenda for using artificial intelligence in health governance: interpretive scoping review and framework.","authors":"Maryam Ramezani, Amirhossein Takian, Ahad Bakhtiari, Hamid R Rabiee, Sadegh Ghazanfari, Saharnaz Sazgarnejad","doi":"10.1186/s13040-023-00346-w","DOIUrl":"10.1186/s13040-023-00346-w","url":null,"abstract":"<p><strong>Background: </strong>The governance of health systems is complex in nature due to several intertwined and multi-dimensional factors contributing to it. Recent challenges of health systems reflect the need for innovative approaches that can minimize adverse consequences of policies. Hence, there is compelling evidence of a distinct outlook on the health ecosystem using artificial intelligence (AI). Therefore, this study aimed to investigate the roles of AI and its applications in health system governance through an interpretive scoping review of current evidence.</p><p><strong>Method: </strong>This study intended to offer a research agenda and framework for the applications of AI in health systems governance. To include shreds of evidence with a greater focus on the application of AI in health governance from different perspectives, we searched the published literature from 2000 to 2023 through PubMed, Scopus, and Web of Science Databases.</p><p><strong>Results: </strong>Our findings showed that integrating AI capabilities into health systems governance has the potential to influence three cardinal dimensions of health. These include social determinants of health, elements of governance, and health system tasks and goals. AI paves the way for strengthening the health system's governance through various aspects, i.e., intelligence innovations, flexible boundaries, multidimensional analysis, new insights, and cognition modifications to the health ecosystem area.</p><p><strong>Conclusion: </strong>AI is expected to be seen as a tool with new applications and capabilities, with the potential to change each component of governance in the health ecosystem, which can eventually help achieve health-related goals.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"16 1","pages":"31"},"PeriodicalIF":4.5,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10617108/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71414915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-26 DOI: 10.1186/s13040-023-00347-9
Qian Yan, Wenjiang Zheng, Boqing Wang, Baoqian Ye, Huiyan Luo, Xinqian Yang, Ping Zhang, Xiongwen Wang
{"title":"Correction: A prognostic model based on seven immune-related genes predicts the overall survival of patients with hepatocellular carcinoma.","authors":"Qian Yan, Wenjiang Zheng, Boqing Wang, Baoqian Ye, Huiyan Luo, Xinqian Yang, Ping Zhang, Xiongwen Wang","doi":"10.1186/s13040-023-00347-9","DOIUrl":"10.1186/s13040-023-00347-9","url":null,"abstract":"","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"16 1","pages":"30"},"PeriodicalIF":4.5,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10605871/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"54231823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-20 DOI: 10.1186/s13040-023-00344-y
Elnaz Ziad, Somayeh Sadat, Farshad Farzadfar, Mohammad-Reza Malekpour
Background: Patients with Type 2 Diabetes Mellitus (T2DM) are at a higher risk of polypharmacy and more susceptible to irrational prescriptions; therefore, it is important to monitor pharmacological therapy patterns. The primary objective of this study was to highlight current prescription patterns in T2DM patients and compare them with the existing Standards of Medical Care in Diabetes. The second objective was to analyze whether age and gender affect prescription patterns.
Method: This cross-sectional study was conducted using the Iran Health Insurance Organization (IHIO) prescription database. It was mined with an Association Rule Mining (ARM) technique, FP-Growth, in order to find drugs co-prescribed with anti-diabetic medications. The algorithm was implemented at different levels of the Anatomical Therapeutic Chemical (ATC) classification system, which assigns codes to drugs based on their anatomical, pharmacological, therapeutic, and chemical properties, to provide an in-depth analysis of co-prescription patterns.
Results: Altogether, the prescriptions of 914,652 patients were analyzed, of whom 91,505 were found to have diabetes. According to our results, prescriptions of Lipid Modifying Agents (C10) (56.3%), Agents Acting on the Renin-Angiotensin System (C09) (48.9%), Antithrombotic Agents (B01) (35.7%), and Beta Blocking Agents (C07) (30.1%) were meaningfully associated with the prescription of Drugs Used in Diabetes. Our study also revealed that female diabetic patients have a higher lift for taking Thyroid Preparations, and the older the patients were, the more prone they were to take neuropathy-related medications. Additionally, the results suggest that there are gender differences in the association between aspirin and diabetes drugs, with the differences becoming less pronounced in old age.
Conclusions: Almost all of the association rules found in this research were clinically meaningful, proving the potential of ARM for co-prescription pattern discovery. Moreover, implementing level-based ARM was effective in detecting difficult-to-spot rules. Additionally, the majority of drugs prescribed by physicians were consistent with the Standards of Medical Care in Diabetes.
{"title":"Prescription pattern analysis of Type 2 Diabetes Mellitus: a cross-sectional study in Isfahan, Iran.","authors":"Elnaz Ziad, Somayeh Sadat, Farshad Farzadfar, Mohammad-Reza Malekpour","doi":"10.1186/s13040-023-00344-y","DOIUrl":"10.1186/s13040-023-00344-y","url":null,"abstract":"<p><strong>Background: </strong>Patients with Type 2 Diabetes Mellitus (T2DM) are at a higher risk of polypharmacy and more susceptible to irrational prescriptions; therefore, pharmacological therapy patterns are important to be monitored. The primary objective of this study was to highlight current prescription patterns in T2DM patients and compare them with existing Standards of Medical Care in Diabetes. The second objective was to analyze whether age and gender affect prescription patterns.</p><p><strong>Method: </strong>This cross-sectional study was conducted using the Iran Health Insurance Organization (IHIO) prescription database. It was mined by an Association Rule Mining (ARM) technique, FP-Growth, in order to find co-prescribed drugs with anti-diabetic medications. The algorithm was implemented at different levels of the Anatomical Therapeutic Chemical (ATC) classification system, which assigns different codes to drugs based on their anatomy, pharmacological, therapeutic, and chemical properties to provide an in-depth analysis of co-prescription patterns.</p><p><strong>Results: </strong>Altogether, the prescriptions of 914,652 patients were analyzed, of whom 91,505 were found to have diabetes. According to our results, prescribing Lipid Modifying Agents (C10) (56.3%), Agents Acting on The Renin-Angiotensin System (C09) (48.9%), Antithrombotic Agents (B01) (35.7%), and Beta Blocking Agents (C07) (30.1%) were meaningfully associated with the prescription of Drugs Used in Diabetes. Our study also revealed that female diabetic patients have a higher lift for taking Thyroid Preparations, and the older the patients were, the more they were prone to take neuropathy-related medications. Additionally, the results suggest that there are gender differences in the association between aspirin and diabetes drugs, with the differences becoming less pronounced in old age.</p><p><strong>Conclusions: </strong>Almost all of the association rules found in this research were clinically meaningful, proving the potential of ARM for co-prescription pattern discovery. Moreover, implementing level-based ARM was effective in detecting difficult-to-spot rules. Additionally, the majority of drugs prescribed by physicians were consistent with the Standards of Medical Care in Diabetes.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"16 1","pages":"29"},"PeriodicalIF":4.5,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10588025/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49683949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-09 DOI: 10.1186/s13040-023-00345-x
Zhenxiang He, Xiaoxia Li, Yuling Chen, Nianzu Lv, Yong Cai
Automatic segmentation of skin lesions is a critical step in Computer Aided Diagnosis (CAD) of melanoma. However, blurred lesion boundaries, uneven color distribution, and low image contrast often lead to poor segmentation results. To address the difficulty of segmenting skin lesions, this paper proposes an Attention-based Dual-path Feature Fusion Network (ADFFNet) for automatic skin lesion segmentation. First, in the spatial path, a Boundary Refinement (BR) module is designed for the output of low-level features to filter out irrelevant background information and retain more boundary details of the lesion area. Second, in the context path, a Multi-scale Feature Selection (MFS) module is constructed for the high-level feature output to capture multi-scale context information and use an attention mechanism to filter out redundant semantic information. Finally, we design a Dual-path Feature Fusion (DFF) module, which uses high-level global attention information to guide the step-by-step fusion of high-level semantic features and low-level detail features; this helps restore image detail and further improves the pixel-level segmentation accuracy of skin lesions. In the experiments, the ISIC 2018 and PH2 datasets are employed to evaluate the effectiveness of the proposed method. It achieves 0.890/0.925 and 0.933/0.954 on the F1-score and SE index, respectively. Comparative analysis with state-of-the-art segmentation methods reveals that the ADFFNet algorithm exhibits superior segmentation performance.
{"title":"Attention-based dual-path feature fusion network for automatic skin lesion segmentation.","authors":"Zhenxiang He, Xiaoxia Li, Yuling Chen, Nianzu Lv, Yong Cai","doi":"10.1186/s13040-023-00345-x","DOIUrl":"10.1186/s13040-023-00345-x","url":null,"abstract":"<p><p>Automatic segmentation of skin lesions is a critical step in Computer Aided Diagnosis (CAD) of melanoma. However, due to the blurring of the lesion boundary, uneven color distribution, and low image contrast, resulting in poor segmentation result. Aiming at the problem of difficult segmentation of skin lesions, this paper proposes an Attention-based Dual-path Feature Fusion Network (ADFFNet) for automatic skin lesion segmentation. Firstly, in the spatial path, a Boundary Refinement (BR) module is designed for the output of low-level features to filter out irrelevant background information and retain more boundary details of the lesion area. Secondly, in the context path, a Multi-scale Feature Selection (MFS) module is constructed for high-level feature output to capture multi-scale context information and use the attention mechanism to filter out redundant semantic information. Finally, we design a Dual-path Feature Fusion (DFF) module, which uses high-level global attention information to guide the step-by-step fusion of high-level semantic features and low-level detail features, which is beneficial to restore image detail information and further improve the pixel-level segmentation accuracy of skin lesion. In the experiment, the ISIC 2018 and PH2 datasets are employed to evaluate the effectiveness of the proposed method. It achieves a performance of 0.890/ 0.925 and 0.933 /0.954 on the F1-score and SE index, respectively. Comparative analysis with state-of-the-art segmentation methods reveals that the ADFFNet algorithm exhibits superior segmentation performance.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"16 1","pages":"28"},"PeriodicalIF":4.5,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10561442/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41155445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-06 DOI: 10.1186/s13040-023-00343-z
Naya Nagy, Matthew Stuart-Edwards, Marius Nagy, Liam Mitchell, Athanasios Zovoilis
Squiggle data is the numerical output of DNA and RNA sequencing on the Nanopore next-generation sequencing platform. Nanopore sequencing offers expanded applications compared to previous sequencing techniques but produces a large amount of data in the form of current measurements over time. The analysis of these segments of current measurements requires more complex and computationally intensive algorithms than previous sequencing technologies. The purpose of this study is to investigate, in principle, the potential of using quantum computers to speed up Nanopore data analysis. Quantum circuits are designed to extract major features of squiggle current measurements. The circuits are analyzed theoretically in terms of size and performance. Practical experiments on IBM QX show the limitations of state-of-the-art quantum computers in tackling real-life squiggle data problems. Nevertheless, pre-processing the squiggle data with the inverse wavelet transform, which is also experimented with and analyzed in this paper, reduces the dimensionality of the problem so that it can fit on a reasonably sized quantum computer in the hopefully near future.
{"title":"Quantum analysis of squiggle data.","authors":"Naya Nagy, Matthew Stuart-Edwards, Marius Nagy, Liam Mitchell, Athanasios Zovoilis","doi":"10.1186/s13040-023-00343-z","DOIUrl":"10.1186/s13040-023-00343-z","url":null,"abstract":"<p><p>Squiggle data is the numerical output of DNA and RNA sequencing by the Nanopore next generation sequencing platform. Nanopore sequencing offers expanded applications compared to previous sequencing techniques but produces a large amount of data in the form of current measurements over time. The analysis of these segments of current measurements require more complex and computationally intensive algorithms than previous sequencing technologies. The purpose of this study is to investigate in principle the potential of using quantum computers to speed up Nanopore data analysis. Quantum circuits are designed to extract major features of squiggle current measurements. The circuits are analyzed theoretically in terms of size and performance. Practical experiments on IBM QX show the limitations of the state of the art quantum computer to tackle real life squiggle data problems. Nevertheless, pre-processing of the squiggle data using the inverse wavelet transform, as experimented and analyzed in this paper as well, reduces the dimensionality of the problem in order to fit a reasonable size quantum computer in the hopefully near future.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"16 1","pages":"27"},"PeriodicalIF":4.5,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10557310/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41135068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-09-26 DOI: 10.1186/s13040-023-00341-1
Sofia Martins, Roberta Coletti, Marta B Lopes
Gliomas are primary malignant brain tumors with poor survival and high resistance to available treatments. Improving the molecular understanding of glioma and disclosing novel biomarkers of tumor development and progression could help to find novel targeted therapies for this type of cancer. Public databases such as The Cancer Genome Atlas (TCGA) provide an invaluable source of molecular information on cancer tissues. Machine learning tools show promise in dealing with the high dimensionality of omics data and extracting relevant information from it. In this work, network inference and clustering methods, namely Joint Graphical Lasso and Robust Sparse K-means Clustering, were applied to RNA-sequencing data from TCGA glioma patients to identify shared and distinct gene networks among different types of glioma (glioblastoma, astrocytoma, and oligodendroglioma) and to disclose new patient groups and the relevant genes behind the groups' separation. The results obtained suggest that astrocytoma and oligodendroglioma are more similar to each other than to glioblastoma, highlighting the molecular differences between glioblastoma and the other glioma subtypes. After a comprehensive literature search on the relevant genes pointed out by our analysis, we identified potential candidates for biomarkers of glioma. Further molecular validation of these genes is encouraged to understand their potential role in diagnosis and in the design of novel therapies.
{"title":"Disclosing transcriptomics network-based signatures of glioma heterogeneity using sparse methods.","authors":"Sofia Martins, Roberta Coletti, Marta B Lopes","doi":"10.1186/s13040-023-00341-1","DOIUrl":"10.1186/s13040-023-00341-1","url":null,"abstract":"<p><p>Gliomas are primary malignant brain tumors with poor survival and high resistance to available treatments. Improving the molecular understanding of glioma and disclosing novel biomarkers of tumor development and progression could help to find novel targeted therapies for this type of cancer. Public databases such as The Cancer Genome Atlas (TCGA) provide an invaluable source of molecular information on cancer tissues. Machine learning tools show promise in dealing with the high dimension of omics data and extracting relevant information from it. In this work, network inference and clustering methods, namely Joint Graphical lasso and Robust Sparse K-means Clustering, were applied to RNA-sequencing data from TCGA glioma patients to identify shared and distinct gene networks among different types of glioma (glioblastoma, astrocytoma, and oligodendroglioma) and disclose new patient groups and the relevant genes behind groups' separation. The results obtained suggest that astrocytoma and oligodendroglioma have more similarities compared with glioblastoma, highlighting the molecular differences between glioblastoma and the others glioma subtypes. After a comprehensive literature search on the relevant genes pointed our from our analysis, we identified potential candidates for biomarkers of glioma. Further molecular validation of these genes is encouraged to understand their potential role in diagnosis and in the design of novel therapies.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"16 1","pages":"26"},"PeriodicalIF":4.5,"publicationDate":"2023-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10523751/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41161853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-09-04 DOI: 10.1186/s13040-023-00342-0
John T Gregg, Jason H Moore
There are not currently any univariate outlier detection algorithms that transform and model arbitrarily shaped distributions to remove univariate outliers. Some algorithms model skew, even fewer model kurtosis, and none of them model bimodality and monotonicity. To overcome these challenges, we have implemented an algorithm for Skew and Tail-heaviness Adjusted Removal of Outliers (STAR_outliers) that robustly removes univariate outliers from distributions with many different shape profiles, including extreme skew, extreme kurtosis, bimodality, and monotonicity. We show that STAR_outliers removes simulated outliers with greater recall and precision than several general algorithms, and it also models the outlier bounds of real data distributions with greater accuracy.
Background: Reliably removing univariate outliers from arbitrarily shaped distributions is a difficult task. Incorrectly assuming unimodality or overestimating tail heaviness fails to remove outliers, while underestimating tail heaviness incorrectly removes regular data from the tails. Skew often produces one heavy tail and one light tail, and we show that several sophisticated outlier removal algorithms often fail to remove outliers from the light tail. Multivariate outlier detection algorithms have recently become popular, but having tested PyOD's multivariate outlier removal algorithms, we found them to be inadequate for univariate outlier removal. They usually do not allow for univariate input, and they do not fit their distributions of outliership scores with a model on which an outlier threshold can be accurately established. Thus, there is a need for a flexible outlier removal algorithm that can model arbitrarily shaped univariate distributions.
Results: In order to effectively model arbitrarily shaped univariate distributions, we have combined several well-established algorithms into a new algorithm called STAR_outliers. STAR_outliers removes more simulated true outliers and fewer non-outliers than several other univariate algorithms. These include several normality-assuming outlier removal methods, PyOD's isolation forest (IF) outlier removal algorithm (ACM Transactions on Knowledge Discovery from Data (TKDD) 6:3, 2012) with default settings, and an IQR-based algorithm by Verardi and Vermandele that removes outliers while accounting for skew and kurtosis (Verardi and Vermandele, Journal de la Société Française de Statistique 157:90-114, 2016). Since the IF algorithm's default model poorly fit the outliership scores, we also compared the isolation forest algorithm with a model that entails removing as many datapoints as STAR_outliers does in order of decreasing outliership scores. We also compared these algorithms on the publicly available 2018 National Health and Nutrition Examination Survey (NHANES) data by setting the outlier threshold to keep values falling within the main 99.3 percent of the fitted model's domain. We show that, on average, the proportion of values our STAR_outliers algorithm removes from these features is significantly closer to the expected 0.7 percent than that of the other outlier removal methods. STAR_outliers is an easily implemented python package for removing outliers that outperforms multiple commonly used methods of univariate outlier removal.
{"title":"STAR_outliers: a python package that separates univariate outliers from non-normal distributions.","authors":"John T Gregg, Jason H Moore","doi":"10.1186/s13040-023-00342-0","DOIUrl":"10.1186/s13040-023-00342-0","url":null,"abstract":"<p><p>There are not currently any univariate outlier detection algorithms that transform and model arbitrarily shaped distributions to remove univariate outliers. Some algorithms model skew, even fewer model kurtosis, and none of them model bimodality and monotonicity. To overcome these challenges, we have implemented an algorithm for Skew and Tail-heaviness Adjusted Removal of Outliers (STAR_outliers) that robustly removes univariate outliers from distributions with many different shape profiles, including extreme skew, extreme kurtosis, bimodality, and monotonicity. We show that STAR_outliers removes simulated outliers with greater recall and precision than several general algorithms, and it also models the outlier bounds of real data distributions with greater accuracy.Background Reliably removing univariate outliers from arbitrarily shaped distributions is a difficult task. Incorrectly assuming unimodality or overestimating tail heaviness fails to remove outliers, while underestimating tail heaviness incorrectly removes regular data from the tails. Skew often produces one heavy tail and one light tail, and we show that several sophisticated outlier removal algorithms often fail to remove outliers from the light tail. Multivariate outlier detection algorithms have recently become popular, but having tested PyOD's multivariate outlier removal algorithms, we found them to be inadequate for univariate outlier removal. They usually do not allow for univariate input, and they do not fit their distributions of outliership scores with a model on which an outlier threshold can be accurately established. Thus, there is a need for a flexible outlier removal algorithm that can model arbitrarily shaped univariate distributions.Results In order to effectively model arbitrarily shaped univariate distributions, we have combined several well-established algorithms into a new algorithm called STAR_outliers. STAR_outliers removes more simulated true outliers and fewer non-outliers than several other univariate algorithms. These include several normality-assuming outlier removal methods, PyOD's isolation forest (IF) outlier removal algorithm (ACM Transactions on Knowledge Discovery from Data (TKDD) 6:3, 2012) with default settings, and an IQR based algorithm by Verardi and Vermandele that removes outliers while accounting for skew and kurtosis (Verardi and Vermandele, Journal de la Société Française de Statistique 157:90-114, 2016). Since the IF algorithm's default model poorly fit the outliership scores, we also compared the isolation forest algorithm with a model that entails removing as many datapoints as STAR_outliers does in order of decreasing outliership scores. We also compared these algorithms on the publicly available 2018 National Health and Nutrition Examination Survey (NHANES) data by setting the outlier threshold to keep values falling within the main 99.3 percent of the fitted model's domain. 
We show that our STAR_outliers algorithm removes signif","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"16 1","pages":"25"},"PeriodicalIF":4.5,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10476292/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10166430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
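For context on the univariate baselines the package is compared against, the sketch below contrasts a normality-assuming z-score rule with a plain Tukey IQR fence on a skewed toy feature. Neither rule is the STAR_outliers algorithm itself (nor the skew/kurtosis-adjusted Verardi-Vermandele variant); the synthetic data and thresholds are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Heavily right-skewed toy feature with a few injected extreme values.
x = np.concatenate([rng.gamma(shape=2.0, scale=1.0, size=1000), [25.0, 30.0, 40.0]])

# Normality-assuming z-score rule: tends to misjudge the heavy right tail.
z_keep = np.abs(stats.zscore(x)) < 3

# Plain Tukey IQR fence: a common baseline that does not adjust for skew or kurtosis.
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
iqr_keep = (x > q1 - 1.5 * iqr) & (x < q3 + 1.5 * iqr)

print("z-score removes:", int((~z_keep).sum()), "points; IQR removes:", int((~iqr_keep).sum()))
```

On asymmetric distributions like this, the two rules flag very different numbers of points, which is precisely the gap that shape-adjusted methods such as STAR_outliers aim to close.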