Sisi Shao, Pedro Henrique Ribeiro, Christina M Ramirez, Jason H Moore
{"title":"A review of feature selection strategies utilizing graph data structures and Knowledge Graphs.","authors":"Sisi Shao, Pedro Henrique Ribeiro, Christina M Ramirez, Jason H Moore","doi":"10.1093/bib/bbae521","DOIUrl":null,"url":null,"abstract":"<p><p>Feature selection in Knowledge Graphs (KGs) is increasingly utilized in diverse domains, including biomedical research, Natural Language Processing (NLP), and personalized recommendation systems. This paper delves into the methodologies for feature selection (FS) within KGs, emphasizing their roles in enhancing machine learning (ML) model efficacy, hypothesis generation, and interpretability. Through this comprehensive review, we aim to catalyze further innovation in FS for KGs, paving the way for more insightful, efficient, and interpretable analytical models across various domains. Our exploration reveals the critical importance of scalability, accuracy, and interpretability in FS techniques, advocating for the integration of domain knowledge to refine the selection process. We highlight the burgeoning potential of multi-objective optimization and interdisciplinary collaboration in advancing KG FS, underscoring the transformative impact of such methodologies on precision medicine, among other fields. The paper concludes by charting future directions, including the development of scalable, dynamic FS algorithms and the integration of explainable AI principles to foster transparency and trust in KG-driven models.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8000,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11551862/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Briefings in bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/bib/bbae521","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Feature selection in Knowledge Graphs (KGs) is increasingly utilized in diverse domains, including biomedical research, Natural Language Processing (NLP), and personalized recommendation systems. This paper delves into the methodologies for feature selection (FS) within KGs, emphasizing their roles in enhancing machine learning (ML) model efficacy, hypothesis generation, and interpretability. Through this comprehensive review, we aim to catalyze further innovation in FS for KGs, paving the way for more insightful, efficient, and interpretable analytical models across various domains. Our exploration reveals the critical importance of scalability, accuracy, and interpretability in FS techniques, advocating for the integration of domain knowledge to refine the selection process. We highlight the burgeoning potential of multi-objective optimization and interdisciplinary collaboration in advancing KG FS, underscoring the transformative impact of such methodologies on precision medicine, among other fields. The paper concludes by charting future directions, including the development of scalable, dynamic FS algorithms and the integration of explainable AI principles to foster transparency and trust in KG-driven models.
知识图谱(KG)中的特征选择越来越多地应用于各种领域,包括生物医学研究、自然语言处理(NLP)和个性化推荐系统。本文深入探讨了知识图谱中的特征选择(FS)方法,强调了它们在提高机器学习(ML)模型效率、假设生成和可解释性方面的作用。通过这一全面回顾,我们旨在促进 KGs 特征选择的进一步创新,为在各个领域建立更有洞察力、更高效、更可解释的分析模型铺平道路。我们的探索揭示了可行性分析技术中可扩展性、准确性和可解释性的极端重要性,倡导整合领域知识来完善选择过程。我们强调了多目标优化和跨学科合作在推进 KG FS 方面蓬勃发展的潜力,强调了此类方法对精准医疗等领域的变革性影响。论文最后描绘了未来的发展方向,包括开发可扩展的动态 FS 算法,以及整合可解释的人工智能原则,以提高 KG 驱动模型的透明度和信任度。
期刊介绍:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.