Anna Bernasconi , Alberto García S. , Stefano Ceri , Oscar Pastor
Data & Knowledge Engineering, journal article, published 2023-09-01. DOI: 10.1016/j.datak.2023.102201
https://www.sciencedirect.com/science/article/pii/S0169023X23000617
PoliViews: A comprehensive and modular approach to the conceptual modeling of genomic data
The complexity of the human genome is captured by many signals, representing for instance DNA variations, the expression of gene activity, or structural rearrangements of DNA; a rich set of data types and formats is used to record these signals. Conceptual models can support the description and explanation of the genome’s elaborate structure and behavior. Among others, the Conceptual Schema of the Human Genome (CSG) provides a concept-oriented, top-down representation of genome behavior that is independent of data formats. The Genomic Conceptual Model (GCM) instead provides a data-oriented, bottom-up representation, targeting a well-organized, unified description of these formats. In this research, we join the two approaches to obtain PoliViews, a comprehensive model that links (1) a concepts layer, describing genome elements and their conceptual connections, with (2) a data layer, describing datasets derived from genome sequencing with specific technologies. Their dynamic connection is established when specific genomic data types are chosen in the data layer, which triggers the selection of a view in the concepts layer. The benefit is mutual: data records can be semantically described by high-level concepts through their links and, in turn, the continuously evolving abstract model can be extended thanks to the input provided by real datasets. PoliViews enables queries that take a holistic conceptual perspective on the genome and are directly translated into data-oriented terms and organization. Here, we demonstrate the approach by linking two major genomic data types, namely DNA variation and gene expression. For each type, we consider several prominent data sources and describe their mapping to the corresponding view in the concepts layer, enabling intra-data-type integration. Then, leveraging the connections available in the concepts layer, we show how the distinct data types can interoperate, enabling inter-data-type integration. The PoliViews approach is illustrated through several examples of biological interest and can be further extended to any kind of genomic information.
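The two-layer idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The concept names (`Gene`, `Chromosome`, `Variation`, `Transcript`), the source names, the field names, and all function names are hypothetical and chosen only for illustration. The sketch assumes that a view is simply a set of concepts, that a data source maps its fields to concepts of the view selected by its data type, and that two data types can be connected through the concepts their views share.

```python
from dataclasses import dataclass

# Hypothetical concepts-layer views, one per data type.
# The concept names are illustrative, not taken from the paper.
CONCEPT_VIEWS = {
    "dna_variation": {"Gene", "Chromosome", "Variation"},
    "gene_expression": {"Gene", "Chromosome", "Transcript"},
}


@dataclass
class DataSource:
    """A data-layer source; every attribute value here is hypothetical."""
    name: str
    data_type: str          # key into CONCEPT_VIEWS
    field_to_concept: dict  # source field name -> concept name


def select_view(data_type: str) -> set:
    """Choosing a data type in the data layer selects a concepts-layer view."""
    return CONCEPT_VIEWS[data_type]


def validate_mapping(source: DataSource) -> bool:
    """True if every mapped concept belongs to the view of the source's data type."""
    return set(source.field_to_concept.values()) <= select_view(source.data_type)


def intra_type_concepts(sources: list) -> set:
    """Concepts covered by all sources of one data type (intra-data-type)."""
    if len({s.data_type for s in sources}) != 1:
        raise ValueError("all sources must share exactly one data type")
    covered = [set(s.field_to_concept.values()) for s in sources]
    return set.intersection(*covered)


def inter_type_concepts(type_a: str, type_b: str) -> set:
    """Concepts shared by two views, usable to connect data types (inter-data-type)."""
    return select_view(type_a) & select_view(type_b)


# Two hypothetical DNA-variation sources with different field names.
source_a = DataSource(
    "source_a", "dna_variation",
    {"gene_symbol": "Gene", "chrom": "Chromosome", "alt": "Variation"},
)
source_b = DataSource(
    "source_b", "dna_variation",
    {"hgnc": "Gene", "variant_id": "Variation"},
)
```

In this sketch, `intra_type_concepts([source_a, source_b])` returns the concepts through which the two variation sources can be aligned, while `inter_type_concepts("dna_variation", "gene_expression")` returns the concepts that could serve as a bridge between variation and expression data. Real views would carry relationships and attributes rather than bare concept names.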
About the journal:
Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.