Categorical data analysis using discretization of continuous variables to investigate associations in marine ecosystems
Authors: Hiroko Kato Solvang, Shinpei Imori, Martin Biuw, Ulf Lindstrøm, Tore Haug
Environmetrics, volume 35, issue 6. Published 2024-06-29. DOI: 10.1002/env.2867 (https://onlinelibrary.wiley.com/doi/10.1002/env.2867)
Understanding and predicting interactions among predators, prey, and their environment is fundamental to understanding food web structure, dynamics, and ecosystem function in both terrestrial and marine ecosystems. Estimating the conditional associations between species and their environments is therefore important for exploring connections or cooperative links in the ecosystem, which in turn can help to clarify directional relationships. This requires a relevant and practical statistical method that links presence/absence observations with biomass, abundance, and physical quantities recorded as continuous real values. Such data are sometimes spatially sparse across the ocean and too short to treat as time series. To meet this challenge, we provide an approach that applies categorical data analysis to presence/absence observations and real-valued data. The real-valued data, used as explanatory variables for the presence/absence response variable, are discretized by detecting optimal thresholds without any prior biological or ecological information. The discretized data express two levels, such as large/small or high/low, which give experts a simple interpretation when investigating complicated associations in marine ecosystems. The approach is implemented within CATDAP, an existing statistical method developed by Sakamoto and Akaike in 1979. It consists of a two-step procedure for categorical data analysis: (1) finding the appropriate threshold for discretizing the real-valued data so that a test of independence can be applied; and (2) identifying the best conditional probability model, based on a statistical information criterion, to investigate possible associations among the data. We perform a simulation study to validate the proposed approach and to examine its behavior when the observations include many zeros (zero-inflated data), which often occurs in practice. The approach is then applied to two datasets: (1) one collected during an international synoptic krill survey in the Scotia Sea west of the Antarctic Peninsula, used to investigate associations among krill, fin whale (Balaenoptera physalus), surface temperature, depth, slope in depth (flatter or steeper terrain), and temperature gradient (slope in temperature); and (2) one collected by ecosystem surveys conducted during August–September in 2014–2017, used to investigate associations among common minke whales, the predatory fish Atlantic cod, and their main prey groups (zooplankton, 0-group fish) in Arctic Ocean waters to the west and north of Svalbard, Norway. The R code summarizing our proposed numerical procedure is presented in S4S1.
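The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' R code and not CATDAP itself. It assumes that the information criterion is an AIC difference between a model in which the response depends on the discretized variable(s) and the independence model, computed from contingency-table counts. The function names (`aic_vs_independence`, `best_threshold`, `best_model`) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def aic_vs_independence(y, x_labels, n_x_levels):
    """AIC of 'y depends on x' minus AIC of 'y independent of x' (assumed criterion).

    Negative values favor the dependence model.
    """
    n = len(y)
    joint, y_tot, x_tot = {}, {}, {}
    for yi, xi in zip(y, x_labels):
        joint[(yi, xi)] = joint.get((yi, xi), 0) + 1
        y_tot[yi] = y_tot.get(yi, 0) + 1
        x_tot[xi] = x_tot.get(xi, 0) + 1
    # Log-likelihood ratio from the observed (non-empty) cells of the table.
    log_ratio = sum(
        c * math.log(c * n / (y_tot[yi] * x_tot[xi]))
        for (yi, xi), c in joint.items()
    )
    # Extra free parameters of the dependence model over independence.
    extra_params = (len(y_tot) - 1) * (n_x_levels - 1)
    return -2.0 * log_ratio + 2.0 * extra_params


def discretize(x, threshold):
    """Map real values to two levels: 1 (high) if above the threshold, else 0 (low)."""
    return [1 if xi > threshold else 0 for xi in x]


def best_threshold(y, x):
    """Step 1: scan observed values and keep the threshold with the lowest AIC."""
    candidates = sorted(set(x))[:-1]  # splitting at the maximum leaves 'high' empty
    if not candidates:
        raise ValueError("x must contain at least two distinct values")
    score, threshold = min(
        (aic_vs_independence(y, discretize(x, t), 2), t) for t in candidates
    )
    return threshold, score


def best_model(y, binary_vars):
    """Step 2: pick the subset of discretized variables with the lowest AIC."""
    names = sorted(binary_vars)
    best = ((), 0.0)  # empty subset is the independence model (difference 0)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            # Combine the chosen variables into one label per observation.
            labels = list(zip(*(binary_vars[v] for v in subset)))
            score = aic_vs_independence(y, labels, 2 ** k)
            if score < best[1]:
                best = (subset, score)
    return best


# Toy data (invented for illustration only): presence/absence of a predator,
# a prey density, and a depth measurement.
presence = [0, 0, 0, 0, 1, 1, 1, 1]
prey_density = [0.1, 0.4, 0.5, 1.2, 1.5, 1.9, 2.3, 2.8]
depth = [300, 150, 420, 200, 380, 120, 260, 340]

t_prey, _ = best_threshold(presence, prey_density)
t_depth, _ = best_threshold(presence, depth)
discretized = {
    "prey": discretize(prey_density, t_prey),
    "depth": discretize(depth, t_depth),
}
print(t_prey, best_model(presence, discretized))
```

The exhaustive subset search is only practical for a handful of explanatory variables, and the penalty term here counts all `2 ** k` combined levels even when some are unobserved. Both are simplifying assumptions of this sketch.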
About the journal:
Environmetrics, the official journal of The International Environmetrics Society (TIES), an Association of the International Statistical Institute, is devoted to the dissemination of high-quality quantitative research in the environmental sciences.
The journal welcomes pertinent and innovative submissions from quantitative disciplines developing new statistical and mathematical techniques, methods, and theories that solve modern environmental problems. Articles must proffer substantive, new statistical or mathematical advances to answer important scientific questions in the environmental sciences, or must develop novel or enhanced statistical methodology with clear applications to environmental science. New methods should be illustrated with recent environmental data.