Pub Date : 2022-10-13 DOI: 10.1080/26941899.2022.2151950
Jason B. Cho, Sven Serneels, D. Matteson
Non-fungible tokens (NFTs) have recently emerged as a novel blockchain-hosted financial asset class that has attracted major transaction volumes. Investment decisions rely on data, on adequate preprocessing, and on the application of analytics to those data. Owing both to the non-fungible nature of the tokens and to a blockchain being the primary data source, NFT transaction data pose several challenges not commonly encountered in traditional financial data. Using data that consist of the transaction history of eight highly valued NFT collections, this paper illustrates a selection of such challenges: price differentiation by token traits, the possible existence of lateral swaps and wash trades in the transaction history, and severe volatility. While this paper merely scratches the surface of how data analytics can be applied in this context, the data and challenges laid out here may present opportunities for future research on the topic.
Title : Non-Fungible Token Transactions: Data and Challenges
Journal : Data science in science
Pub Date : 2022-07-05 DOI: 10.1080/26941899.2022.2151948
G. Tarr, I. Wilms
Faced with changing markets and evolving consumer demands, beef industries are investing in grading systems to maximise value extraction throughout their entire supply chain. The Meat Standards Australia (MSA) system is a customer-oriented total quality management system that stands out internationally by predicting quality grades of specific muscles processed by a designated cooking method. The model currently underpinning the MSA system requires laborious effort to estimate, and its prediction performance may be less accurate in the presence of unbalanced data sets where many "muscle x cook" combinations have few observations and/or few predictors of palatability are available. This paper proposes a novel predictive method for beef eating quality that bridges a spectrum of muscle x cook-specific models. At one extreme, each muscle x cook combination is modelled independently; at the other extreme, a pooled predictive model is obtained across all muscle x cook combinations. Via a data-driven regularization method, we cover all muscle x cook-specific models along this spectrum. We demonstrate that the proposed predictive method attains considerable accuracy improvements relative to independent or pooled approaches on unique MSA data sets.
Title : Regularized Predictive Models for Beef Eating Quality of Individual Meals
Journal : Data science in science
Pub Date : 2022-06-13 DOI: 10.1080/26941899.2022.2081002
Marina Friedrich, E. Mahieu, Stephan Smeekes, Jakob Raymaekers, I. Wilms, D. Matteson
Title : Data Science in Science: Special Issue on Data Science in Environmental and Climate Sciences
Journal : Data science in science
Pub Date : 2022-04-15 DOI: 10.1080/26941899.2022.2043137
D. Matteson
Title : Data Science in Science: A New Journal with a Radically Collaborative Mission
Journal : Data science in science
Pub Date : 2022-03-07 DOI: 10.1080/26941899.2023.2194349
William Bekerman, J. Guinness
Wind is a critical component of the Earth system and has unmistakable impacts on everyday life. The CYGNSS satellite mission improves observational coverage of ocean winds via a fleet of eight micro-satellites that use reflected GNSS signals to infer surface wind speed. We present analyses characterizing variability in wind speed measurements among the eight CYGNSS satellites and between antennas. In particular, we use a carefully constructed Gaussian process model that leverages comparisons between CYGNSS and Jason-3 during a one-year period from September 2019 to September 2020. The CYGNSS sensors exhibit a range of biases, most of them between -1.0 m/s and +0.2 m/s with respect to Jason-3, indicating that some CYGNSS sensors are biased with respect to one another and with respect to Jason-3. The biases between the starboard and port antennas within a CYGNSS satellite are smaller. Our results are consistent with, yet sharper than, those of a more traditional paired comparison analysis. We also explore the possibility that the bias depends on wind speed, finding some evidence that CYGNSS satellites have positive biases with respect to Jason-3 at low wind speeds. However, we argue that there are subtle issues associated with estimating wind speed-dependent biases, so additional careful statistical modeling and analysis are warranted.
Title : Comparison of CYGNSS and Jason-3 Wind Speed Measurements via Gaussian Processes
Journal : Data science in science
Pub Date : 2022-01-01 Epub Date: 2023-01-18 DOI: 10.1080/26941899.2022.2157349
M L Wallace, L McTeague, J L Graves, N Kissel, C Tortora, B Wheeler, S Iyengar
Coordination of emotional responses across psychophysiological and subjective indices is a cornerstone of adaptive emotional functioning. Using clustering to identify cross-diagnostic subgroups with similar emotion response profiles may suggest novel underlying mechanisms and treatments. However, many psychophysiological measures are non-normal even in homogeneous samples, and over-reliance on traditional elliptical clustering approaches may inhibit the identification of meaningful subgroups. Finite mixture models that allow for non-elliptical cluster distributions are an emerging methodological field that may overcome this hurdle. Furthermore, succinctly quantifying pairwise cluster separation could enhance the clinical utility of the clustering solutions. However, a comprehensive examination of distance measures in the context of elliptical and non-elliptical model-based clustering is needed to provide practical guidance on the computation, benefits, and disadvantages of existing measures. We summarize several measures that can quantify the multivariate distance between two clusters and suggest practical computational tools. Through a simulation study, we evaluate the measures across three scenarios that allow for clusters to differ in location, scale, skewness, and rotation. We then demonstrate our approaches using psychophysiological and subjective responses to emotional imagery captured through the Transdiagnostic Anxiety Study. Finally, we synthesize findings to provide guidance on how to use distance measures in clustering applications.
Title : Quantifying Distances Between Non-Elliptical Clusters to Enhance the Identification of Meaningful Emotional Reactivity Subtypes
Journal : Data science in science, 1(1), pp. 34-59
Open access PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10166186/pdf/
Pub Date : 2021-11-10 DOI: 10.1080/26941899.2022.2158145
Olukunle O. Owolabi, Toryn L. J. Schafer, Georgia E. Smits, Sanhita Sengupta, Sean E. Ryan, Lang Wang, D. Matteson, Mila Getmansky Sherman, D. Sunter
The U.S. electrical grid has undergone substantial transformation with increased penetration of wind and solar, both forms of variable renewable energy (VRE). Despite the benefits of VRE for decarbonization, it has garnered some controversy for inducing unwanted effects in regional electricity markets. This study examines the role of VRE penetration in the system electricity price and price volatility, based on hourly, real-time, historical data from six Independent System Operators (ISOs) in the U.S., using quantile and skew t-distribution regressions. After correcting for temporal effects, we found that an increase in VRE penetration is associated with a decrease in system electricity price in all ISOs studied, and with a decrease in temporal price volatility in five of the six ISOs studied. The relationships are non-linear. These results are consistent with modern portfolio theory, in which diverse volatile assets may lead to more stable and less risky portfolios.
Title : Role of Variable Renewable Energy Penetration on Electricity Price and its Volatility across Independent System Operators in the United States
Journal : Data science in science, 1(1)