Rapid development of communication technologies and constant technological improvements as a result of scientific discoveries require the establishment of specific databases [...]
{"title":"Data in Astrophysics and Geophysics: Novel Research and Applications","authors":"V. Srećković, Milan S. Dimitrijević, Z. Mijić","doi":"10.3390/data9020032","DOIUrl":"https://doi.org/10.3390/data9020032","url":null,"abstract":"Rapid development of communication technologies and constant technological improvements as a result of scientific discoveries require the establishment of specific databases [...]","PeriodicalId":502371,"journal":{"name":"Data","volume":" 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139791366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jingjing Sun, Chong Xu, Liye Feng, Lei Li, Xuewei Zhang, Wentao Yang
China boasts a vast expanse of mountainous terrain, characterized by intricate geological conditions and structural features, resulting in frequent geological disasters. Among these, landslides, as prototypical geological hazards, pose significant threats to both lives and property. Consequently, conducting a comprehensive landslide inventory in mountainous regions is imperative for current research. This study concentrates on the Yinshan Mountains, an ancient fault-block mountain range spanning east–west in the central Inner Mongolia Autonomous Region, extending from the Langshan Mountains in the west to the Damaqun Mountains in the east, with the narrow-sense Xiao–Yin Mountains District in between. Employing multi-temporal high-resolution remote sensing images from Google Earth, this study conducted visual interpretation, identifying 10,968 landslides in the Yinshan area, encompassing a total area of 308.94 km². The largest landslide occupies 2.95 km², while the smallest covers 84.47 m². Specifically, the Langshan area comprises 331 landslides with a total area of 11.96 km², the narrow-sense Xiao–Yin Mountains include 3393 landslides covering 64.13 km², and the Manhan Mountains, Damaqun Mountains, and adjacent areas account for 7244 landslides over a total area of 232.85 km². This research not only contributes to global landslide cataloging initiatives but also serves as a robust foundation for future geohazard prevention and management efforts.
{"title":"The Yinshan Mountains Record over 10,000 Landslides","authors":"Jingjing Sun, Chong Xu, Liye Feng, Lei Li, Xuewei Zhang, Wentao Yang","doi":"10.3390/data9020031","DOIUrl":"https://doi.org/10.3390/data9020031","url":null,"abstract":"China boasts a vast expanse of mountainous terrain, characterized by intricate geological conditions and structural features, resulting in frequent geological disasters. Among these, landslides, as prototypical geological hazards, pose significant threats to both lives and property. Consequently, conducting a comprehensive landslide inventory in mountainous regions is imperative for current research. This study concentrates on the Yinshan Mountains, an ancient fault-block mountain range spanning east–west in the central Inner Mongolia Autonomous Region, extending from Langshan Mountains in the west to Damaqun Mountains in the east, with the narrow sense Xiao–Yin Mountains District in between. Employing multi-temporal high-resolution remote sensing images from Google Earth, this study conducted visual interpretation, identifying 10,968 landslides in the Yinshan area, encompassing a total area of 308.94 km2. The largest landslide occupies 2.95 km2, while the smallest covers 84.47 m2. Specifically, the Langshan area comprises 331 landslides with a total area of 11.96 km2, the narrow sense Xiao–Yin Mountains include 3393 landslides covering 64.13 km2, and the Manhan Mountains, Damaqun Mountains, and adjacent areas account for 7244 landslides over a total area of 232.85 km2. 
This research not only contributes to global landslide cataloging initiatives but also serves as a robust foundation for future geohazard prevention and management efforts.","PeriodicalId":502371,"journal":{"name":"Data","volume":"104 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139794279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
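The regional figures quoted in the Yinshan abstract above can be cross-checked against the stated totals. The following is a minimal sketch of that arithmetic only; the dictionary keys and variable names are illustrative labels, not identifiers from the published dataset.

```python
from decimal import Decimal

# Per-region (landslide count, total area in km^2) as quoted in the abstract.
# Decimal is used so the area sum is exact rather than subject to float rounding.
regions = {
    "Langshan": (331, Decimal("11.96")),
    "Xiao-Yin Mountains (narrow sense)": (3393, Decimal("64.13")),
    "Manhan, Damaqun and adjacent areas": (7244, Decimal("232.85")),
}

total_count = sum(count for count, _ in regions.values())
total_area = sum(area for _, area in regions.values())

print(total_count, total_area)
```

Both sums agree with the totals reported in the abstract (10,968 landslides over 308.94 km²).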
A. Khoruzhaya, T. Bobrovskaya, D. V. Kozlov, Dmitriy Kuligovskiy, Vladimir P. Novik, Kirill M. Arzamasov, E. I. Kremneva
Intracranial hemorrhage (ICH) is a dangerous, life-threatening condition that can lead to disability. Timely, high-quality diagnosis plays a major role in the course and outcome of this disease. The gold standard for detecting ICH is computed tomography. This method requires the prompt involvement of highly qualified personnel, which is not always possible, for example, during a staff shortage or increased workload. In such a situation, every minute counts, and time can be lost. The solution to this problem seems to be a set of diagnostic decisions, including the use of artificial intelligence, which will help to identify patients with ICH in a timely manner and provide prompt, high-quality medical care. However, the main obstacle to the development of artificial intelligence is a lack of high-quality datasets for training and testing. In this paper, we present a dataset including 800 brain CT scans consisting of multiple series of DICOM images with and without signs of ICH, enriched with clinical and technical parameters, as well as the methodology of its generation utilizing natural language processing tools. The dataset is publicly available, which contributes to increased competition in the development of artificial intelligence systems and to their advancement and quality improvement.
{"title":"Expanded Brain CT Dataset for the Development of AI Systems for Intracranial Hemorrhage Detection and Classification","authors":"A. Khoruzhaya, T. Bobrovskaya, D. V. Kozlov, Dmitriy Kuligovskiy, Vladimir P. Novik, Kirill M. Arzamasov, E. I. Kremneva","doi":"10.3390/data9020030","DOIUrl":"https://doi.org/10.3390/data9020030","url":null,"abstract":"Intracranial hemorrhage (ICH) is a dangerous life-threatening condition leading to disability. Timely and high-quality diagnosis plays a huge role in the course and outcome of this disease. The gold standard in determining ICH is computed tomography. This method requires a prompt involvement of highly qualified personnel, which is not always possible, for example, in case of a staff shortage or increased workload. In such a situation, every minute counts, and time can be lost. The solution to this problem seems to be a set of diagnostic decisions, including the use of artificial intelligence, which will help to identify patients with ICH in a timely manner and provide prompt and quality medical care. However, the main obstacle to the development of artificial intelligence is a lack of high-quality datasets for training and testing. In this paper, we present a dataset including 800 brain CT scans consisting of multiple series of DICOM images with and without signs of ICH, enriched with clinical and technical parameters, as well as the methodology of its generation utilizing natural language processing tools. 
The dataset is publicly available, which contributes to increased competition in the development of artificial intelligence systems and their advancement and quality improvement.","PeriodicalId":502371,"journal":{"name":"Data","volume":"213 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139799614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jordan Truman Paul Noel, Vinicius Prado da Fonseca, Amilcar Soares
Momentum has been a consistently studied aspect of sports science for decades. Within the established literature, conclusions have at times disagreed. However, if momentum is indeed an actual phenomenon, it would affect all aspects of sports, from player evaluation to pre-game prediction and betting. Therefore, using momentum-based features that quantify a team’s linear trend of play, we develop a data pipeline that uses a small sample of recent games to assess teams’ quality of play and measure the predictive power of momentum-based features versus the predictive power of more traditional frequency-based features across several leagues using several machine learning techniques. More precisely, we use our pipeline to determine the differences in the predictive power of momentum-based features and standard statistical features for the National Hockey League (NHL), National Basketball Association (NBA), and five major first-division European football leagues. Our findings show little evidence that momentum has superior predictive power in the NBA. Still, we found some instances of the effects of momentum on the NHL that produced better pre-game predictors, and we observe a similar trend in European football/soccer. Our results indicate that momentum-based features combined with frequency-based features could improve pre-game prediction models and that, in the future, momentum should be studied more from a feature/performance indicator point of view and less from the view of the dependence of sequential outcomes, thus attempting to distance momentum from the binary view of winning and losing.
{"title":"A Comprehensive Data Pipeline for Comparing the Effects of Momentum on Sports Leagues","authors":"Jordan Truman Paul Noel, Vinicius Prado da Fonseca, Amilcar Soares","doi":"10.3390/data9020029","DOIUrl":"https://doi.org/10.3390/data9020029","url":null,"abstract":"Momentum has been a consistently studied aspect of sports science for decades. Among the established literature, there has, at times, been a discrepancy between conclusions. However, if momentum is indeed an actual phenomenon, it would affect all aspects of sports, from player evaluation to pre-game prediction and betting. Therefore, using momentum-based features that quantify a team’s linear trend of play, we develop a data pipeline that uses a small sample of recent games to assess teams’ quality of play and measure the predictive power of momentum-based features versus the predictive power of more traditional frequency-based features across several leagues using several machine learning techniques. More precisely, we use our pipeline to determine the differences in the predictive power of momentum-based features and standard statistical features for the National Hockey League (NHL), National Basketball Association (NBA), and five major first-division European football leagues. Our findings show little evidence that momentum has superior predictive power in the NBA. Still, we found some instances of the effects of momentum on the NHL that produced better pre-game predictors, whereas we view a similar trend in European football/soccer. 
Our results indicate that momentum-based features combined with frequency-based features could improve pre-game prediction models and that, in the future, momentum should be studied more from a feature/performance indicator point-of-view and less from the view of the dependence of sequential outcomes, thus attempting to distance momentum from the binary view of winning and losing.","PeriodicalId":502371,"journal":{"name":"Data","volume":"257 20","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139821370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
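The momentum abstract above contrasts features that capture a team's linear trend over a few recent games with frequency-based features. The sketch below shows one plausible reading of that distinction, assuming the trend is a least-squares slope over the game index and the frequency feature is a plain average; the function names and the sample goal-difference series are hypothetical and are not taken from the paper's pipeline.

```python
from statistics import mean


def linear_trend(values):
    """Least-squares slope of a per-game statistic against game index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two games to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def frequency_feature(values):
    """Plain average of the same statistic over the window."""
    return mean(values)


# Hypothetical goal differences over the five most recent games (oldest first).
improving = [-1, 0, 1, 2, 3]
declining = [3, 2, 1, 0, -1]

for name, window in (("improving", improving), ("declining", declining)):
    print(name, frequency_feature(window), linear_trend(window))
```

The two windows have the same average, so a frequency-based feature cannot tell them apart, while the slope has opposite signs. That is the kind of information a trend-based feature could add to a pre-game model.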
Valerija Movcana, Arnis Strods, Karīna Narbute, Fēlikss Rūmnieks, Roberts Rimša, G. Mozolevskis, Maksims Ivanovs, Roberts Kadiķis, Karlis Zviedris, Laura Leja, Anastasija Zujeva, Tamāra Laimiņa, A. Abols
Organ-on-a-chip (OOC) technology has emerged as a groundbreaking approach for emulating the physiological environment, revolutionizing biomedical research, drug development, and personalized medicine. OOC platforms offer more physiologically relevant microenvironments and enable real-time monitoring of tissue, supporting the development of functional tissue models. Imaging methods are the most common approach for daily monitoring of tissue development. Image-based machine learning serves as a valuable tool for enhancing and monitoring OOC models in real time. This involves the classification of images generated through microscopy, contributing to the refinement of model performance. This paper presents an image dataset containing cell images generated from an OOC setup with different cell types. There are 3072 images generated by an automated brightfield microscopy setup. For some images, parameters such as cell type, seeding density, time after seeding, and flow rate are provided. These parameters, along with predefined criteria, can contribute to the evaluation of image quality and identification of potential artifacts. This dataset can be used as a basis for training machine learning classifiers for automated analysis of data generated from an OOC setup, providing more reliable tissue models, automated decision-making processes within the OOC framework, and efficient research in the future.
{"title":"Organ-On-A-Chip (OOC) Image Dataset for Machine Learning and Tissue Model Evaluation","authors":"Valerija Movcana, Arnis Strods, Karīna Narbute, Fēlikss Rūmnieks, Roberts Rimša, G. Mozolevskis, Maksims Ivanovs, Roberts Kadiķis, Karlis Zviedris, Laura Leja, Anastasija Zujeva, Tamāra Laimiņa, A. Abols","doi":"10.3390/data9020028","DOIUrl":"https://doi.org/10.3390/data9020028","url":null,"abstract":"Organ-on-a-chip (OOC) technology has emerged as a groundbreaking approach for emulating the physiological environment, revolutionizing biomedical research, drug development, and personalized medicine. OOC platforms offer more physiologically relevant microenvironments, enabling real-time monitoring of tissue, to develop functional tissue models. Imaging methods are the most common approach for daily monitoring of tissue development. Image-based machine learning serves as a valuable tool for enhancing and monitoring OOC models in real-time. This involves the classification of images generated through microscopy contributing to the refinement of model performance. This paper presents an image dataset, containing cell images generated from OOC setup with different cell types. There are 3072 images generated by an automated brightfield microscopy setup. For some images, parameters such as cell type, seeding density, time after seeding and flow rate are provided. These parameters along with predefined criteria can contribute to the evaluation of image quality and identification of potential artifacts. 
This dataset can be used as a basis for training machine learning classifiers for automated data analysis generated from an OOC setup providing more reliable tissue models, automated decision-making processes within the OOC framework and efficient research in the future.","PeriodicalId":502371,"journal":{"name":"Data","volume":"47 38","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139683915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Henrik tom Wörden, Florian Spreckelsen, Stefan Luther, Ulrich Parlitz, A. Schlemmer
Although other methods exist to store and manage data in modern information technology, the standard solution is file systems. Therefore, keeping well-organized file structures and file system layouts can be key to a sustainable research data management infrastructure. However, file structures alone lack several important capabilities for FAIR data management: the two most significant being insufficient visualization of data and inadequate possibilities for searching and obtaining an overview. Research data management systems (RDMSs) can fill this gap, but many do not support the simultaneous use of the file system and RDMS. This simultaneous use can have many benefits, but keeping data in the RDMS in synchrony with the file structure is challenging. Here, we present concepts that allow for keeping file structures and semantic data models (in the RDMS) synchronous. Furthermore, we propose a specification in YAML format that allows for a structured and extensible declaration and implementation of a mapping between the file system and data models used in semantic research data management. Implementing these concepts will facilitate the re-use of specifications for multiple use cases. Furthermore, the specification can serve as a machine-readable and, at the same time, human-readable documentation of specific file system structures. We demonstrate our work using the Open Source RDMS LinkAhead (previously named “CaosDB”).
{"title":"Mapping Hierarchical File Structures to Semantic Data Models for Efficient Data Integration into Research Data Management Systems","authors":"Henrik tom Wörden, Florian Spreckelsen, Stefan Luther, Ulrich Parlitz, A. Schlemmer","doi":"10.3390/data9020024","DOIUrl":"https://doi.org/10.3390/data9020024","url":null,"abstract":"Although other methods exist to store and manage data in modern information technology, the standard solution is file systems. Therefore, keeping well-organized file structures and file system layouts can be key to a sustainable research data management infrastructure. However, file structures alone lack several important capabilities for FAIR data management: the two most significant being insufficient visualization of data and inadequate possibilities for searching and obtaining an overview. Research data management systems (RDMSs) can fill this gap, but many do not support the simultaneous use of the file system and RDMS. This simultaneous use can have many benefits, but keeping data in RDMS in synchrony with the file structure is challenging. Here, we present concepts that allow for keeping file structures and semantic data models (in RDMS) synchronous. Furthermore, we propose a specification in yaml format that allows for a structured and extensible declaration and implementation of a mapping between the file system and data models used in semantic research data management. Implementing these concepts will facilitate the re-use of specifications for multiple use cases. Furthermore, the specification can serve as a machine-readable and, at the same time, human-readable documentation of specific file system structures. 
We demonstrate our work using the Open Source RDMS LinkAhead (previously named “CaosDB”).","PeriodicalId":502371,"journal":{"name":"Data","volume":"77 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139593674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
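The file-structure abstract above describes a declarative specification that maps a file system hierarchy onto a semantic data model. The sketch below illustrates the general idea only: the spec layout, the record type names, the `map_path` helper and the example path are all hypothetical, and none of it reflects the actual LinkAhead specification format or API. The spec is written as a Python list standing in for what a parsed YAML document might look like.

```python
import re
from pathlib import PurePosixPath

# Hypothetical spec: one rule per directory level, each with a regex whose
# named groups become properties of a record of the given type.
SPEC = [
    {"pattern": r"(?P<project>[A-Za-z]+)", "record": "Project"},
    {"pattern": r"(?P<date>\d{4}-\d{2}-\d{2})", "record": "Experiment"},
    {"pattern": r"(?P<name>.+)\.csv", "record": "Measurement"},
]


def map_path(path, spec=SPEC):
    """Return one record dict per path level, or None if the path does not fit the spec."""
    parts = PurePosixPath(path).parts
    if len(parts) != len(spec):
        return None
    records = []
    for part, rule in zip(parts, spec):
        match = re.fullmatch(rule["pattern"], part)
        if match is None:
            return None
        records.append({"type": rule["record"], **match.groupdict()})
    return records


print(map_path("Heart/2024-01-26/run1.csv"))
print(map_path("Heart/notes.txt"))
```

Because the rules are plain data, the same structure can double as documentation of the expected directory layout, which is the dual role the abstract attributes to its specification.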