Monitoring often requires insight into the monitored system as well as concrete specifications of expected behavior. More and more systems, however, provide information about their inner procedures by emitting provenance information in a W3C-standardized graph format. In this work, we present an approach to monitor such provenance data for anomalous behavior by performing spectral graph analysis on slices of the constructed provenance graph and by comparing the characteristics of each slice with those of a sliding window over recently seen slices. We argue that this approach not only simplifies the monitoring of heterogeneous distributed systems, but also enables applying a host of well-studied techniques to monitor such systems.
{"title":"Towards Specificationless Monitoring of Provenance-Emitting Systems","authors":"Martin Stoffers, Alexander Weinert","doi":"arxiv-2207.14163","DOIUrl":"https://doi.org/arxiv-2207.14163","url":null,"abstract":"Monitoring often requires insight into the monitored system as well as\u0000concrete specifications of expected behavior. More and more systems, however,\u0000provide information about their inner procedures by emitting provenance\u0000information in a W3C-standardized graph format. In this work, we present an approach to monitor such provenance data for\u0000anomalous behavior by performing spectral graph analysis on slices of the\u0000constructed provenance graph and by comparing the characteristics of each slice\u0000with those of a sliding window over recently seen slices. We argue that this\u0000approach not only simplifies the monitoring of heterogeneous distributed\u0000systems, but also enables applying a host of well-studied techniques to monitor\u0000such systems.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Philip E. Bourne, Vivien Bonazzi, Amy Brand, Bonnie Carroll, Ian Foster, Ramanathan V. Guha, Robert Hanisch, Sallie Ann Keller, Mary Lee Kennedy, Christine Kirkpatrick, Barend Mons, Sarah M. Nusser, Michael Stebbins, George Strawn, Alex Szalay
On August 2, 2021, a group of concerned scientists and officials from US funding agencies and the federal government met for an informal discussion to explore the value of, and need for, a well-coordinated US Open Research Commons (ORC): an interoperable collection of data and compute resources, within both the public and private sectors, that is easy to use and accessible to all.
{"title":"Playing catch-up in building an open research commons","authors":"Philip E. Bourne, Vivien Bonazzi, Amy Brand, Bonnie Carroll, Ian Foster, Ramanathan V. Guha, Robert Hanisch, Sallie Ann Keller, Mary Lee Kennedy, Christine Kirkpatrick, Barend Mons, Sarah M. Nusser, Michael Stebbins, George Strawn, Alex Szalay","doi":"arxiv-2208.04682","DOIUrl":"https://doi.org/arxiv-2208.04682","url":null,"abstract":"On August 2, 2021 a group of concerned scientists and US funding agency and\u0000federal government officials met for an informal discussion to explore the\u0000value and need for a well-coordinated US Open Research Commons (ORC); an\u0000interoperable collection of data and compute resources within both the public\u0000and private sectors which are easy to use and accessible to all.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"94 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haiyi Mao, Minxue Jia, Jason Xiaotian Dou, Haotian Zhang, Panayiotis V. Benos
Metacells are disjoint and homogeneous groups of single-cell profiles, representing discrete and highly granular cell states. Existing metacell algorithms tend to use only one modality to infer metacells, even though single-cell multi-omics datasets profile multiple molecular modalities within the same cell. Here, we present Cross-Modal Embedding for MetaCell Identification (COEM), which utilizes an embedded space leveraging the information of both scATAC-seq and scRNA-seq to perform aggregation, balancing the trade-off between fine resolution and sufficient sequencing coverage. COEM outperforms the state-of-the-art method SEACells by efficiently identifying accurate and well-separated metacells across datasets with continuous and discrete cell types. Furthermore, COEM significantly improves peak-to-gene association analyses and facilitates complex gene regulatory inference tasks.
{"title":"COEM: Cross-Modal Embedding for MetaCell Identification","authors":"Haiyi Mao, Minxue Jia, Jason Xiaotian Dou Haotian Zhang Panayiotis V. Benos","doi":"arxiv-2207.07734","DOIUrl":"https://doi.org/arxiv-2207.07734","url":null,"abstract":"Metacells are disjoint and homogeneous groups of single-cell profiles,\u0000representing discrete and highly granular cell states. Existing metacell\u0000algorithms tend to use only one modality to infer metacells, even though\u0000single-cell multi-omics datasets profile multiple molecular modalities within\u0000the same cell. Here, we present textbf{C}ross-Mtextbf{O}dal\u0000textbf{E}mbedding for textbf{M}etaCell Identification (COEM), which utilizes\u0000an embedded space leveraging the information of both scATAC-seq and scRNA-seq\u0000to perform aggregation, balancing the trade-off between fine resolution and\u0000sufficient sequencing coverage. COEM outperforms the state-of-the-art method\u0000SEACells by efficiently identifying accurate and well-separated metacells\u0000across datasets with continuous and discrete cell types. Furthermore, COEM\u0000significantly improves peak-to-gene association analyses, and facilitates\u0000complex gene regulatory inference tasks.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"61 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Here we briefly reflect on the philosophical foundations that ground the quest towards ever-detailed models and identify four practical dangers derived from this pursuit: explosion of the model's uncertainty space, model black-boxing, computational exhaustion and model attachment. We argue that the growth of a mathematical model should be carefully and continuously pondered lest models become extraneous constructs chasing the Cartesian dream.
{"title":"Mind the hubris in mathematical modeling","authors":"Arnald Puy, Andrea Saltelli","doi":"arxiv-2207.12230","DOIUrl":"https://doi.org/arxiv-2207.12230","url":null,"abstract":"Here we briefly reflect on the philosophical foundations that ground the\u0000quest towards ever-detailed models and identify four practical dangers derived\u0000from this pursuit: explosion of the model's uncertainty space, model\u0000black-boxing, computational exhaustion and model attachment. We argue that the\u0000growth of a mathematical model should be carefully and continuously pondered\u0000lest models become extraneous constructs chasing the Cartesian dream.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The mystery of the ingenious creator of Bitcoin, hiding behind the pseudonym Satoshi Nakamoto, has fascinated the global public for more than a decade. Suddenly emerging from the dark in 2008, this persona launched the highly disruptive distributed ledger technology "blockchain", which has added the missing native value layer to the internet. Purposely agnostic, neither advocating any old names nor fielding new ones, this paper first identifies the degrees of freedom Satoshi Nakamoto had available in designing Bitcoin and in fabricating snippets of personal data. By interweaving the substantial collection of previous and new circumstantial evidence with direct evidence, such as relevant locations and happenings in history and at the time, a consistent skeleton of Satoshi Nakamoto's biography emerges. The results support the view that the iconic creator of Bitcoin most likely encoded bits of information in his self-chosen alias, dates and blockchain parameters, which particularly point to the numbers 21 and 42 and to the numeral systems used in Bitcoin's framework. Moreover, a psychogram of a reclusive and capricious genius is drawn, which sheds new light on Satoshi Nakamoto's background, mindset, pastimes, and penchant for puns; this study may also explain the motivation for his abrupt departure from the public and his continuing abstinence from engaging with the Bitcoin community and from reaping the fruits of his mindboggling wealth. From a history of technology perspective, such an altruistic sacrifice for the benefit of his brainchild is entirely unprecedented.
{"title":"Satoshi Nakamoto and the Origins of Bitcoin -- Narratio in Nomine, Datis et Numeris","authors":"Jens Ducrée","doi":"arxiv-2206.10257","DOIUrl":"https://doi.org/arxiv-2206.10257","url":null,"abstract":"The mystery about the ingenious creator of Bitcoin concealing behind the\u0000pseudonym Satoshi Nakamoto has been fascinating the global public for more than\u0000a decade. Suddenly jumping out of the dark in 2008, this persona hurled the\u0000highly disruptive distributed ledger technology \"blockchain\" that has added the\u0000missing native value layer to the internet. Purposely agnostic without\u0000advocating any old or fielding new names, this paper first identifies the\u0000degrees of freedom Satoshi Nakamoto had available in the design of Bitcoin, and\u0000in fabricating snippets of personal data. By interweaving the substantial\u0000collection of previous and new circumstantial with direct evidence, like\u0000relevant locations and happenings in history and at the time, a consistent\u0000skeleton of Satoshi Nakamoto's biography transpires. The results underpin that\u0000the iconic creator of Bitcoin most likely encoded bits of information in his\u0000self-chosen alias, dates and blockchain parameters, which particularly point to\u0000the numbers 21 and 42, and the numeral systems used in Bitcoin's framework.\u0000Moreover, a psychogram of a reclusive and capricious genius is drawn, which\u0000sheds new light on Satoshi Nakamoto's background, mindset, pastimes, and\u0000penchant for puns; this study may also explain the motivation of his abrupt\u0000departure from the public, his continuing abstinence from engaging with the\u0000Bitcoin community, and from reaping the fruits of his mindboggling wealth. 
From\u0000a history of technology perspective, such an altruistic sacrifice for the\u0000benefit of his brainchild is entirely unprecedented.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
If Turing's groundbreaking paper in 1936 laid the foundation of the theory of computation (ToC), it is no exaggeration to say that Cook's 1971 paper, "The complexity of theorem proving procedures" [4], pioneered the study of computational complexity. Computational complexity, as an independent research field, is therefore 50 years old now (2021) if we date it from Cook's article. This year coincides with the 100th birthday of Cook's mentor Hao Wang, one of the most important logicians. This paper traces the origin of computational complexity and tries to sort out the instrumental role that Wang played in the process.
{"title":"50 Years of Computational Complexity: Hao Wang and the Theory of Computation","authors":"Nick Zhang","doi":"arxiv-2206.05274","DOIUrl":"https://doi.org/arxiv-2206.05274","url":null,"abstract":"If Turing's groundbreaking paper in 1936 laid the foundation of the theory of\u0000computation (ToC), it is no exaggeration to say that Cook's paper in 1971, \"The\u0000complexity of theorem proving procedures\", [4] has pioneered the study of\u0000computational complexity. So computational complexity, as an independent\u0000research field, is 50 years old now (2021) if we date from Cook's article. This\u0000year coincides with the 100th birthday of Cook's mentor Hao Wang, one of the\u0000most important logicians. This paper traces the origin of computational\u0000complexity, and meanwhile, tries to sort out the instrumental role that Wang\u0000played in the process.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"126 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Athanasios Vlontzos, Daniel Rueckert, Bernhard Kainz
Medical image analysis is a vibrant research area that offers doctors and medical practitioners invaluable insight and the ability to accurately diagnose and monitor disease. Machine learning provides an additional boost for this area. However, machine learning for medical image analysis is particularly vulnerable to natural biases like domain shifts that affect algorithmic performance and robustness. In this paper, we analyze machine learning for medical image analysis within the framework of Technology Readiness Levels and review how causal analysis methods can fill a gap when creating robust and adaptable medical image analysis algorithms. We review methods using causality in medical imaging AI/ML and find that causal analysis has the potential to mitigate critical problems for clinical translation, but that uptake and clinical downstream research have been limited so far.
{"title":"A Review of Causality for Learning Algorithms in Medical Image Analysis","authors":"Athanasios Vlontzos, Daniel Rueckert, Bernhard Kainz","doi":"arxiv-2206.05498","DOIUrl":"https://doi.org/arxiv-2206.05498","url":null,"abstract":"Medical image analysis is a vibrant research area that offers doctors and\u0000medical practitioners invaluable insight and the ability to accurately diagnose\u0000and monitor disease. Machine learning provides an additional boost for this\u0000area. However, machine learning for medical image analysis is particularly\u0000vulnerable to natural biases like domain shifts that affect algorithmic\u0000performance and robustness. In this paper we analyze machine learning for\u0000medical image analysis within the framework of Technology Readiness Levels and\u0000review how causal analysis methods can fill a gap when creating robust and\u0000adaptable medical image analysis algorithms. We review methods using causality\u0000in medical imaging AI/ML and find that causal analysis has the potential to\u0000mitigate critical problems for clinical translation but that uptake and\u0000clinical downstream research has been limited so far.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"76 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nikos D. Fakotakis, Stavros Nousias, Gerasimos Arvanitis, Evangelia I. Zacharaki, Konstantinos Moustakas
Asthma is a common, usually long-term respiratory disease with a negative impact on society and the economy worldwide. Treatment involves using medical devices (inhalers) that distribute medication to the airways, and its efficiency depends on the precision of the inhalation technique. Health monitoring systems equipped with sensors and embedded with sound signal detection enable the recognition of drug actuation and could be powerful tools for reliable audio content analysis. This paper revisits audio pattern recognition and machine learning techniques for asthma medication adherence assessment and presents the Respiratory and Drug Actuation (RDA) Suite (https://gitlab.com/vvr/monitoring-medication-adherence/rda-benchmark) for benchmarking and further research. The RDA Suite includes a set of tools for audio processing, feature extraction and classification and is provided along with a dataset consisting of respiratory and drug actuation sounds. The classification models in RDA are implemented based on conventional and advanced machine learning and deep network architectures. This study provides a comparative evaluation of the implemented approaches, examines potential improvements and discusses challenges and future trends.
{"title":"Revisiting Audio Pattern Recognition for Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite","authors":"Nikos D. Fakotakis, Stavros Nousias, Gerasimos Arvanitis, Evangelia I. Zacharaki, Konstantinos Moustakas","doi":"arxiv-2205.15360","DOIUrl":"https://doi.org/arxiv-2205.15360","url":null,"abstract":"Asthma is a common, usually long-term respiratory disease with negative\u0000impact on society and the economy worldwide. Treatment involves using medical\u0000devices (inhalers) that distribute medication to the airways, and its\u0000efficiency depends on the precision of the inhalation technique. Health\u0000monitoring systems equipped with sensors and embedded with sound signal\u0000detection enable the recognition of drug actuation and could be powerful tools\u0000for reliable audio content analysis. This paper revisits audio pattern\u0000recognition and machine learning techniques for asthma medication adherence\u0000assessment and presents the Respiratory and Drug Actuation (RDA)\u0000Suite(https://gitlab.com/vvr/monitoring-medication-adherence/rda-benchmark) for\u0000benchmarking and further research. The RDA Suite includes a set of tools for\u0000audio processing, feature extraction and classification and is provided along\u0000with a dataset consisting of respiratory and drug actuation sounds. The\u0000classification models in RDA are implemented based on conventional and advanced\u0000machine learning and deep network architectures. 
This study provides a\u0000comparative evaluation of the implemented approaches, examines potential\u0000improvements and discusses challenges and future tendencies.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Moore's Law has been used by the semiconductor industry as a predictive indicator for the industry, and it has become a self-fulfilling prophecy. More people now tend to agree that the original Moore's Law has started to falter. This paper proposes a possible quantitative modification to Moore's Law that can also cover the laws derived from it. The modification is intended to predict the roadmap of chip performance and energy consumption more accurately.
{"title":"Moore's Law is dead, long live Moore's Law!","authors":"Nick Zhang","doi":"arxiv-2205.15011","DOIUrl":"https://doi.org/arxiv-2205.15011","url":null,"abstract":"Moore's Law has been used by semiconductor industry as predicative indicators\u0000of the industry and it has become a self-fulfilling prophecy. Now more people\u0000tend to agree that the original Moore's Law started to falter. This paper\u0000proposes a possible quantitative modification to Moore's Law. It can cover\u0000other derivative laws of Moore's Law as well. It intends to more accurately\u0000predict the roadmap of chip's performance and energy consumption.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, the rise of deep learning and automation requirements in the software industry has elevated Intelligent Software Engineering to new heights. The number of approaches and applications in code understanding is growing, with deep learning techniques being used in many of them to better capture the information in code data. In this survey, we present a comprehensive overview of the structures formed from code data. We categorize recent models for understanding code into two groups, sequence-based and graph-based models, and then summarize and compare them. We also introduce metrics, datasets, and downstream tasks. Finally, we make some suggestions for future research in the field of structural code understanding.
{"title":"A Survey of Deep Learning Models for Structural Code Understanding","authors":"Ruoting Wu, Yuxin Zhang, Qibiao Peng, Liang Chen, Zibin Zheng","doi":"arxiv-2205.01293","DOIUrl":"https://doi.org/arxiv-2205.01293","url":null,"abstract":"In recent years, the rise of deep learning and automation requirements in the\u0000software industry has elevated Intelligent Software Engineering to new heights.\u0000The number of approaches and applications in code understanding is growing,\u0000with deep learning techniques being used in many of them to better capture the\u0000information in code data. In this survey, we present a comprehensive overview\u0000of the structures formed from code data. We categorize the models for\u0000understanding code in recent years into two groups: sequence-based and\u0000graph-based models, further make a summary and comparison of them. We also\u0000introduce metrics, datasets and the downstream tasks. Finally, we make some\u0000suggestions for future research in structural code understanding field.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"166 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}