Towards Reducing Diagnostic Errors with Interpretable Risk Prediction
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.10109
Denis Jered McInerney, William Dickinson, Lucy Flynn, Andrea Young, Geoffrey Young, J.-W. van de Meent, Byron C. Wallace
Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method that uses LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propose a Neural Additive Model to make predictions backed by evidence, with individualized risk estimates, at time-points where clinicians are still uncertain, specifically aiming to mitigate delays in diagnosis and errors stemming from an incomplete differential. To train such a model, it is necessary to infer temporally fine-grained retrospective labels of eventual "true" diagnoses. We do so with LLMs, ensuring that the input text is from before a confident diagnosis can be made. We use an LLM to retrieve an initial pool of evidence, but then refine this set of evidence according to correlations learned by the model. We conduct an in-depth evaluation of the usefulness of our approach by simulating how it might be used by a clinician to decide between a pre-defined list of differential diagnoses.
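To make the additive, evidence-level structure concrete, here is a minimal PyTorch sketch of a generic Neural Additive Model: one small subnetwork per input feature, with the risk logit formed as the sum of per-feature contributions, which is what makes each piece of evidence individually inspectable. The layer sizes and feature count are illustrative assumptions, not the authors' architecture.

# Minimal Neural Additive Model (NAM) sketch: one small MLP per input
# feature; the final risk logit is the sum of per-feature contributions.
# Sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Maps a single scalar feature to its additive contribution."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x):  # x: (batch, 1)
        return self.net(x)

class NeuralAdditiveModel(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.feature_nets = nn.ModuleList(
            FeatureNet(hidden) for _ in range(n_features)
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):  # x: (batch, n_features)
        contribs = torch.cat(
            [f(x[:, i:i+1]) for i, f in enumerate(self.feature_nets)], dim=1
        )  # (batch, n_features): per-evidence risk contributions
        return contribs.sum(dim=1) + self.bias, contribs

model = NeuralAdditiveModel(n_features=4)
logit, per_feature = model(torch.randn(8, 4))
print(logit.shape, per_feature.shape)  # torch.Size([8]) torch.Size([8, 4])

Because the logit is a plain sum, each entry of per_feature can be read directly as that feature's contribution to the predicted risk.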
{"title":"Towards Reducing Diagnostic Errors with Interpretable Risk Prediction","authors":"Denis Jered McInerney, William Dickinson, Lucy Flynn, Andrea Young, Geoffrey Young, J.-W. van de Meent, Byron C. Wallace","doi":"10.48550/arXiv.2402.10109","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10109","url":null,"abstract":"Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propose a Neural Additive Model to make predictions backed by evidence with individualized risk estimates at time-points where clinicians are still uncertain, aiming to specifically mitigate delays in diagnosis and errors stemming from an incomplete differential. To train such a model, it is necessary to infer temporally fine-grained retrospective labels of eventual\"true\"diagnoses. We do so with LLMs, to ensure that the input text is from before a confident diagnosis can be made. We use an LLM to retrieve an initial pool of evidence, but then refine this set of evidence according to correlations learned by the model. We conduct an in-depth evaluation of the usefulness of our approach by simulating how it might be used by a clinician to decide between a pre-defined list of differential diagnoses.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"9 15","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification Diffusion Models
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.10095
Shahar Yadin, Noam Elata, T. Michaeli
A prominent family of methods for learning data distributions relies on density ratio estimation (DRE), where a model is trained to $\textit{classify}$ between data samples and samples from some reference distribution. These techniques are successful in simple low-dimensional settings but fail to achieve good results on complex high-dimensional data, like images. A different family of methods for learning distributions is that of denoising diffusion models (DDMs), in which a model is trained to $\textit{denoise}$ data samples. These approaches achieve state-of-the-art results in image, video, and audio generation. In this work, we present $\textit{Classification Diffusion Models}$ (CDMs), a generative technique that adopts the denoising-based formalism of DDMs while making use of a classifier that predicts the amount of noise added to a clean signal, similarly to DRE methods. Our approach is based on the observation that an MSE-optimal denoiser for white Gaussian noise can be expressed in terms of the gradient of a cross-entropy-optimal classifier for predicting the noise level. As we illustrate, CDM achieves better denoising results compared to DDM, and leads to at least comparable FID in image generation. CDM is also capable of highly efficient one-step exact likelihood estimation, achieving state-of-the-art results among methods that use a single step. Code is available on the project's webpage at https://shaharYadin.github.io/CDM/.
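The identity underpinning this observation is Tweedie's formula for additive white Gaussian noise: if $x_t = x_0 + \sigma_t \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$, then the MSE-optimal denoiser is a single score step away from the noisy input,

$\mathbb{E}[x_0 \mid x_t] = x_t + \sigma_t^2 \, \nabla_{x_t} \log p_{\sigma_t}(x_t).$

The paper's contribution is to express this score through the gradient of a cross-entropy-optimal noise-level classifier; the exact classifier parameterization is the paper's and is not restated here.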
{"title":"Classification Diffusion Models","authors":"Shahar Yadin, Noam Elata, T. Michaeli","doi":"10.48550/arXiv.2402.10095","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10095","url":null,"abstract":"A prominent family of methods for learning data distributions relies on density ratio estimation (DRE), where a model is trained to $textit{classify}$ between data samples and samples from some reference distribution. These techniques are successful in simple low-dimensional settings but fail to achieve good results on complex high-dimensional data, like images. A different family of methods for learning distributions is that of denoising diffusion models (DDMs), in which a model is trained to $textit{denoise}$ data samples. These approaches achieve state-of-the-art results in image, video, and audio generation. In this work, we present $textit{Classification Diffusion Models}$ (CDMs), a generative technique that adopts the denoising-based formalism of DDMs while making use of a classifier that predicts the amount of noise added to a clean signal, similarly to DRE methods. Our approach is based on the observation that an MSE-optimal denoiser for white Gaussian noise can be expressed in terms of the gradient of a cross-entropy-optimal classifier for predicting the noise level. As we illustrate, CDM achieves better denoising results compared to DDM, and leads to at least comparable FID in image generation. CDM is also capable of highly efficient one-step exact likelihood estimation, achieving state-of-the-art results among methods that use a single step. Code is available on the project's webpage in https://shaharYadin.github.io/CDM/ .","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"9 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139962999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.10038
Saeed Khaki, JinJin Li, Lan Ma, Liu Yang, Prathap Ramachandra
Reinforcement learning from human feedback (RLHF) has been extensively employed to align large language models with user intent. However, proximal policy optimization (PPO) based RLHF is occasionally unstable, requires significant hyperparameter fine-tuning, and is computationally expensive in maximizing the estimated reward during alignment. Recently, direct preference optimization (DPO) has been proposed to address those challenges. However, DPO relies on contrastive responses generated by human annotators and an alternative LLM rather than by the policy model, limiting its effectiveness for RLHF. In this paper, we address both challenges by systematically combining rejection sampling (RS) and DPO. Our proposed method, RS-DPO, begins by developing a supervised fine-tuned (SFT) policy model. A varied set of k responses per prompt is sampled directly from the SFT model, and RS-DPO identifies pairs of contrastive samples based on their reward distribution. Finally, we apply DPO to the contrastive samples to align the model with human preferences. Our experiments indicate that the proposed method effectively fine-tunes LLMs in resource-limited environments, leading to improved alignment with user intent, and that it outperforms existing methods, including RS, PPO, and DPO.
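As a rough illustration of the pipeline described above, the following PyTorch sketch wires the stages together: reward-gap-based pair selection over k sampled responses, then the standard DPO objective on the selected pairs. The helper names, the gap threshold, and the toy tensors standing in for sequence log-probabilities are all illustrative assumptions, not the authors' implementation.

# Schematic RS-DPO sketch: sample k responses per prompt from the SFT
# policy (represented here only by their reward scores and log-probs),
# keep pairs whose reward gap exceeds a threshold, and apply the DPO loss.
import torch
import torch.nn.functional as F

def select_contrastive_pairs(rewards, gap=1.0):
    """Return (winner, loser) index pairs with reward difference > gap."""
    pairs = []
    for i in range(len(rewards)):
        for j in range(len(rewards)):
            if rewards[i] - rewards[j] > gap:
                pairs.append((i, j))
    return pairs

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO objective on winner/loser sequence log-probs."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

# Toy example: k = 4 responses sampled from the SFT model for one prompt.
rewards = [2.3, 0.1, 1.8, -0.5]            # reward-model scores
pairs = select_contrastive_pairs(rewards)  # e.g. (0, 1), (0, 3), ...
logp = torch.randn(4)                      # policy sequence log-probs (toy)
ref_logp = torch.randn(4)                  # frozen SFT reference log-probs (toy)
w, l = zip(*pairs)
loss = dpo_loss(logp[list(w)], logp[list(l)],
                ref_logp[list(w)], ref_logp[list(l)])
print(pairs, float(loss))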
{"title":"RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models","authors":"Saeed Khaki, JinJin Li, Lan Ma, Liu Yang, Prathap Ramachandra","doi":"10.48550/arXiv.2402.10038","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10038","url":null,"abstract":"Reinforcement learning from human feedback (RLHF) has been extensively employed to align large language models with user intent. However, proximal policy optimization (PPO) based RLHF is occasionally unstable requiring significant hyperparameter finetuning, and computationally expensive to maximize the estimated reward during alignment. Recently, direct preference optimization (DPO) is proposed to address those challenges. However, DPO relies on contrastive responses generated from human annotator and alternative LLM, instead of the policy model, limiting the effectiveness of the RLHF. In this paper, we addresses both challenges by systematically combining rejection sampling (RS) and DPO. Our proposed method, RS-DPO, initiates with the development of a supervised fine-tuned policy model (SFT). A varied set of k responses per prompt are sampled directly from the SFT model. RS-DPO identifies pairs of contrastive samples based on their reward distribution. Finally, we apply DPO with the contrastive samples to align the model to human preference. Our experiments indicate that our proposed method effectively fine-tunes LLMs with limited resource environments, leading to improved alignment with user intent. Furthermore, it outperforms existing methods, including RS, PPO, and DPO.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"12 9","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling the Impact of Timeline Algorithms on Opinion Dynamics Using Low-rank Updates
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.10053
Tianyi Zhou, Stefan Neumann, Kiran Garimella, A. Gionis
Timeline algorithms are key parts of online social networks, but in recent years they have been blamed for increasing polarization and disagreement in our society. Opinion-dynamics models have been used to study a variety of phenomena in online social networks, but an open question remains on how these models can be augmented to take into account the fine-grained impact of user-level timeline algorithms. We make progress on this question by providing a way to model the impact of timeline algorithms on opinion dynamics. Specifically, we show how the popular Friedkin--Johnsen opinion-formation model can be augmented based on aggregate information extracted from timeline data. We use our model to study the problem of minimizing polarization and disagreement; we assume that we are allowed to make small changes to the users' timeline compositions by strengthening some topics of discussion and penalizing some others. We present a gradient descent-based algorithm for this problem, and show that under realistic parameter settings, our algorithm computes a $(1+\varepsilon)$-approximate solution in time $\tilde{O}(m\sqrt{n}\,\lg(1/\varepsilon))$, where $m$ is the number of edges in the graph and $n$ is the number of vertices. We also present an algorithm that provably computes an $\varepsilon$-approximation of our model in near-linear time. We evaluate our method on real-world data and show that it effectively reduces the polarization and disagreement in the network. Finally, we release an anonymized graph dataset with ground-truth opinions and more than 27,000 nodes (the previously largest publicly available dataset contains fewer than 550 nodes).
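For context, a minimal numpy sketch of the Friedkin--Johnsen machinery the paper builds on: with graph Laplacian $L$ and innate opinions $s$, the expressed equilibrium opinions are $z^* = (I+L)^{-1}s$, and for mean-centered $s$ the polarization plus disagreement equals $s^T (I+L)^{-1} s$ (a known identity, cf. Musco et al.). The example graph and opinions are illustrative; the paper's timeline-aware augmentation and optimization algorithm are not reproduced here.

# Friedkin-Johnsen equilibrium and the polarization-disagreement index.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # weighted adjacency (toy graph)
L = np.diag(A.sum(axis=1)) - A               # graph Laplacian
s = np.array([0.9, 0.8, -0.1, -0.9])         # innate opinions (toy)
s = s - s.mean()                             # mean-center

z = np.linalg.solve(np.eye(4) + L, s)        # FJ equilibrium z* = (I+L)^{-1} s
polarization = np.sum((z - z.mean()) ** 2)   # variance of expressed opinions
disagreement = 0.5 * sum(
    A[i, j] * (z[i] - z[j]) ** 2 for i in range(4) for j in range(4)
)
print(polarization + disagreement, s @ z)    # the two agree (up to float error)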
{"title":"Modeling the Impact of Timeline Algorithms on Opinion Dynamics Using Low-rank Updates","authors":"Tianyi Zhou, Stefan Neumann, Kiran Garimella, A. Gionis","doi":"10.48550/arXiv.2402.10053","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10053","url":null,"abstract":"Timeline algorithms are key parts of online social networks, but during recent years they have been blamed for increasing polarization and disagreement in our society. Opinion-dynamics models have been used to study a variety of phenomena in online social networks, but an open question remains on how these models can be augmented to take into account the fine-grained impact of user-level timeline algorithms. We make progress on this question by providing a way to model the impact of timeline algorithms on opinion dynamics. Specifically, we show how the popular Friedkin--Johnsen opinion-formation model can be augmented based on aggregate information, extracted from timeline data. We use our model to study the problem of minimizing the polarization and disagreement; we assume that we are allowed to make small changes to the users' timeline compositions by strengthening some topics of discussion and penalizing some others. We present a gradient descent-based algorithm for this problem, and show that under realistic parameter settings, our algorithm computes a $(1+varepsilon)$-approximate solution in time $tilde{O}(msqrt{n} lg(1/varepsilon))$, where $m$ is the number of edges in the graph and $n$ is the number of vertices. We also present an algorithm that provably computes an $varepsilon$-approximation of our model in near-linear time. We evaluate our method on real-world data and show that it effectively reduces the polarization and disagreement in the network. Finally, we release an anonymized graph dataset with ground-truth opinions and more than 27,000 nodes (the previously largest publicly available dataset contains less than 550 nodes).","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"8 9","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VisIRNet: Deep Image Alignment for UAV-taken Visible and Infrared Image Pairs
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09635
Sedat Ozer, A. P. Ndigande
This paper proposes a deep learning based solution for multi-modal alignment of UAV-taken images. Many recently proposed state-of-the-art alignment techniques rely on Lucas-Kanade (LK) based solutions for successful alignment. However, we show that we can achieve state-of-the-art results without using LK-based methods. Our approach carefully utilizes a two-branch convolutional neural network (CNN) based on feature embedding blocks. We propose two variants of our approach: in the first (ModelA), we directly predict the new coordinates of only the four corners of the image to be aligned; in the second (ModelB), we predict the homography matrix directly. Aligning on the image corners forces the algorithm to match only those four corners, as opposed to computing and matching many (key)points, since the latter may produce many outliers and yield a less accurate alignment. We test our proposed approach on four aerial datasets and obtain state-of-the-art results compared to existing recent deep LK-based architectures.
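Why four corners suffice: a homography has eight degrees of freedom, so the predicted displacements of the four image corners determine it exactly. A minimal numpy sketch of recovering H from four correspondences via the standard DLT system follows; the corner coordinates are made up for illustration.

# Direct Linear Transform (DLT): solve H (up to scale) from exactly four
# point correspondences, the quantity ModelA's corner predictions determine.
import numpy as np

def homography_from_corners(src, dst):
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)       # nullspace vector = flattened H
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

src = [(0, 0), (512, 0), (512, 512), (0, 512)]      # original corners (toy)
dst = [(12, -5), (520, 8), (505, 530), (-7, 498)]   # predicted corners (toy)
H = homography_from_corners(src, dst)
p = H @ np.array([0, 0, 1.0])
print(p[:2] / p[2])   # ~ [12, -5]: H maps the first corner as required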
{"title":"VisIRNet: Deep Image Alignment for UAV-taken Visible and Infrared Image Pairs","authors":"Sedat Ozer, A. P. Ndigande","doi":"10.48550/arXiv.2402.09635","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09635","url":null,"abstract":"This paper proposes a deep learning based solution for multi-modal image alignment regarding UAV-taken images. Many recently proposed state-of-the-art alignment techniques rely on using Lucas-Kanade (LK) based solutions for a successful alignment. However, we show that we can achieve state of the art results without using LK-based methods. Our approach carefully utilizes a two-branch based convolutional neural network (CNN) based on feature embedding blocks. We propose two variants of our approach, where in the first variant (ModelA), we directly predict the new coordinates of only the four corners of the image to be aligned; and in the second one (ModelB), we predict the homography matrix directly. Applying alignment on the image corners forces algorithm to match only those four corners as opposed to computing and matching many (key)points, since the latter may cause many outliers, yielding less accurate alignment. We test our proposed approach on four aerial datasets and obtain state of the art results, when compared to the existing recent deep LK-based architectures.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inversion of limited-aperture Fresnel experimental data using orthogonality sampling method with single and multiple sources
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09740
Won-Kwang Park
In this study, we consider the application of the orthogonality sampling method (OSM) with single and multiple sources for fast identification of small objects in the limited-aperture inverse scattering problem. We first apply the OSM with a single source and show that the corresponding indicator function can be expressed in terms of the Bessel function of order zero of the first kind, an infinite series of Bessel functions of nonzero integer order of the first kind, the range of the signal receiver, and the location of the emitter. Based on this result, we explain that the objects can be identified through the OSM with a single source, but that the identification is significantly influenced by the source location and the applied frequency. To improve on this, we then consider the OSM with multiple sources. Based on the structure identified for the single-source OSM, we design an indicator function for the OSM with multiple sources and show that it can be expressed in terms of the square of the Bessel function of order zero of the first kind and an infinite series of the squares of Bessel functions of nonzero integer order of the first kind. Based on these theoretical results, we explain that the objects can be identified uniquely through the designed OSM. Several numerical experiments with experimental data provided by the Institut Fresnel demonstrate the pros and cons of the OSM with a single source and how the designed OSM with multiple sources behaves.
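For orientation, the generic OSM indicator is the modulus of the inner product of the far-field pattern with plane-wave test functions, $I(z) = |\sum_m u_\infty(\hat{\theta}_m) e^{ik \hat{\theta}_m \cdot z}|$. The numpy sketch below evaluates it on a grid with synthetic single-point-scatterer data over a limited receiver aperture; it illustrates the general OSM recipe only, and the paper's single- versus multi-source analysis and Bessel-series expressions are not reproduced.

# Schematic orthogonality sampling indicator on a limited aperture,
# with idealized synthetic far-field data from one point scatterer.
import numpy as np

k = 2 * np.pi                                   # wavenumber (unit wavelength)
angles = np.linspace(0, np.pi / 2, 64)          # limited receiver aperture
theta = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (64, 2)

z_true = np.array([0.4, -0.3])                  # synthetic scatterer location
u_inf = np.exp(-1j * k * theta @ z_true)        # idealized far-field data

xs = np.linspace(-1, 1, 101)
grid = np.stack(np.meshgrid(xs, xs), axis=-1)   # (101, 101, 2) sampling points
phase = np.exp(1j * k * grid @ theta.T)         # (101, 101, 64) test functions
indicator = np.abs(phase @ u_inf) / len(angles)

i, j = np.unravel_index(indicator.argmax(), indicator.shape)
print(xs[j], xs[i])  # peak near the scatterer location (0.4, -0.3)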
{"title":"Inversion of limited-aperture Fresnel experimental data using orthogonality sampling method with single and multiple sources","authors":"Won-Kwang Park","doi":"10.48550/arXiv.2402.09740","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09740","url":null,"abstract":"In this study, we consider the application of orthogonality sampling method (OSM) with single and multiple sources for a fast identification of small objects in limited-aperture inverse scattering problem. We first apply the OSM with single source and show that the indicator function with single source can be expressed by the Bessel function of order zero of the first kind, infinite series of Bessel function of nonzero integer order of the first kind, range of signal receiver, and the location of emitter. Based on this result, we explain that the objects can be identified through the OSM with single source but the identification is significantly influenced by the location of source and applied frequency. For a successful improvement, we then consider the OSM with multiple sources. Based on the identified structure of the OSM with single source, we design an indicator function of the OSM with multiple sources and show that it can be expressed by the square of the Bessel function of order zero of the first kind an infinite series of the square of Bessel function of nonzero integer order of the first kind. Based on the theoretical results, we explain that the objects can be identified uniquely through the designed OSM. Several numerical experiments with experimental data provided by the Institute Fresnel demonstrate the pros and cons of the OSM with single source and how the designed OSM with multiple sources behave.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reproducing, Extending, and Analyzing Naming Experiments
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.10022
Rachel Alpern, Ido Lazer, Issar Tzachor, Hanit Hakim, Sapir Weissbuch, D. Feitelson
Naming is very important in software development, as names are often the only vehicle of meaning about what the code is intended to do. A recent study on how developers choose names collected the names given by different developers for the same objects. This enabled a study of these names' diversity and structure, and the construction of a model of how names are created. We reproduce different parts of this study in three independent experiments. Importantly, we employ methodological variations rather than striving for an exact replication. When the same results are obtained, this boosts our confidence in their validity by demonstrating that they do not depend on the methodology. Our results indeed corroborate those of the original study in terms of the diversity of names, the low probability of two developers choosing the same name, and the finding that experienced developers tend to use slightly longer names than inexperienced students. We explain name diversity by performing a new analysis of the names, classifying the concepts represented in them as universal (agreed upon), alternative (reflecting divergent views on a topic), or optional (reflecting divergent opinions on whether to include this concept at all). This classification enables new research directions concerning the considerations involved in naming decisions. We also show that explicitly using the model proposed in the original study to guide naming leads to the creation of better names, whereas the simpler approach of just asking participants to use longer and more detailed names does not.
{"title":"Reproducing, Extending, and Analyzing Naming Experiments","authors":"Rachel Alpern, Ido Lazer, Issar Tzachor, Hanit Hakim, Sapir Weissbuch, D. Feitelson","doi":"10.48550/arXiv.2402.10022","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10022","url":null,"abstract":"Naming is very important in software development, as names are often the only vehicle of meaning about what the code is intended to do. A recent study on how developers choose names collected the names given by different developers for the same objects. This enabled a study of these names' diversity and structure, and the construction of a model of how names are created. We reproduce different parts of this study in three independent experiments. Importantly, we employ methodological variations rather than striving of an exact replication. When the same results are obtained this then boosts our confidence in their validity by demonstrating that they do not depend on the methodology. Our results indeed corroborate those of the original study in terms of the diversity of names, the low probability of two developers choosing the same name, and the finding that experienced developers tend to use slightly longer names than inexperienced students. We explain name diversity by performing a new analysis of the names, classifying the concepts represented in them as universal (agreed upon), alternative (reflecting divergent views on a topic), or optional (reflecting divergent opinions on whether to include this concept at all). This classification enables new research directions concerning the considerations involved in naming decisions. We also show that explicitly using the model proposed in the original study to guide naming leads to the creation of better names, whereas the simpler approach of just asking participants to use longer and more detailed names does not.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"23 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characterizing Role Models in Software Practitioners' Career: An Interview Study
Pub Date: 2024-02-15 | DOI: 10.1145/3641822.3641883
Mary Sánchez-Gordón, Ricardo Colomo Palacios, Alex Sanchez Gordon
A role model is a person who serves as an example for others to follow, especially in terms of values, behavior, achievements, and personal characteristics. In this paper, the authors study how role models influence software practitioners' careers, an aspect not previously studied in the literature. By means of this study, the authors aim to understand whether there are any salient role model archetypes and what characteristics participants value in their role models. To do so, the authors use a thematic coding approach to analyze data collected from interviews with ten Latin American software practitioners. Findings reveal that role models were perceived as sources of knowledge, yet the majority of participants, regardless of their career stage, displayed a stronger interest in the human side and the moral values that their role models embodied. This study also shows that any practitioner can be viewed as a role model.
{"title":"Characterizing Role Models in Software Practitioners' Career: An Interview Study","authors":"Mary S'anchez-Gord'on, Ricardo Colomo Palacios, Alex Sanchez Gordon","doi":"10.1145/3641822.3641883","DOIUrl":"https://doi.org/10.1145/3641822.3641883","url":null,"abstract":"A role model is a person who serves as an example for others to follow, especially in terms of values, behavior, achievements, and personal characteristics. In this paper, authors study how role models influence software practitioners careers, an aspect not studied in the literature before. By means of this study, authors aim to understand if there are any salient role model archetypes and what characteristics are valued by participants in their role models. To do so, authors use a thematic coding approach to analyze the data collected from interviewing ten Latin American software practitioners. Findings reveal that role models were perceived as sources of knowledge, yet the majority of participants, regardless of their career stage, displayed a stronger interest in the human side and the moral values that their role models embodied. This study also shows that any practitioner can be viewed as a role model.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Orthogonal Time Frequency Space for Integrated Sensing and Communication: A Survey
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09637
Eyad Shtaiwi, Ahmed Abdelhadi, Husheng Li, Zhu Han, H. V. Poor
Sixth-generation (6G) wireless communication systems, as stated in the European 6G flagship project Hexa-X, are anticipated to feature the integration of intelligence, communication, sensing, positioning, and computation. An important aspect of this integration is integrated sensing and communication (ISAC), in which the same waveform is used for both sensing and communication to address the challenge of spectrum scarcity. Recently, the orthogonal time frequency space (OTFS) waveform has been proposed to address OFDM's limitations under the high Doppler spreads expected in some future wireless communication systems. In this paper, we review existing OTFS waveforms for ISAC systems and provide some insights into future research. First, we introduce the basic principles and a system model of OTFS, providing a foundational understanding of this technology's core concepts and architecture. We then present an overview of OTFS-based ISAC system frameworks and provide a comprehensive review of recent research developments and the current state of the art in OTFS-assisted ISAC systems. Furthermore, we perform a thorough comparison between OTFS-enabled ISAC operations and traditional OFDM, highlighting the distinctive advantages of OTFS, especially in high Doppler spread scenarios. We then address the primary challenges facing OTFS-based ISAC systems, identifying potential limitations and drawbacks. Finally, we suggest future research directions, aiming to inspire further innovation in the 6G wireless communication landscape.
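To make the OTFS processing chain concrete, here is a minimal numpy sketch of one common discrete formulation: delay-Doppler symbols pass through the ISFFT to the time-frequency grid, then through an OFDM-like (rectangular-pulse) Heisenberg transform to time samples, with the receiver applying the inverse chain. FFT axis and normalization conventions vary across the literature, so this sketch only verifies that the chosen transform pair round-trips; it is not tied to any one standard's convention.

# Minimal OTFS modulate/demodulate round-trip sketch.
import numpy as np

M, N = 16, 8   # M delay bins (subcarriers), N Doppler bins (time symbols)

def otfs_modulate(x_dd):
    # ISFFT: FFT along the Doppler axis, IFFT along the delay axis
    x_tf = np.fft.fft(np.fft.ifft(x_dd, axis=0, norm="ortho"),
                      axis=1, norm="ortho")
    # Heisenberg transform with a rectangular pulse: per-symbol IFFT
    return np.fft.ifft(x_tf, axis=0, norm="ortho")

def otfs_demodulate(s):
    x_tf = np.fft.fft(s, axis=0, norm="ortho")       # Wigner transform
    return np.fft.fft(np.fft.ifft(x_tf, axis=1, norm="ortho"),
                      axis=0, norm="ortho")          # SFFT back to delay-Doppler

rng = np.random.default_rng(0)
qpsk = (rng.choice([-1, 1], (M, N)) + 1j * rng.choice([-1, 1], (M, N))) / np.sqrt(2)
assert np.allclose(otfs_demodulate(otfs_modulate(qpsk)), qpsk)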
{"title":"Orthogonal Time Frequency Space for Integrated Sensing and Communication: A Survey","authors":"Eyad Shtaiwi, Ahmed Abdelhadi, Husheng Li, Zhu Han, H. V. Poor","doi":"10.48550/arXiv.2402.09637","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09637","url":null,"abstract":"Sixth-generation (6G) wireless communication systems, as stated in the European 6G flagship project Hexa-X, are anticipated to feature the integration of intelligence, communication, sensing, positioning, and computation. An important aspect of this integration is integrated sensing and communication (ISAC), in which the same waveform is used for both systems both sensing and communication, to address the challenge of spectrum scarcity. Recently, the orthogonal time frequency space (OTFS) waveform has been proposed to address OFDM's limitations due to the high Doppler spread in some future wireless communication systems. In this paper, we review existing OTFS waveforms for ISAC systems and provide some insights into future research. Firstly, we introduce the basic principles and a system model of OTFS and provide a foundational understanding of this innovative technology's core concepts and architecture. Subsequently, we present an overview of OTFS-based ISAC system frameworks. We provide a comprehensive review of recent research developments and the current state of the art in the field of OTFS-assisted ISAC systems to gain a thorough understanding of the current landscape and advancements. Furthermore, we perform a thorough comparison between OTFS-enabled ISAC operations and traditional OFDM, highlighting the distinctive advantages of OTFS, especially in high Doppler spread scenarios. Subsequently, we address the primary challenges facing OTFS-based ISAC systems, identifying potential limitations and drawbacks. Then, finally, we suggest future research directions, aiming to inspire further innovation in the 6G wireless communication landscape.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"18 9","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reg-NF: Efficient Registration of Implicit Surfaces within Neural Fields
Pub Date: 2024-02-15 | DOI: 10.48550/arXiv.2402.09722
Stephen Hausler, David Hall, Sutharsan Mahendren, Peyman Moghadam
Neural fields, coordinate-based neural networks, have recently gained popularity for implicitly representing a scene. In contrast to classical methods based on explicit representations such as point clouds, neural fields provide a continuous scene representation able to represent 3D geometry and appearance in a way that is compact and ideal for robotics applications. However, few prior methods have investigated registering multiple neural fields by directly utilising these continuous implicit representations. In this paper, we present Reg-NF, a neural-field-based registration method that optimises for the relative 6-DoF transformation between two arbitrary neural fields, even if those two fields have different scale factors. Key components of Reg-NF include a bidirectional registration loss, multi-view surface sampling, and utilisation of volumetric signed distance functions (SDFs). We showcase our approach on a new neural field dataset for evaluating registration problems. We provide an exhaustive set of experiments and ablation studies to identify the performance of our approach, while also discussing limitations to provide future direction to the research community on open challenges in utilising neural fields in unconstrained environments.
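As a toy illustration of the bidirectional idea, the sketch below optimises a rigid transform so that surface samples of one SDF land on the zero level set of the other, and vice versa. Analytic sphere SDFs stand in for trained neural fields, and the translation-only parameterization, sampling scheme, and optimizer settings are simplifying assumptions, not the Reg-NF implementation.

# Toy bidirectional SDF-to-SDF registration in PyTorch.
import torch

def sdf_a(x):                       # "neural field" A: unit sphere at origin
    return x.norm(dim=-1) - 1.0

def sdf_b(x):                       # field B: same sphere shifted by (0.5, 0, 0)
    return (x - torch.tensor([0.5, 0.0, 0.0])).norm(dim=-1) - 1.0

def surface_samples(center, radius, n=256):
    d = torch.randn(n, 3)
    return center + radius * d / d.norm(dim=-1, keepdim=True)

t = torch.zeros(3, requires_grad=True)       # transform A -> B (translation only)
opt = torch.optim.Adam([t], lr=1e-2)
pts_a = surface_samples(torch.zeros(3), 1.0)                 # on A's surface
pts_b = surface_samples(torch.tensor([0.5, 0.0, 0.0]), 1.0)  # on B's surface

for step in range(500):
    opt.zero_grad()
    # bidirectional loss: A's surface into B, and B's surface back into A
    loss = sdf_b(pts_a + t).abs().mean() + sdf_a(pts_b - t).abs().mean()
    loss.backward()
    opt.step()

print(t.detach())   # ~ (0.5, 0, 0), the true offset between the two fields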
{"title":"Reg-NF: Efficient Registration of Implicit Surfaces within Neural Fields","authors":"Stephen Hausler, David Hall, Sutharsan Mahendren, Peyman Moghadam","doi":"10.48550/arXiv.2402.09722","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09722","url":null,"abstract":"Neural fields, coordinate-based neural networks, have recently gained popularity for implicitly representing a scene. In contrast to classical methods that are based on explicit representations such as point clouds, neural fields provide a continuous scene representation able to represent 3D geometry and appearance in a way which is compact and ideal for robotics applications. However, limited prior methods have investigated registering multiple neural fields by directly utilising these continuous implicit representations. In this paper, we present Reg-NF, a neural fields-based registration that optimises for the relative 6-DoF transformation between two arbitrary neural fields, even if those two fields have different scale factors. Key components of Reg-NF include a bidirectional registration loss, multi-view surface sampling, and utilisation of volumetric signed distance functions (SDFs). We showcase our approach on a new neural field dataset for evaluating registration problems. We provide an exhaustive set of experiments and ablation studies to identify the performance of our approach, while also discussing limitations to provide future direction to the research community on open challenges in utilizing neural fields in unconstrained environments.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"18 6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}