Pub Date: 2024-03-04 | DOI: 10.59275/j.melba.2024-d434
Aisha L. Shuaibu, Ivor J. A. Simpson
Methods for medical image registration infer geometric transformations that align pairs, or groups, of images by maximising an image similarity metric. This problem is ill-posed, as several solutions may have equivalent likelihoods, and optimising purely for image similarity can yield implausible deformable transformations. For these reasons, regularization terms are essential to obtain meaningful registration results. However, this requires the introduction of at least one hyperparameter, often termed λ, which serves as a trade-off between loss terms. In some approaches and situations, the quality of the estimated transformation greatly depends on hyperparameter choice, and different choices may be required depending on the characteristics of the data. Analyzing the effect of these hyperparameters requires labelled data, which is not commonly available at test-time. In this paper, we propose a novel method for evaluating the influence of hyperparameters and subsequently selecting an optimal value for a given pair of images. Our approach, which we call HyperPredict, implements a Multi-Layer Perceptron that learns the effect of selecting particular hyperparameters for registering an image pair by predicting the resulting segmentation overlap and measures of deformation smoothness. This approach enables us to select optimal hyperparameters at test time without requiring labelled data, removing the need for a one-size-fits-all cross-validation approach. Furthermore, the criterion used to define an optimal hyperparameter is flexible post-training, allowing us to efficiently target specific properties (e.g. overlap of specific anatomical regions of interest, smoothness/plausibility of the final displacement field). We evaluate our proposed method on the OASIS brain MR standard benchmark dataset using a recent deep learning approach (cLapIRN) and an algorithmic method (NiftyReg). Our results demonstrate good performance in predicting the effects of regularization hyperparameters and highlight the benefits of our image-pair-specific approach to hyperparameter selection.
{"title":"HyperPredict: Estimating Hyperparameter Effects for Instance-Specific Regularization in Deformable Image Registration","authors":"Aisha L. Shuaibu, Ivor J. A. Simpson","doi":"10.59275/j.melba.2024-d434","DOIUrl":"https://doi.org/10.59275/j.melba.2024-d434","url":null,"abstract":"Methods for medical image registration infer geometric transformations that align pairs, or groups, of images by maximising an image similarity metric. This problem is ill-posed as several solutions may have equivalent likelihoods, also optimising purely for image similarity can yield implausible deformable transformations. For these reasons regularization terms are essential to obtain meaningful registration results. However, this requires the introduction of at least one hyperparameter, often termed λ, which serves as a trade-off between loss terms. In some approaches and situations, the quality of the estimated transformation greatly depends on hyperparameter choice, and different choices may be required depending on the characteristics of the data. Analyzing the effect of these hyperparameters requires labelled data, which is not commonly available at test-time. In this paper, we propose a novel method for evaluating the influence of hyperparameters and subsequently selecting an optimal value for given pair of images. Our approach, which we call HyperPredict, implements a Multi-Layer Perceptron that learns the effect of selecting particular hyperparameters for registering an image pair by predicting the resulting segmentation overlap and measures of deformation smoothness. This approach enables us to select optimal hyperparameters at test time without requiring labelled data, removing the need for a one-size-fits-all cross-validation approach. Furthermore, the criteria used to define optimal hyperparameter is flexible post-training, allowing us to efficiently choose specific properties (e.g. overlap of specific anatomical regions of interest, smoothness/plausibility of the final displacement field). We evaluate our proposed method on the OASIS brain MR standard benchmark dataset using a recent deep learning approach (cLapIRN) and an algorithmic method (Niftyreg). Our results demonstrate good performance in predicting the effects of regularization hyperparameters and highlight the benefits of our image-pair specific approach to hyperparameter selection.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"18 3‐4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140398051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-04 | DOI: 10.1109/ICASSP48485.2024.10447643
Bo Li, Yuyan Chen, Liang Zeng
Multi-Label Text Classification (MLTC) is a fundamental task in Natural Language Processing (NLP) that involves assigning multiple labels to a given text. MLTC has gained significant importance and has been widely applied in domains such as topic recognition, recommendation systems, sentiment analysis, and information retrieval. However, traditional machine learning and deep neural network methods have not yet addressed certain issues, such as documents that are brief yet carry a large number of labels, and how to establish relationships between the labels. External knowledge has also proven significant for MLTC. To address these issues, we propose a novel approach called the Knowledge-enhanced Doc-Label Attention Network (KeNet). Specifically, we design an attention network that incorporates external knowledge, label embedding, and a comprehensive attention mechanism. In contrast to conventional methods, we use a comprehensive representation of documents, knowledge, and labels to predict all labels for each text. Our approach is validated by comprehensive experiments on three multi-label datasets. Experimental results demonstrate that our method outperforms state-of-the-art MLTC methods. Additionally, a case study illustrates the practical application of KeNet.
{"title":"KeNet: Knowledge-enhanced Doc-Label Attention Network for Multi-label text classification","authors":"Bo Li, Yuyan Chen, Liang Zeng","doi":"10.1109/ICASSP48485.2024.10447643","DOIUrl":"https://doi.org/10.1109/ICASSP48485.2024.10447643","url":null,"abstract":"Multi-Label Text Classification (MLTC) is a fundamental task in the field of Natural Language Processing (NLP) that involves the assignment of multiple labels to a given text. MLTC has gained significant importance and has been widely applied in various domains such as topic recognition, recommendation systems, sentiment analysis, and information retrieval. However, traditional machine learning and Deep neural network have not yet addressed certain issues, such as the fact that some documents are brief but have a large number of labels and how to establish relationships between the labels. It is imperative to additionally acknowledge that the significance of knowledge is substantiated in the realm of MLTC. To address this issue, we provide a novel approach known as Knowledge-enhanced Doc-Label Attention Network (KeNet). Specifically, we design an Attention Network that incorporates external knowledge, label embedding, and a comprehensive attention mechanism. In contrast to conventional methods, we use comprehensive representation of documents, knowledge and labels to predict all labels for each single text. Our approach has been validated by comprehensive research conducted on three multi-label datasets. Experimental results demonstrate that our method outperforms state-of-the-art MLTC method. Additionally, a case study is undertaken to illustrate the practical implementation of KeNet.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"62 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140398014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Programming assistants have reshaped the experience of programming into one where programmers spend less time writing and more time critically examining code. In this paper, we explore how programming assistants can be extended to accelerate the inspection of generated code. We introduce an extension to the programming assistant called Ivie, or instantly visible in-situ explanations. When using Ivie, a programmer's generated code is instantly accompanied by explanations positioned just adjacent to the code. Our design was optimized for extremely low-cost invocation and dismissal. Explanations are compact and informative. They describe meaningful expressions, from individual variables to entire blocks of code. We present an implementation of Ivie that forks VS Code, applying a modern LLM for timely segmentation and explanation of generated code. In a lab study, we compared Ivie to a contemporary baseline tool for code understanding. Ivie improved understanding of generated code, and was received by programmers as a highly useful, low distraction, desirable complement to the programming assistant.
{"title":"Ivie: Lightweight Anchored Explanations of Just-Generated Code","authors":"Litao Yan, Alyssa Hwang, Zhiyuan Wu, Andrew Head","doi":"10.1145/3613904.3642239","DOIUrl":"https://doi.org/10.1145/3613904.3642239","url":null,"abstract":"Programming assistants have reshaped the experience of programming into one where programmers spend less time writing and more time critically examining code. In this paper, we explore how programming assistants can be extended to accelerate the inspection of generated code. We introduce an extension to the programming assistant called Ivie, or instantly visible in-situ explanations. When using Ivie, a programmer's generated code is instantly accompanied by explanations positioned just adjacent to the code. Our design was optimized for extremely low-cost invocation and dismissal. Explanations are compact and informative. They describe meaningful expressions, from individual variables to entire blocks of code. We present an implementation of Ivie that forks VS Code, applying a modern LLM for timely segmentation and explanation of generated code. In a lab study, we compared Ivie to a contemporary baseline tool for code understanding. Ivie improved understanding of generated code, and was received by programmers as a highly useful, low distraction, desirable complement to the programming assistant.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"4 5‐6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent developments in prompt-based generative AI have given rise to discourse surrounding the perceived ethical concerns, economic implications, and consequences for the future of cultural production. As generative imagery becomes pervasive in mainstream society, dominated primarily by emerging industry leaders, we encourage the CHI community to adopt a role of inquiry: to investigate the numerous ways in which generative AI has the potential to augment, and already is augmenting, human creativity. In this paper, we conducted a diffractive analysis exploring the potential role of prompt-based interfaces in artists' creative practice. Over a two-week period, seven visual artists were given access to a personalised instance of Stable Diffusion, fine-tuned on a dataset of their work. In the diffractive analysis, we identified two dominant modes adopted by participants: AI for ideation and AI for production. We furthermore present a number of ethical design considerations for the future development of generative AI interfaces.
{"title":"Towards A Diffractive Analysis of Prompt-Based Generative AI","authors":"Nina Rajcic, M. T. Llano, Jon McCormack","doi":"10.1145/3613904.3641971","DOIUrl":"https://doi.org/10.1145/3613904.3641971","url":null,"abstract":"Recent developments in prompt-based generative AI has given rise to discourse surrounding the perceived ethical concerns, economic implications, and consequences for the future of cultural production. As generative imagery becomes pervasive in mainstream society, dominated primarily by emerging industry leaders, we encourage that the role of the CHI community be one of inquiry; to investigate the numerous ways in which generative AI has the potential to, and already is, augmenting human creativity. In this paper, we conducted a diffractive analysis exploring the potential role of prompt-based interfaces in artists' creative practice. Over a two week period, seven visual artists were given access to a personalised instance of Stable Diffusion, fine-tuned on a dataset of their work. In the following diffractive analysis, we identified two dominant modes adopted by participants, AI for ideation, and AI for production. We furthermore present a number of ethical design considerations for the future development of generative AI interfaces.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"27 1‐2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-04 | DOI: 10.1609/aaai.v38i3.27968
Shuai Guo, Q. Wang, Yijie Gao, Rong Xie, Li Song
Novel-view synthesis with sparse input views is important for real-world applications like AR/VR and autonomous driving. Recent methods have integrated depth information into NeRFs for sparse input synthesis, leveraging depth priors for geometric and spatial understanding. However, most existing works tend to overlook inaccuracies within depth maps and have low time efficiency. To address these issues, we propose a depth-guided robust and fast point cloud fusion NeRF for sparse inputs. We represent radiance fields as an explicit voxel grid of features. A point cloud is constructed for each input view, characterized within the voxel grid using matrices and vectors. We accumulate the point cloud of each input view to construct the fused point cloud of the entire scene. Each voxel determines its density and appearance by referring to the point cloud of the entire scene. Through point cloud fusion and voxel grid fine-tuning, inaccuracies in depth values are refined or substituted by those from other views. Moreover, our method achieves faster reconstruction and greater compactness through effective vector-matrix decomposition. Experimental results underline the superior performance and time efficiency of our approach compared to state-of-the-art baselines.
{"title":"Depth-Guided Robust and Fast Point Cloud Fusion NeRF for Sparse Input Views","authors":"Shuai Guo, Q. Wang, Yijie Gao, Rong Xie, Li Song","doi":"10.1609/aaai.v38i3.27968","DOIUrl":"https://doi.org/10.1609/aaai.v38i3.27968","url":null,"abstract":"Novel-view synthesis with sparse input views is important for real-world applications like AR/VR and autonomous driving. Recent methods have integrated depth information into NeRFs for sparse input synthesis, leveraging depth prior for geometric and spatial understanding. However, most existing works tend to overlook inaccuracies within depth maps and have low time efficiency. To address these issues, we propose a depth-guided robust and fast point cloud fusion NeRF for sparse inputs. We perceive radiance fields as an explicit voxel grid of features. A point cloud is constructed for each input view, characterized within the voxel grid using matrices and vectors. We accumulate the point cloud of each input view to construct the fused point cloud of the entire scene. Each voxel determines its density and appearance by referring to the point cloud of the entire scene. Through point cloud fusion and voxel grid fine-tuning, inaccuracies in depth values are refined or substituted by those from other views. Moreover, our method can achieve faster reconstruction and greater compactness through effective vector-matrix decomposition. Experimental results underline the superior performance and time efficiency of our approach compared to state-of-the-art baselines.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"115 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
People have to remember an ever-expanding volume of information. Wearables that use information capture and retrieval for memory augmentation can help but can be disruptive and cumbersome in real-world tasks, such as in social settings. To address this, we developed Memoro, a wearable audio-based memory assistant with a concise user interface. Memoro uses a large language model (LLM) to infer the user's memory needs in a conversational context, semantically search memories, and present minimal suggestions. The assistant has two interaction modes: Query Mode for voicing queries and Queryless Mode for on-demand predictive assistance, without explicit query. Our study with participants (N=20) engaged in a real-time conversation demonstrated that using Memoro reduced device interaction time and increased recall confidence while preserving conversational quality. We report quantitative results and discuss the preferences and experiences of users. This work contributes towards utilizing LLMs to design wearable memory augmentation systems that are minimally disruptive.
{"title":"Memoro: Using Large Language Models to Realize a Concise Interface for Real-Time Memory Augmentation","authors":"Wazeer Zulfikar, Samantha Chan, Pattie Maes","doi":"10.1145/3613904.3642450","DOIUrl":"https://doi.org/10.1145/3613904.3642450","url":null,"abstract":"People have to remember an ever-expanding volume of information. Wearables that use information capture and retrieval for memory augmentation can help but can be disruptive and cumbersome in real-world tasks, such as in social settings. To address this, we developed Memoro, a wearable audio-based memory assistant with a concise user interface. Memoro uses a large language model (LLM) to infer the user's memory needs in a conversational context, semantically search memories, and present minimal suggestions. The assistant has two interaction modes: Query Mode for voicing queries and Queryless Mode for on-demand predictive assistance, without explicit query. Our study of (N=20) participants engaged in a real-time conversation demonstrated that using Memoro reduced device interaction time and increased recall confidence while preserving conversational quality. We report quantitative results and discuss the preferences and experiences of users. This work contributes towards utilizing LLMs to design wearable memory augmentation systems that are minimally disruptive.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"110 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140398178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As intelligent agents transition from controlled to uncontrolled environments, they face challenges that sometimes exceed their operational capabilities. In many scenarios, they rely on assistance from bystanders to overcome those challenges. Using robots that get stuck in urban settings as an example, we investigate how agents can prompt bystanders into providing assistance. We conducted four focus group sessions with 17 participants that involved bodystorming, where participants assumed the role of robots and bystander pedestrians in role-playing activities. Generating insights from both assumed robot and bystander perspectives, we were able to identify potential non-verbal help-seeking strategies (i.e., addressing bystanders, cueing intentions, and displaying emotions) and factors shaping the assistive behaviours of bystanders. Drawing on these findings, we offer design considerations for help-seeking urban robots and other agents operating in uncontrolled environments to foster casual collaboration, encompass expressiveness, align with agent social categories, and curate appropriate incentives.
{"title":"From Agent Autonomy to Casual Collaboration: A Design Investigation on Help-Seeking Urban Robots","authors":"Xinyan Yu, Marius Hoggenmueller, M. Tomitsch","doi":"10.1145/3613904.3642389","DOIUrl":"https://doi.org/10.1145/3613904.3642389","url":null,"abstract":"As intelligent agents transition from controlled to uncontrolled environments, they face challenges that sometimes exceed their operational capabilities. In many scenarios, they rely on assistance from bystanders to overcome those challenges. Using robots that get stuck in urban settings as an example, we investigate how agents can prompt bystanders into providing assistance. We conducted four focus group sessions with 17 participants that involved bodystorming, where participants assumed the role of robots and bystander pedestrians in role-playing activities. Generating insights from both assumed robot and bystander perspectives, we were able to identify potential non-verbal help-seeking strategies (i.e., addressing bystanders, cueing intentions, and displaying emotions) and factors shaping the assistive behaviours of bystanders. Drawing on these findings, we offer design considerations for help-seeking urban robots and other agents operating in uncontrolled environments to foster casual collaboration, encompass expressiveness, align with agent social categories, and curate appropriate incentives.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"137 1‐3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140398174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study conducts a thorough examination of malware detection using machine learning techniques, focusing on the evaluation of various classification models using the Mal-API-2019 dataset. The aim is to advance cybersecurity capabilities by identifying and mitigating threats more effectively. Both ensemble and non-ensemble machine learning methods, such as Random Forest, XGBoost, K-Nearest Neighbors (KNN), and Neural Networks, are explored. Special emphasis is placed on the importance of data pre-processing techniques, particularly TF-IDF representation and Principal Component Analysis, in improving model performance. Results indicate that ensemble methods, particularly Random Forest and XGBoost, exhibit superior accuracy, precision, and recall compared to the others, highlighting their effectiveness in malware detection. The paper also discusses limitations and potential future directions, emphasizing the need for continuous adaptation to address the evolving nature of malware. This research contributes to ongoing discussions in cybersecurity and provides practical insights for developing more robust malware detection systems in the digital era.
{"title":"Comprehensive evaluation of Mal-API-2019 dataset by machine learning in malware detection","authors":"Zhenglin Li, Haibei Zhu, Houze Liu, Jintong Song, Qishuo Cheng","doi":"10.62051/ijcsit.v2n1.01","DOIUrl":"https://doi.org/10.62051/ijcsit.v2n1.01","url":null,"abstract":"This study conducts a thorough examination of malware detection using machine learning techniques, focusing on the evaluation of various classification models using the Mal-API-2019 dataset. The aim is to advance cybersecurity capabilities by identifying and mitigating threats more effectively. Both ensemble and non-ensemble machine learning methods, such as Random Forest, XGBoost, K Nearest Neighbor (KNN), and Neural Networks, are explored. Special emphasis is placed on the importance of data pre-processing techniques, particularly TF-IDF representation and Principal Component Analysis, in improving model performance. Results indicate that ensemble methods, particularly Random Forest and XGBoost, exhibit superior accuracy, precision, and recall compared to others, highlighting their effectiveness in malware detection. The paper also discusses limitations and potential future directions, emphasizing the need for continuous adaptation to address the evolving nature of malware. This research contributes to ongoing discussions in cybersecurity and provides practical insights for developing more robust malware detection systems in the digital era.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"14 4‐6","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140398058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-04 | DOI: 10.1109/icassp48485.2024.10445878
Kuan-Hsun Ho, J. Hung, Berlin Chen
This study introduces a reformed Sinc-convolution (Sincconv) framework tailored for the encoder component of deep networks for speech enhancement (SE). The reformed Sincconv, based on parametrized sinc functions as band-pass filters, offers notable advantages in terms of training efficiency, filter diversity, and interpretability. The reformed Sincconv is evaluated in conjunction with various SE models, showcasing its ability to boost SE performance. Furthermore, the reformed Sincconv provides valuable insights into the specific frequency components that are prioritized in an SE scenario. This opens up a new direction for SE research and improves our understanding of the operating dynamics of SE models.
{"title":"What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution","authors":"Kuan-Hsun Ho, J. Hung, Berlin Chen","doi":"10.1109/icassp48485.2024.10445878","DOIUrl":"https://doi.org/10.1109/icassp48485.2024.10445878","url":null,"abstract":"This study introduces a reformed Sinc-convolution (Sincconv) framework tailored for the encoder component of deep networks for speech enhancement (SE). The reformed Sincconv, based on parametrized sinc functions as band-pass filters, offers notable advantages in terms of training efficiency, filter diversity, and interpretability. The reformed Sinc-conv is evaluated in conjunction with various SE models, showcasing its ability to boost SE performance. Furthermore, the reformed Sincconv provides valuable insights into the specific frequency components that are prioritized in an SE scenario. This opens up a new direction of SE research and improving our knowledge of their operating dynamics.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"101 10‐12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140398192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-04 | DOI: 10.14722/madweb.2024.23035
Naif Mehanna, Walter Rudametkin, Pierre Laperdrix, Antoine Vastel
Free proxies have been widespread since the early days of the Web, helping users bypass geo-blocked content and conceal their IP addresses. Various proxy providers promise faster Internet or increased privacy while advertising lists comprising hundreds of readily available free proxies. However, while paid proxy services advertise support for encrypted connections and high stability, free proxies often lack such guarantees, making them prone to malicious activities such as eavesdropping or content modification. Furthermore, there is a market that encourages exploiting devices to install proxies. In this paper, we present a 30-month longitudinal study analyzing the stability, security, and potential manipulation of free web proxies that we collected from 11 providers. Our collection resulted in over 640,600 proxies, which we cumulatively tested daily. We find that only 34.5% of proxies were active at least once during our tests, showcasing the general instability of free proxies. Geographically, a majority of proxies originate from the US and China. Leveraging the Shodan search engine, we identified 4,452 distinct vulnerabilities on the proxies' IP addresses, including 1,755 vulnerabilities that allow unauthorized remote code execution and 2,036 that enable privilege escalation on the host device. Through software analysis of the proxies' IP addresses, we find that 42,206 of them appear to run on MikroTik routers. Worryingly, we also discovered 16,923 proxies that manipulate content, indicating potential malicious intent by proxy owners. Ultimately, our research reveals that the use of free web proxies poses significant risks to users' privacy and security. The instability, vulnerabilities, and potential for malicious actions uncovered in our analysis lead us to strongly caution users against relying on free proxies.
{"title":"Free Proxies Unmasked: A Vulnerability and Longitudinal Analysis of Free Proxy Services","authors":"Naif Mehanna, Walter Rudametkin, Pierre Laperdrix, Antoine Vastel","doi":"10.14722/madweb.2024.23035","DOIUrl":"https://doi.org/10.14722/madweb.2024.23035","url":null,"abstract":"Free-proxies have been widespread since the early days of the Web, helping users bypass geo-blocked content and conceal their IP addresses. Various proxy providers promise faster Internet or increased privacy while advertising their lists comprised of hundreds of readily available free proxies. However, while paid proxy services advertise the support of encrypted connections and high stability, free proxies often lack such guarantees, making them prone to malicious activities such as eavesdropping or modifying content. Furthermore, there is a market that encourages exploiting devices to install proxies. In this paper, we present a 30-month longitudinal study analyzing the stability, security, and potential manipulation of free web proxies that we collected from 11 providers. Our collection resulted in over 640,600 proxies, that we cumulatively tested daily. We find that only 34.5% of proxies were active at least once during our tests, showcasing the general instability of free proxies. Geographically, a majority of proxies originate from the US and China. Leveraging the Shodan search engine, we identified 4,452 distinct vulnerabilities on the proxies' IP addresses, including 1,755 vulnerabilities that allow unauthorized remote code execution and 2,036 that enable privilege escalation on the host device. Through the software analysis on the proxies' IP addresses, we find that 42,206 of them appear to run on MikroTik routers. Worryingly, we also discovered 16,923 proxies that manipulate content, indicating potential malicious intent by proxy owners. Ultimately, our research reveals that the use of free web proxies poses significant risks to users' privacy and security. The instability, vulnerabilities, and potential for malicious actions uncovered in our analysis lead us to strongly caution users against relying on free proxies.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"25 1‐2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140397909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}