Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence: Latest Articles

MTDiag: An Effective Multi-Task Framework for Automatic Diagnosis
Zhenyu Hou, Yukuo Cen, Ziding Liu, Dongxue Wu, Baoyan Wang, Xuanhe Li, Lei Hong, Jie Tang
Automatic diagnosis systems aim to probe for symptoms (i.e., symptom checking) and diagnose disease through multi-turn conversations with patients. Most previous works formulate this as a sequential decision process and use reinforcement learning (RL) to decide whether to inquire about symptoms or make a diagnosis. However, these RL-based methods rely heavily on elaborate reward functions and usually suffer from unstable training and low data efficiency. In this work, we propose an effective multi-task framework for automatic diagnosis called MTDiag. We first reformulate symptom checking as a multi-label classification task with direct supervision. Each medical dialogue is equivalently converted into multiple samples for classification, which also helps alleviate the data scarcity problem. Furthermore, we design a multi-task learning strategy to guide the symptom checking procedure with disease information, and we use contrastive learning to better distinguish symptoms between diseases. Extensive experimental results show that our method achieves state-of-the-art performance on four public datasets, with a 1.7% to 3.1% improvement in disease diagnosis. Additionally, our model is now deployed in an online medical consultant system as an assistant tool for real-life doctors.
DOI: https://doi.org/10.1609/aaai.v37i12.26666. Pages 14241-14248. Published 2023-06-26.
Citations: 0
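The MTDiag abstract above says that each dialogue is converted into multiple classification samples. The following is a minimal sketch of one plausible conversion, assuming a dialogue consists of self-reported (explicit) symptoms plus an ordered list of symptoms confirmed during inquiry. The function name `dialogue_to_samples`, the toy vocabulary, and the input/target layout are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple


def dialogue_to_samples(
    explicit: List[str],
    inquired: List[str],
    vocab: List[str],
) -> List[Tuple[List[str], List[int]]]:
    """Turn one dialogue into several multi-label samples.

    Assumed layout (hypothetical): at step k the input is the explicit
    symptoms plus the first k inquired symptoms, and the target is a
    multi-hot vector over `vocab` marking every symptom still to be found.
    """
    samples = []
    for k in range(len(inquired)):
        known = explicit + inquired[:k]
        remaining = set(inquired[k:])
        target = [1 if s in remaining else 0 for s in vocab]
        samples.append((known, target))
    return samples


# Toy vocabulary and dialogue (made-up values).
vocab = ["cough", "fever", "headache", "nausea"]
samples = dialogue_to_samples(["cough"], ["fever", "nausea"], vocab)
for known, target in samples:
    print(known, target)
```

Under this assumed layout, a dialogue with n inquired symptoms yields n training samples, which is one way the conversion could ease data scarcity.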
See Your Emotion from Gait Using Unlabeled Skeleton Data
Haifeng Lu, Xiping Hu, B. Hu
This paper focuses on contrastive learning for gait-based emotion recognition. Existing contrastive learning approaches are rarely suitable for learning skeleton-based gait representations, which suffer from limited gait diversity and inconsistent semantics. In this paper, we propose a Cross-coordinate contrastive learning framework utilizing Ambiguity samples for self-supervised Gait-based Emotion representation (CAGE). First, we propose an ambiguity transform to push positive samples into an ambiguous semantic space. By learning similarities between ambiguity samples and positive samples, our model can learn higher-level semantics of the gait sequences and maintain semantic diversity. Second, to encourage learning semantic invariance, we propose cross-coordinate contrastive learning between the Cartesian coordinate and the Spherical coordinate, which provides rich supervisory signals for learning intrinsic semantic consistency. Exhaustive experiments show that CAGE improves on existing self-supervised methods by 5%–10% in accuracy, and that it achieves comparable or even superior performance to supervised methods.
DOI: https://doi.org/10.1609/aaai.v37i2.25272. Pages 1826-1834. Published 2023-06-26.
Citations: 0
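The CAGE abstract above contrasts Cartesian and spherical views of the same skeleton. As a small illustration, the sketch below converts 3D joint positions to spherical coordinates. The angle convention (polar angle measured from the +z axis, azimuth in the xy-plane) and the sample joints are assumptions; the abstract does not say which parameterization the authors use.

```python
import math
from typing import Tuple


def cartesian_to_spherical(x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Return (r, theta, phi): radius, polar angle from +z, azimuth in the xy-plane."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # Angles are undefined at the origin; returning zeros is a convention.
        return 0.0, 0.0, 0.0
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    theta = math.acos(max(-1.0, min(1.0, z / r)))
    phi = math.atan2(y, x)
    return r, theta, phi


# One skeleton frame as a list of (x, y, z) joints (made-up values).
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
spherical_frame = [cartesian_to_spherical(*joint) for joint in frame]
```

Both views describe the same pose, so a contrastive objective could treat them as two views of one sample; how CAGE builds its pairs is not specified here.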
MEID: Mixture-of-Experts with Internal Distillation for Long-Tailed Video Recognition
Xinjie Li, Huijuan Xu
The long-tailed video recognition problem is especially challenging, as videos tend to be long and untrimmed, and each video may contain multiple classes, causing frame-level class imbalance. The previous method tackles the long-tailed video recognition only through frame-level sampling for class re-balance without distinguishing the frame-level feature representation between head and tail classes. To improve the frame-level feature representation of tail classes, we modulate the frame-level features with an auxiliary distillation loss to reduce the distribution distance between head and tail classes. Moreover, we design a mixture-of-experts framework with two different expert designs, i.e., the first expert with an attention-based classification network handling the original long-tailed distribution, and the second expert dealing with the re-balanced distribution from class-balanced sampling. Notably, in the second expert, we specifically focus on the frames unsolved by the first expert through designing a complementary frame selection module, which inherits the attention weights from the first expert and selects frames with low attention weights, and we also enhance the motion feature representation for these selected frames. To highlight the multi-label challenge in long-tailed video recognition, we create two additional benchmarks based on Charades and CharadesEgo videos with the multi-label property, called CharadesLT and CharadesEgoLT. Extensive experiments are conducted on the existing long-tailed video benchmark VideoLT and the two new benchmarks to verify the effectiveness of our proposed method with state-of-the-art performance. The code and proposed benchmarks are released at https://github.com/VisionLanguageLab/MEID.
DOI: https://doi.org/10.1609/aaai.v37i2.25230. Pages 1451-1459. Published 2023-06-26.
Citations: 1
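The MEID abstract above describes a complementary frame selection module that picks frames with low attention weights from the first expert. A minimal sketch of that idea follows, assuming a fixed budget of `k` frames; the function name `select_low_attention_frames` and the fixed-budget rule are hypothetical, since the abstract does not state the selection criterion in detail.

```python
from typing import List


def select_low_attention_frames(attention: List[float], k: int) -> List[int]:
    """Return indices of the k frames with the lowest attention, in temporal order."""
    order = sorted(range(len(attention)), key=lambda i: attention[i])
    return sorted(order[:k])


# Per-frame attention weights from a first expert (made-up values).
weights = [0.30, 0.05, 0.25, 0.10, 0.30]
picked = select_low_attention_frames(weights, 2)
print(picked)
```

Returning the indices in temporal order keeps the selected frames usable for motion features, which depend on frame ordering.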
PGSS: Pitch-Guided Speech Separation
Xiang Li, Yiwen Wang, Yifan Sun, Xihong Wu, J. Chen
Monaural speech separation aims to separate concurrent speakers from a single-microphone mixture recording. Inspired by the effect of pitch priming in auditory scene analysis (ASA) mechanisms, a novel pitch-guided speech separation framework is proposed in this work. The prominent advantage of this framework is that both the permutation problem and the unknown speaker number problem existing in general models can be avoided by using pitch contours as the primary means to guide the target speaker. In addition, adversarial training is applied, instead of a traditional time-frequency mask, to improve the perceptual quality of separated speech. Specifically, the proposed framework can be divided into two phases: pitch extraction and speech separation. The former aims to extract pitch contour candidates for each speaker from the mixture, modeling the bottom-up process in ASA mechanisms. Any pitch contour can be selected as the condition in the second phase to separate the corresponding speaker, where a conditional generative adversarial network (CGAN) is applied. The second phase models the effect of pitch priming in ASA. Experiments on the WSJ0-2mix corpus reveal that the proposed approaches can achieve higher pitch extraction accuracy and better separation performance, compared to the baseline models, and have the potential to be applied to SOTA architectures.
DOI: https://doi.org/10.1609/aaai.v37i11.26542. Pages 13130-13138. Published 2023-06-26.
Citations: 0
BETA-CD: A Bayesian Meta-Learned Cognitive Diagnosis Framework for Personalized Learning
Haoyang Bi, Enhong Chen, Weidong He, Han Wu, Weihao Zhao, Shijin Wang, Jinze Wu
Personalized learning is a promising educational approach that aims to provide high-quality personalized services for each student with minimal demands for practice data. The key to achieving that lies in the cognitive diagnosis task, which estimates the cognitive state of the student from his/her logged practice quiz data. Nevertheless, in the personalized learning scenario, existing cognitive diagnosis models are unable to (1) quickly adapt to new students using a small amount of data, and (2) measure the reliability of the diagnosis result to avoid improper services that mismatch the student's actual state. In this paper, we propose a general Bayesian mETA-learned Cognitive Diagnosis framework (BETA-CD), which addresses the two challenges by prior knowledge exploitation and model uncertainty quantification, respectively. Specifically, we first introduce Bayesian hierarchical modeling to associate each student's cognitive state with a shared prior distribution encoding prior knowledge and a personal posterior distribution indicating model uncertainty. Furthermore, we formulate a meta-learning objective to automatically exploit prior knowledge from historical students, and solve it efficiently with a gradient-based variational inference method. The code will be publicly available at https://github.com/AyiStar/pyat.
DOI: https://doi.org/10.1609/aaai.v37i4.25629. Pages 5018-5026. Published 2023-06-26.
Citations: 0
Scene-Level Sketch-Based Image Retrieval with Minimal Pairwise Supervision
Ce Ge, Jingyu Wang, Q. Qi, Haifeng Sun, Tong Xu, Jianxin Liao
The sketch-based image retrieval (SBIR) task has long been researched at the instance level, where both query sketches and candidate images are assumed to contain only one dominant object. This strong assumption constrains its application, especially with the increasingly popular intelligent terminals and human-computer interaction technology. In this work, a more general scene-level SBIR task is explored, where sketches and images can both contain multiple object instances. The new general task is extremely challenging due to several factors: (i) scene-level SBIR inherently shares sketch-specific difficulties with instance-level SBIR (e.g., sparsity, abstractness, and diversity), (ii) the cross-modal similarity is measured between two partially aligned domains (i.e., not all objects in images are drawn in scene sketches), and (iii) besides instance-level visual similarity, a more complex multi-dimensional scene-level feature matching problem is imposed (including appearance, semantics, layout, etc.). To address these challenges, a novel Conditional Graph Autoencoder model is proposed for scene-level sketch-image retrieval. More importantly, the model can be trained with only pairwise supervision, which distinguishes our study from others in that elaborate instance-level annotations (for example, bounding boxes) are no longer required. Extensive experiments confirm that our model robustly retrieves multiple related objects at the scene level and outperforms strong competitors.
DOI: https://doi.org/10.1609/aaai.v37i1.25141. Pages 650-657. Published 2023-06-26.
Citations: 0
Electrophysiological Brain Source Imaging via Combinatorial Search with Provable Optimality
Guihong Wan, Meng Jiao, Xinglong Ju, Yu Zhang, H. Schweitzer, Feng Liu
Electrophysiological Source Imaging (ESI) refers to reconstructing the underlying brain source activation from non-invasive Electroencephalography (EEG) and Magnetoencephalography (MEG) measurements on the scalp. Estimating the source locations and their extents is a fundamental tool in clinical and neuroscience applications. However, the estimation is challenging because of the ill-posedness and high coherence in the leadfield matrix as well as the noise in the EEG/MEG data. In this work, we proposed a combinatorial search framework to address the ESI problem with a provable optimality guarantee. Specifically, by exploiting the graph neighborhood information in the brain source space, we converted the ESI problem into a graph search problem and designed a combinatorial search algorithm under the framework of A* to solve it. The proposed algorithm is guaranteed to give an optimal solution to the ESI problem. Experimental results on both synthetic data and real epilepsy EEG data demonstrated that the proposed algorithm could faithfully reconstruct the source activation in the brain.
DOI: https://doi.org/10.1609/aaai.v37i10.26471. Pages 12491-12499. Published 2023-06-26.
Citations: 0
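The ESI abstract above states that the search is built on the A* framework and that the result is provably optimal. For readers unfamiliar with A*, here is a textbook sketch on a small weighted graph. It is generic A*, not the paper's ESI-specific search: the graph, the heuristic table, and the name `a_star` are illustrative assumptions, and the abstract does not say how source-space states, costs, or heuristics are defined.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def a_star(
    graph: Graph,
    start: str,
    goal: str,
    h: Callable[[str], float],
) -> Optional[Tuple[float, List[str]]]:
    """Generic A*: return (cost, path), or None if the goal is unreachable.

    The returned path is optimal when h never overestimates the true
    remaining cost (an admissible heuristic).
    """
    frontier = [(h(start), 0.0, start, [start])]  # entries are (g + h, g, node, path)
    best_g: Dict[str, float] = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was already found
        for nxt, weight in graph.get(node, []):
            new_g = g + weight
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None


# Toy graph and an admissible heuristic table (made-up values).
graph: Graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 1), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
h_table = {"A": 2.0, "B": 2.0, "C": 1.0, "D": 0.0}
result = a_star(graph, "A", "D", lambda n: h_table[n])
print(result)
```

The optimality guarantee in this sketch comes entirely from the admissibility of `h`, which is the usual source of such guarantees in A*-style algorithms.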
Generating Transferable 3D Adversarial Point Cloud via Random Perturbation Factorization
Bangyan He, J. Liu, Yiming Li, Siyuan Liang, Jingzhi Li, Xiaojun Jia, Xiaochun Cao
Recent studies have demonstrated that existing deep neural networks (DNNs) on 3D point clouds are vulnerable to adversarial examples, especially under white-box settings where the adversaries have access to model parameters. However, adversarial 3D point clouds generated by existing white-box methods have limited transferability across different DNN architectures. They pose only minor threats in real-world scenarios under black-box settings, where the adversaries can only query the deployed victim model. In this paper, we revisit the transferability of adversarial 3D point clouds. We observe that an adversarial perturbation can be randomly factorized into two sub-perturbations, which are also likely to be adversarial perturbations. This motivates us to consider the effects of the perturbation and its sub-perturbations simultaneously to increase transferability, since sub-perturbations also contain helpful information. We propose a simple yet effective attack method to generate more transferable adversarial 3D point clouds. Specifically, rather than optimizing the loss of the perturbation alone, we combine it with its random factorization. We conduct experiments on a benchmark dataset, verifying our method's effectiveness in increasing transferability while preserving high efficiency.
DOI: https://doi.org/10.1609/aaai.v37i1.25154. Pages 764-772. Published 2023-06-26.
Citations: 6
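The idea in the abstract above, that a perturbation can be randomly factorized into two sub-perturbations and that the attack loss can be combined over the perturbation and its factors, can be illustrated with a small sketch. This is not the authors' implementation: the coin-flip split, the helper names (`random_factorize`, `apply_offsets`, `combined_loss`), and the plain sum of three loss terms are all assumptions made only for illustration.

```python
import random

def random_factorize(delta, rng):
    """Split per-point offsets into two parts that sum back to delta.

    Assumption: each coordinate is assigned to exactly one of the two
    sub-perturbations by a fair coin flip; the paper may factorize differently.
    """
    part_a, part_b = [], []
    for offset in delta:
        a, b = [], []
        for value in offset:
            if rng.random() < 0.5:
                a.append(value)
                b.append(0.0)
            else:
                a.append(0.0)
                b.append(value)
        part_a.append(tuple(a))
        part_b.append(tuple(b))
    return part_a, part_b

def apply_offsets(points, delta):
    """Add a perturbation to a point cloud given as a list of (x, y, z) tuples."""
    return [tuple(p + d for p, d in zip(point, offset))
            for point, offset in zip(points, delta)]

def combined_loss(loss_fn, points, delta, rng):
    """Sum the loss of the full perturbation and of both random factors.

    Assumption: equal weights on the three terms; loss_fn is a stand-in for
    whatever attack loss the surrogate model provides.
    """
    part_a, part_b = random_factorize(delta, rng)
    return sum(loss_fn(apply_offsets(points, d)) for d in (delta, part_a, part_b))

# Toy data (hypothetical values, not from the paper).
points = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
delta = [(0.5, -0.25, 0.125), (0.25, 0.5, -0.5)]
part_a, part_b = random_factorize(delta, random.Random(0))
```

In an actual attack the combined loss would be differentiated with respect to `delta` at each optimization step, with a fresh random factorization drawn each time; that loop is omitted here.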
Ultrafast Euclidean Shortest Path Computation Using Hub Labeling
Jinchun Du, Bojie Shen, M. A. Cheema
Finding shortest paths in a Euclidean plane containing polygonal obstacles is a well-studied problem motivated by a variety of real-world applications. State-of-the-art algorithms must find the obstacle corners visible to the source and target, and may need to consider a large number of candidate paths, which increases their query processing cost. We address these limitations by proposing a novel adaptation of hub labeling, the state-of-the-art approach for shortest distance computation in road networks. Our experimental study on widely used benchmark maps shows that our approach is typically 1-2 orders of magnitude faster than two state-of-the-art algorithms.
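For readers unfamiliar with hub labeling, the standard distance query used on road networks can be sketched as follows. This shows only the generic graph technique, not the paper's Euclidean adaptation: each vertex stores a label of (hub, distance) pairs, and the distance between two vertices is the minimum of d(s, h) + d(h, t) over hubs h common to both labels. The function name `hub_label_distance` and the toy labels are hypothetical.

```python
from math import inf

def hub_label_distance(label_s, label_t):
    """Return the minimum d(s, h) + d(h, t) over hubs h shared by both labels.

    Each label is a list of (hub_id, distance) pairs sorted by hub_id, so the
    two lists can be scanned together like a merge join in linear time.
    Returns inf when the labels share no hub.
    """
    i = j = 0
    best = inf
    while i < len(label_s) and j < len(label_t):
        hub_s, dist_s = label_s[i]
        hub_t, dist_t = label_t[j]
        if hub_s == hub_t:
            best = min(best, dist_s + dist_t)
            i += 1
            j += 1
        elif hub_s < hub_t:
            i += 1
        else:
            j += 1
    return best

# Toy labels (hypothetical values, not taken from the paper).
label_a = [(0, 2.0), (3, 5.0), (7, 1.5)]
label_b = [(3, 1.0), (5, 4.0), (7, 6.0)]
print(hub_label_distance(label_a, label_b))  # hubs 3 and 7 are shared
```

The query touches only the two labels and never the graph or the obstacle map, which is why label-based methods tend to answer distance queries quickly once the labels have been precomputed.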
DOI: https://doi.org/10.1609/aaai.v37i10.26463; pages 12417-12426; published 2023-06-26
Citations: 2
Structurally Restricted Fragments of Numeric Planning - a Complexity Analysis
Alexander Shleyfman, Daniel Gnad, P. Jonsson
Numeric planning is known to be undecidable even under severe restrictions. Prior work has investigated the decidability boundaries by restricting the expressiveness of the planning formalism in terms of the numeric functions allowed in conditions and effects. We study a well-known restricted form of Hoffmann's simple numeric planning, which is undecidable. We analyze the complexity by imposing restrictions on the causal structure, exploiting a novel method for bounding variable domain sizes. First, we show that plan existence for tasks where all numeric variables are root nodes in the causal graph is in PSPACE. Second, we show that for tasks with only numeric leaf variables the problem is decidable, and that it is in PSPACE if the propositional state space has a fixed size. Our work lays a strong foundation for future investigations of structurally more complex tasks. From a practical perspective, our method makes it possible to employ heuristics and methods that are geared towards finite variable domains (such as pattern database heuristics or decoupled search) to solve non-trivial families of numeric planning problems.
DOI: https://doi.org/10.1609/aaai.v37i10.26428; pages 12112-12119; published 2023-06-26
Citations: 0