
AI Open: Latest Publications

A unified network embedding algorithm for multi-type similarity measures
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.002
Rui Feng, Qi Ding, Weihao Qiu, Xiao Yang, Yang Yang, Chunping Wang

Traditional network embedding aims to learn representations by capturing a predefined vertex-to-vertex similarity measure. However, in practice, there are different types of similarity measures (e.g., connectivity and structural similarity), which are appropriate for different downstream applications. Meanwhile, it is hard to select the “best” similarity measure that most benefits the application, given the domain knowledge required of both the application scenario and network science. It sometimes takes combining these similarity measures with one another to achieve better performance. Therefore, automatically integrating multiple types of similarity measures into a unified network embedding framework is critical for obtaining effective vertex representations for a downstream application. In this paper, we address the above problem in social networks and propose a semi-supervised representation learning algorithm. The general idea of our approach is to incorporate social influence, which occurs when one’s opinions, emotions, or behaviors are affected by others in a social network. In particular, we build a connection between a user’s representation vector and the probability of her being influenced by another user to take on a particular label (e.g., fraud, personal interest, etc.). We conduct experiments on six real-world datasets and find a clear improvement of our approach compared with several state-of-the-art baselines.
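
As a rough illustration of the influence-based idea (not the paper's actual parameterization), the sketch below models the probability that one user influences another toward a label as a logistic function of their embedding vectors; the function name and dimensionality are assumptions.

import numpy as np

def influence_probability(z_u, z_v):
    # Hypothetical sketch: P(v influences u to adopt a label) as a sigmoid
    # of the dot product between the two users' embedding vectors.
    return 1.0 / (1.0 + np.exp(-np.dot(z_u, z_v)))

# Toy usage with random 16-dimensional embeddings.
rng = np.random.default_rng(0)
z_u, z_v = rng.normal(size=16), rng.normal(size=16)
print(f"P(influence) = {influence_probability(z_u, z_v):.3f}")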

AI Open, Volume 4, Pages 64-72
Citations: 0
Is Chinese Spelling Check ready? Understanding the correction behavior in real-world scenarios
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.10.004
Liner Yang, Xin Liu, Tianxin Liao, Zhenghao Liu, Mengyan Wang, Xuezhi Fang, Erhong Yang

The task of Chinese Spelling Check (CSC) is crucial for identifying and rectifying spelling errors in Chinese texts. While prior work in this domain has predominantly relied on benchmarks such as SIGHAN for evaluating model performance, these benchmarks often exhibit an imbalanced distribution of spelling errors. They are typically constructed under idealized conditions, presuming the presence of only spelling errors in the input text. This assumption does not hold in real-world scenarios, where spell checkers frequently encounter a mix of spelling and grammatical errors, thereby presenting additional challenges. To address this gap and create a more realistic testing environment, we introduce a high-quality CSC evaluation benchmark named YACSC (Yet Another Chinese Spelling Check Dataset). YACSC is unique in that it includes annotations for both grammatical and spelling errors, rendering it a more reliable benchmark for CSC tasks. Furthermore, we propose a hierarchical network designed to integrate multidimensional information, leveraging semantic and phonetic aspects, as well as the structural forms of Chinese characters, to enhance the detection and correction of spelling errors. Through extensive experiments, we evaluate the limitations of existing CSC benchmarks and illustrate the application of our proposed system in real-world scenarios, particularly as a preliminary stage in writing assistant systems.
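
For illustration only (not the authors' hierarchical architecture), the sketch below fuses semantic, phonetic, and glyph-structure embeddings of each character with a single learned projection; module names, dimensions, and the fusion choice are all assumptions.

import torch
import torch.nn as nn

class MultiViewCharEncoder(nn.Module):
    # Hypothetical fusion of semantic, phonetic (pinyin), and glyph features.
    def __init__(self, dim=128):
        super().__init__()
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, semantic, phonetic, glyph):
        # Concatenate the three views and project back to one representation.
        return torch.tanh(self.fuse(torch.cat([semantic, phonetic, glyph], dim=-1)))

encoder = MultiViewCharEncoder(dim=128)
sem, pho, gly = (torch.randn(2, 10, 128) for _ in range(3))  # 2 sentences, 10 chars
print(encoder(sem, pho, gly).shape)  # torch.Size([2, 10, 128])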

AI Open, Volume 4, Pages 183-192
Citations: 0
A survey on complex factual question answering
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2022.12.003
Lingxi Zhang, Jing Zhang, Xirui Ke, Haoyang Li, Xinmei Huang, Zhonghui Shao, Shulin Cao, Xin Lv

Answering complex factual questions has drawn a lot of attention. Researchers leverage various data sources to support complex QA, such as unstructured texts, structured knowledge graphs and relational databases, semi-structured web tables, or even hybrid data sources. However, although the ideas behind these approaches show similarity to some extent, there is not yet a consistent strategy to deal with various data sources. In this survey, we carefully examine how complex factual question answering has evolved across various data sources. We list the similarities among these approaches and group them into the analysis–extend–reason framework, despite the various question types and data sources that they focus on. We also address future directions for difficult factual question answering as well as the relevant benchmarks.

AI Open, Volume 4, Pages 1-12
Citations: 6
Graph-based methods for cervical cancer segmentation: Advancements, limitations, and future directions
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.006
Nazar Zaki, Wenjian Qin, Anusuya Krishnan

Cervical cancer remains a significant health concern worldwide, where precise segmentation of cervical lesions is integral for effective diagnosis and treatment planning. This systematic review critically evaluates the application of graph-based methodologies for cervical cancer segmentation, identifying their potential, drawbacks, and avenues for future development. An exhaustive literature search across Scopus and PubMed databases resulted in 20 pertinent studies. These studies were assessed focusing on their implementation of graph-based techniques for cervical cancer segmentation, the utilized datasets, evaluation metrics, and reported precision levels. The review highlights the progressive strides made in the field, especially regarding the segmentation of intricate, non-convex regions and facilitating the detection and grading of cervical cancer using graph-based methodologies. Nonetheless, several constraints were evident, including a dearth of comparative performance analysis, reliance on high-resolution images, difficulties in specific boundary delineation, and the imperative for additional validation and diversified datasets. The review suggests future work to integrate advanced deep learning strategies for heightened accuracy, formulate hybrid methodologies to counteract existing limitations, and explore multi-modal fusion to boost segmentation precision. Emphasizing the explainability and interpretability of outcomes also stands paramount. Lastly, addressing critical challenges such as scarcity of annotated data, the need for real-time and interactive segmentation, and the segmentation of multiple objects or regions of interest remains a crucial frontier for future endeavors.

AI Open, Volume 4, Pages 42-55
Citations: 1
Word sense induction with agglomerative clustering and mutual information maximization
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.12.001
Hadi Abdine, Moussa Kamal Eddine, Davide Buscaldi, Michalis Vazirgiannis

Word sense induction (WSI) is a challenging problem in natural language processing that involves the unsupervised automatic detection of a word’s senses (i.e., meanings). Recent work achieves significant results on the WSI task by pre-training a language model that can exclusively disambiguate word senses. In contrast, others employ off-the-shelf pre-trained language models with additional strategies to induce senses. This paper proposes a novel unsupervised method based on hierarchical clustering and invariant information clustering (IIC). The IIC loss is used to train a small model to optimize the mutual information between two vector representations of a target word occurring in a pair of synthetic paraphrases. This model is later used in inference mode to extract a higher-quality vector representation to be used in the hierarchical clustering. We evaluate our method on two WSI tasks and in two distinct clustering configurations (fixed and dynamic number of clusters). We empirically show that our approach is at least on par with the state-of-the-art baselines, outperforming them in several configurations. The code and data to reproduce this work are available to the public.
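
A minimal sketch of the two ingredients, assuming the small model outputs soft cluster assignments for the two paraphrase views of a target word: the IIC term below is the negative mutual information of their joint assignment matrix, and scikit-learn's AgglomerativeClustering stands in for the hierarchical step. Shapes and the number of clusters are placeholders.

import torch
from sklearn.cluster import AgglomerativeClustering

def iic_loss(p1, p2, eps=1e-8):
    # p1, p2: (batch, k) soft assignments for two views of the same word.
    joint = p1.t() @ p2 / p1.size(0)                   # (k, k) joint distribution
    joint = ((joint + joint.t()) / 2).clamp(min=eps)   # symmetrize
    pi = joint.sum(dim=1, keepdim=True)
    pj = joint.sum(dim=0, keepdim=True)
    return -(joint * (joint.log() - pi.log() - pj.log())).sum()  # negative MI

# Agglomerative (hierarchical) clustering of the induced sense vectors.
vectors = torch.randn(50, 64).numpy()                  # stand-in contextual vectors
senses = AgglomerativeClustering(n_clusters=4).fit_predict(vectors)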

AI Open, Volume 4, Pages 193-201
Citations: 0
Joint span and token framework for few-shot named entity recognition
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.009
Wenlong Fang, Yongbin Liu, Chunping Ouyang, Lin Ren, Jiale Li, Yaping Wan

Few-shot Named Entity Recognition (NER) is a challenging task that involves identifying new entity types using only a limited number of labeled instances for training. Currently, the majority of few-shot NER methods are span-based; they pay more attention to the boundary information of spans as candidate entities and to entity-level information. However, these methods often overlook token-level semantic information, which can limit their effectiveness. To address this issue, we propose a novel Joint Span and Token (JST) framework that integrates both the boundary information of an entity and the semantic information of each token that comprises it. The JST framework employs span features to extract the boundary features of the entity and token features to extract the semantic features of each token. Additionally, to reduce the negative impact of the Other class, we introduce a method to separate named entities from the Other class in semantic space, which helps improve the distinction between entities and the Other class. In addition, we use GPT for data augmentation on the support sentences, generating sentences similar to the original ones; these sentences increase the diversity of the samples and the reliability of our model. Our experimental results on the Few-NERD and SNIPS datasets demonstrate that our model outperforms existing methods.
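
As an illustration of the joint idea (feature choices and names are assumptions, not the paper's exact design), the sketch below builds a span representation from its boundary tokens together with the mean of its token-level representations.

import torch

def span_representation(token_reprs, start, end):
    # Boundary (start/end) embeddings concatenated with the mean over the
    # span's token embeddings: span-level plus token-level information.
    boundary = torch.cat([token_reprs[start], token_reprs[end]])   # (2*dim,)
    token_level = token_reprs[start:end + 1].mean(dim=0)           # (dim,)
    return torch.cat([boundary, token_level])                      # (3*dim,)

sentence = torch.randn(12, 256)     # 12 tokens, 256-dim encoder outputs
candidate = span_representation(sentence, start=3, end=5)
print(candidate.shape)              # torch.Size([768])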

AI Open, Volume 4, Pages 111-119
Citations: 0
Restricted orthogonal gradient projection for continual learning
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.010
Zeyuan Yang, Zonghan Yang, Yichen Liu, Peng Li, Yang Liu

Continual learning aims to avoid catastrophic forgetting and effectively leverage learned experiences to master new knowledge. Existing gradient projection approaches impose hard constraints on the optimization space for new tasks to minimize interference, which simultaneously hinders forward knowledge transfer. To address this issue, recent methods reuse frozen parameters with a growing network, resulting in high computational costs. Thus, it remains a challenge whether we can improve forward knowledge transfer for gradient projection approaches using a fixed network architecture. In this work, we propose the Restricted Orthogonal Gradient prOjection (ROGO) framework. The basic idea is to adopt a restricted orthogonal constraint allowing parameters optimized in the direction oblique to the whole frozen space to facilitate forward knowledge transfer while consolidating previous knowledge. Our framework requires neither data buffers nor extra parameters. Extensive experiments have demonstrated the superiority of our framework over several strong baselines. We also provide theoretical guarantees for our relaxing strategy.
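
For intuition only, the sketch below shows a generic relaxed gradient projection: lam = 1 gives a strictly orthogonal update with respect to a frozen subspace, while lam < 1 admits the oblique directions that a restricted constraint allows. The basis and coefficient are placeholders rather than ROGO's actual formulation.

import numpy as np

def restricted_projection(grad, basis, lam=0.7):
    # grad: (d,) gradient; basis: (d, k) orthonormal basis of the frozen subspace.
    # lam = 1.0 removes the entire component inside the frozen subspace;
    # lam < 1.0 keeps part of it, allowing an oblique (relaxed) update.
    return grad - lam * basis @ (basis.T @ grad)

rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(64, 8)))   # toy orthonormal frozen basis
grad = rng.normal(size=64)
update = restricted_projection(grad, basis, lam=0.7)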

AI Open, Volume 4, Pages 98-110
Citations: 0
Multi-grained hypergraph interest modeling for conversational recommendation
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.10.001
Chenzhan Shang, Yupeng Hou, Wayne Xin Zhao, Yaliang Li, Jing Zhang

A conversational recommender system (CRS) interacts with users through multi-turn dialogues in natural language, aiming to provide high-quality recommendations for users’ instant information needs. Although great efforts have been made to develop effective CRSs, most of them still focus on the contextual information from the current dialogue, usually suffering from the data scarcity issue. Therefore, we consider leveraging historical dialogue data to enrich the limited contexts of the current dialogue session.

In this paper, we propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data from different perspectives. As the core idea, we employ hypergraph to represent complicated semantic relations underlying historical dialogues. In our approach, we first employ the hypergraph structure to model users’ historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations. Second, to alleviate the issue of data scarcity, we use an external knowledge graph and construct a knowledge-based hypergraph considering fine-grained, entity-level semantics. We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop interest-aware CRS. Extensive experiments on two benchmarks ReDial and TG-ReDial validate the effectiveness of our approach on both recommendation and conversation tasks. Code is available at: https://github.com/RUCAIBox/MHIM.
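
As background on the hypergraph convolution mentioned above, a common generic formulation (HGNN-style, with unit hyperedge weights) propagates node features through the incidence matrix with degree normalization; the sketch below follows that formula and is not necessarily the exact operator used in MHIM.

import numpy as np

def hypergraph_conv(X, H, Theta):
    # X: (n, d) node features, H: (n, m) incidence matrix, Theta: (d, d') weights.
    # X' = Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta (unit hyperedge weights).
    Dv = np.diag(1.0 / np.sqrt(H.sum(axis=1)))   # vertex degree normalization
    De = np.diag(1.0 / H.sum(axis=0))            # hyperedge degree normalization
    return Dv @ H @ De @ H.T @ Dv @ X @ Theta

H = np.array([[1, 0], [1, 1], [0, 1], [1, 1]], dtype=float)  # 4 nodes, 2 hyperedges
out = hypergraph_conv(np.random.randn(4, 8), H, np.random.randn(8, 8))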

AI Open, Volume 4, Pages 154-164
Citations: 0
Improving task generalization via unified schema prompt
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.011
Wanjun Zhong, Yifan Gao, Ning Ding, Zhiyuan Liu, Ming Zhou, Jiahai Wang, Jian Yin, Nan Duan

Task generalization has been a long-standing challenge in Natural Language Processing (NLP). Recent research attempts to improve the task generalization ability of pre-trained language models by mapping NLP tasks into human-readable prompted forms. However, these approaches require laborious and inflexible manual collection of prompts, and different prompts on the same downstream task may receive unstable performance. We propose Unified Schema Prompt, a flexible and extensible prompting method, which automatically customizes the learnable prompts for each task according to the task input schema. It models the shared knowledge between tasks, while keeping the characteristics of different task schema, and thus enhances task generalization ability. The schema prompt takes the explicit data structure of each task to formulate prompts so that little human effort is involved. To test the task generalization ability of schema prompt at scale, we conduct schema prompt-based multitask pre-training on a wide variety of general NLP tasks. The framework achieves strong zero-shot and few-shot generalization performance on 16 unseen downstream tasks from 8 task types (e.g., QA, NLI, etc.). Furthermore, comprehensive analyses demonstrate the effectiveness of each component in the schema prompt, its flexibility in task compositionality, and its ability to improve performance under a full-data fine-tuning setting.
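
To make the schema-to-prompt idea concrete, here is a toy rendering that turns a task's input schema (field names and values) into a prompted form with markers for learnable slots; the task, field names, and slot markers are illustrative assumptions, not the paper's template.

def schema_prompt(task_name, schema, soft_slots=2):
    # '[P]' marks positions where learnable (soft) prompt tokens would go.
    soft = " ".join(["[P]"] * soft_slots)
    parts = [f"{soft} Task: {task_name}."]
    for field, value in schema.items():
        parts.append(f"{soft} {field}: {value}")
    return " ".join(parts)

example = {"premise": "A man is playing a guitar.",
           "hypothesis": "A person is making music."}
print(schema_prompt("natural language inference", example))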

AI Open, Volume 4, Pages 120-129
Citations: 0
Batch virtual adversarial training for graph convolutional networks
Pub Date : 2023-01-01 DOI: 10.1016/j.aiopen.2023.08.007
Zhijie Deng, Yinpeng Dong, Jun Zhu

We present batch virtual adversarial training (BVAT), a novel regularization method for graph convolutional networks (GCNs). BVAT addresses the issue that GCNs do not ensure the smoothness of the model’s output distribution against local perturbations around the input node features. We propose two algorithms, sampling-based BVAT and optimization-based BVAT, which promote the output smoothness of GCN classifiers based on the generated virtual adversarial perturbations for either a subset of independent nodes or all nodes via an elaborate optimization process. Extensive experiments on three citation network datasets Cora, Citeseer and Pubmed and a knowledge graph dataset Nell validate the efficacy of the proposed method in semi-supervised node classification tasks.
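
For reference, a minimal sketch of the virtual adversarial perturbation at the heart of VAT-style training (one power-iteration step); this is the generic procedure, not BVAT's batch- or graph-specific sampling and optimization strategies, and the model and input names are placeholders.

import torch
import torch.nn.functional as F

def virtual_adversarial_perturbation(model, x, xi=1e-6, eps=1.0):
    # One power-iteration estimate of the perturbation that most changes the
    # model's output distribution around inputs x of shape (batch, features).
    with torch.no_grad():
        p = F.softmax(model(x), dim=-1)
    d = xi * F.normalize(torch.randn_like(x).flatten(1), dim=1).view_as(x)
    d.requires_grad_(True)
    kl = F.kl_div(F.log_softmax(model(x + d), dim=-1), p, reduction="batchmean")
    grad = torch.autograd.grad(kl, d)[0]
    return eps * F.normalize(grad.flatten(1), dim=1).view_as(x)

# Usage sketch: regularize with the KL divergence between predictions at x and x + r.
# r = virtual_adversarial_perturbation(gcn_classifier, node_features)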

AI Open, Volume 4, Pages 73-79
Citations: 63