Pub Date: 2022-01-01 | DOI: 10.1016/j.aiopen.2022.11.004
Domain generalization by class-aware negative sampling-based contrastive learning
Mengwei Xie, Suyun Zhao, Hong Chen, Cuiping Li
AI Open, Volume 3, Pages 200-207
When training and test data have different feature distributions, the test data may differ in style and background from the training data because of collection sources or privacy protection; this is the transfer generalization problem. Contrastive learning, currently the most successful unsupervised learning method, generalizes well across varied data distributions and can use labeled data more effectively without overfitting. This study demonstrates how contrastive learning can enhance a model's ability to generalize, how joint contrastive learning and supervised learning can strengthen one another, and how this approach can be broadly applied across disciplines.
Pub Date: 2022-01-01 | DOI: 10.1016/j.aiopen.2022.07.001
StackVAE-G: An efficient and interpretable model for time series anomaly detection
Wenkai Li, Wenbo Hu, Ting Chen, Ning Chen, Cheng Feng
AI Open, Volume 3, Pages 101-110
Recent studies have shown that autoencoder-based models can achieve superior performance on anomaly detection tasks thanks to their excellent ability to fit complex data in an unsupervised manner. In this work, we propose a novel autoencoder-based model, named StackVAE-G, that brings efficiency and interpretability to multivariate time series anomaly detection. Specifically, we exploit the similarities across time series channels through stacked block-wise reconstruction with a weight-sharing scheme, which reduces the size of the learned model and mitigates overfitting to unknown noise in the training data. We also leverage a graph learning module to learn a sparse adjacency matrix that explicitly captures the stable interrelation structure among channels, enabling interpretable pattern reconstruction of interrelated channels. Combining these two modules, we introduce the stacking block-wise VAE (variational autoencoder) with GNN (graph neural network) model for multivariate time series anomaly detection. Extensive experiments on three commonly used public datasets show that our model achieves comparable (or better) performance relative to state-of-the-art models while requiring much less computation and memory. Furthermore, we demonstrate that the adjacency matrix learned by our model accurately captures the interrelations among channels and can provide valuable information for failure diagnosis applications.
Pub Date: 2022-01-01 | DOI: 10.1016/j.aiopen.2022.12.001
Augmented and challenging datasets with multi-step reasoning and multi-span questions for Chinese judicial reading comprehension
Qingye Meng, Ziyue Wang, Hang Chen, Xianzhen Luo, Baoxin Wang, Zhipeng Chen, Yiming Cui, Dayong Wu, Zhigang Chen, Shijin Wang
AI Open, Volume 3, Pages 193-199
Pub Date: 2022-01-01 | DOI: 10.1016/j.aiopen.2022.01.001
Survey: Transformer based video-language pre-training
Ludan Ruan, Qin Jin
AI Open, Volume 3, Pages 1-13
Inspired by the success of transformer-based pre-training methods on natural language tasks and subsequently on computer vision tasks, researchers have started to apply transformers to video processing. This survey aims to provide a comprehensive overview of transformer-based pre-training methods for Video-Language learning. We first briefly introduce the transformer structure as background knowledge, including the attention mechanism, positional encoding, etc. We then describe the typical pre-training and fine-tuning paradigm for Video-Language processing in terms of proxy tasks, downstream tasks, and commonly used video datasets. Next, we categorize transformer models into Single-Stream and Multi-Stream structures, highlight their innovations, and compare their performances. Finally, we analyze and discuss the current challenges and possible future research directions for Video-Language pre-training.
Pub Date: 2022-01-01 | DOI: 10.1016/j.aiopen.2022.02.001
Learning towards conversational AI: A survey
Tingchen Fu, Shen Gao, Xueliang Zhao, Ji-rong Wen, Rui Yan
AI Open, Volume 3, Pages 14-28
Recent years have witnessed a surge of interest in open-domain dialogue. Thanks to the rapid development of social media, large dialogue corpora from the Internet provide a fundamental premise for data-driven dialogue models. Breakthroughs in neural networks have also brought new ideas to researchers in AI and NLP, and a great number of new techniques and methods have come into being. In this paper, we review some of the most representative works in recent years and divide existing prevailing frameworks for dialogue models into three categories. We further analyze the development trend of open-domain dialogue and summarize the goal of an open-domain dialogue system in two aspects: informative and controllable. The methods we review are selected according to our own perspectives and are by no means complete. Rather, we hope this survey can benefit the NLP community for future research in open-domain dialogue.
Pub Date: 2022-01-01 | DOI: 10.1016/j.aiopen.2021.10.001
The road from MLE to EM to VAE: A brief tutorial
Ming Ding
AI Open, Volume 3, Pages 29-34
Variational Auto-Encoders (VAEs) have emerged as one of the most popular families of generative models, which are learned to characterize the data distribution. The classic Expectation Maximization (EM) algorithm aims to learn models with hidden variables. Essentially, both iteratively optimize the evidence lower bound (ELBO) to maximize the likelihood of the observed data.
This short tutorial connects them into a single storyline and offers a way to thoroughly understand EM and VAE with minimal prerequisites. It is especially helpful to beginners and to readers with experience in machine learning applications but no statistics background.
Pub Date: 2022-01-01 | DOI: 10.1016/j.aiopen.2022.10.002
Optimized separable convolution: Yet another efficient convolution operator
Tao Wei, Yonghong Tian, Yaowei Wang, Yun Liang, Chang Wen Chen
AI Open, Volume 3, Pages 162-171
The convolution operation is the most critical component in the recent surge of deep learning research. A conventional 2D convolution needs O(C^2 K^2) parameters to represent, where C is the channel size and K is the kernel size. The number of parameters has become very costly as models have grown to meet the needs of demanding applications. Among various implementations of the convolution, separable convolution has been proven more efficient in reducing model size. For example, depth separable convolution reduces the complexity to O(C·(C + K^2)), while spatial separable convolution reduces it to O(C^2 K). However, these are ad hoc designs which cannot ensure optimal separation in general. In this research, we propose a novel and principled operator called optimized separable convolution: by optimally designing the internal number of groups and kernel sizes for general separable convolutions, it achieves a complexity of O(C^{3/2} K). When the restriction on the number of separated convolutions is lifted, an even lower complexity of O(C·log(C K^2)) can be achieved. Experimental results demonstrate that the proposed optimized separable convolution achieves improved accuracy-vs-#Params trade-offs over conventional, depth-wise, and depth/spatial separable convolutions.
Pub Date: 2022-01-01 | DOI: 10.1016/j.aiopen.2022.10.001
A survey of transformers
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
AI Open, Volume 3, Pages 111-132
Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing, and have therefore attracted great interest from academic and industry researchers. To date, a great variety of Transformer variants (a.k.a. X-formers) have been proposed; however, a systematic and comprehensive literature review of these variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.
Pub Date: 2022-01-01 | DOI: 10.1016/j.aiopen.2022.11.005
Debiased recommendation with neural stratification
Quanyu Dai, Zhenhua Dong, Xu Chen
AI Open, Volume 3, Pages 213-217
Debiased recommender models have recently attracted increasing attention from the academic and industry communities. Existing models are mostly based on the inverse propensity score (IPS). However, in the recommendation domain, IPS can be hard to estimate given the sparse and noisy nature of the observed user–item exposure data. To alleviate this problem, we assume that user preference is dominated by a small number of latent factors and propose to cluster the users so that more accurate IPS can be computed from the increased exposure density within each cluster. This method is similar in spirit to stratification models in applied statistics. However, unlike previous heuristic stratification strategies, we learn the clustering criterion by representing users with low-rank embeddings, which are further shared with the user representations in the recommender model. Finally, we find that our model has strong connections with the previous two types of debiased recommender models. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method.
Pub Date: 2022-01-01 | DOI: 10.1016/j.aiopen.2022.11.002
BCA: Bilinear Convolutional Neural Networks and Attention Networks for legal question answering
Haiguang Zhang, Tongyue Zhang, Faxin Cao, Zhizheng Wang, Yuanyu Zhang, Yuanyuan Sun, Mark Anthony Vicente
AI Open, Volume 3, Pages 172-181
The National Judicial Examination of China is an essential examination for selecting legal practitioners. In recent years, people have tried to use machine learning algorithms to answer examination questions. With the proposal of JEC-QA (Zhong et al., 2020), the judicial examination has become a specific legal task. The examination data contains two types of questions, i.e., Knowledge-Driven questions and Case-Analysis questions. Both require complex reasoning and text comprehension, making it challenging for computers to answer judicial examination questions. In this paper we propose Bilinear Convolutional Neural Networks and Attention Networks (BCA), an improved version of the model proposed by our team for the Challenge of AI in Law 2021 judicial examination task. It has two essential modules: a Knowledge-Driven Module (KDM) for local feature extraction and a Case-Analysis Module (CAM) for clarifying the semantic difference between the question stem and the options. We also add a post-processing module that corrects the results in the final stage. Experimental results show that our system achieves state-of-the-art performance in the offline test of the judicial examination task.