Soft fusion of channel information in depression detection using functional near-infrared spectroscopy
Information Processing & Management, 62(3), Article 104003
Pub Date: 2024-12-05 | DOI: 10.1016/j.ipm.2024.104003
Jitao Zhong , Yushan Wu , Hele Liu , Jinlong Chao , Bin Hu , Sujie Ma , Hong Peng
To address the gap in fNIRS-based depression detection research concerning channel selection and information fusion, and to offer channel-design recommendations to fNIRS device manufacturers, we propose a novel framework for depression detection using functional near-infrared spectroscopy (fNIRS) with optimized channel selection and fusion. Drawing on a sample of 80 participants (40 depressed, 40 healthy), we employed Phase Space Reconstruction (PSR) to capture neurovascular nonlinear dynamics from the fNIRS data. Using a multi-objective optimization algorithm (MOMVO), we identified key channels in brain regions such as the Left Dorsolateral Prefrontal Cortex, Right Infraorbital Superior Frontal Gyrus, Right Dorsolateral Prefrontal Cortex, and Right Middle Frontal Gyrus. Our approach achieved depression detection rates of 96.1% under positive stimuli, 91.3% under neutral stimuli, and 98.0% under negative stimuli, surpassing comparative methods by 5% to 12%. This framework demonstrates potential for improving early depression detection and clinical applications.
{"title":"Soft fusion of channel information in depression detection using functional near-infrared spectroscopy","authors":"Jitao Zhong , Yushan Wu , Hele Liu , Jinlong Chao , Bin Hu , Sujie Ma , Hong Peng","doi":"10.1016/j.ipm.2024.104003","DOIUrl":"10.1016/j.ipm.2024.104003","url":null,"abstract":"<div><div>To address the gap in fNIRS-based depression detection research concerning channel selection and information fusion, and to possibly provide recommendations for channel design to fNIRS device manufacturers, we propose a novel framework for depression detection using functional near-infrared spectroscopy (fNIRS) with optimized channel selection and fusion. Involving a sample of 80 participants (40 depressed, 40 healthy), we employed Phase Space Reconstruction (PSR) to capture neurovascular nonlinear dynamics from the fNIRS data. Using multi-objective optimization (MOMVO), we identified key channels in brain regions such as the Left Dorsolateral Prefrontal Cortex, Right Infraorbital Superior Frontal Gyrus, Right Dorsolateral Prefrontal Cortex, and Right Middle Frontal Gyrus. Our approach achieved depression detection rates of 96.1% under positive stimuli, 91.3% under neutral stimuli, and 98.0% under negative stimuli, surpassing comparative methods by 5% to 12%. This framework demonstrates potential for improving early depression detection and clinical applications.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 3","pages":"Article 104003"},"PeriodicalIF":7.4,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Do not wait: Preemptive rumor detection with cooperative LLMs and accessible social context
Information Processing & Management, 62(3), Article 103995
Pub Date: 2024-12-05 | DOI: 10.1016/j.ipm.2024.103995
Junyi Chen, Leyuan Liu, Fan Zhou
Concerning the dissemination of rumors via tweets, current methods for detecting rumors, both content-based and graph-based, do not adequately address the need for preemptive action to suppress rumors before public exposure. Although the advancement of large language models (LLMs) indicates a positive trend, their application in rumor detection remains either overly simplistic or excessively complex. Motivated by these observations, we put forward EvidenceRD, which employs an efficient yet effective cooperative strategy of three types of LLMs to mine informative evidence that augments the content context behind tweets warranting a fact check. This is then integrated with a credibility network, an automatically generated social context based on social homophily theory, to depict potential credibility relationships between the authors of corresponding tweets. Extensive experiments across four public datasets confirm the efficacy of the proposed EvidenceRD. Specifically, EvidenceRD outperforms state-of-the-art baselines across various categories, achieving an improvement in general detection performance ranging from 3% to 16%. This superiority is achieved by exclusively utilizing pre-public information, a constraint not imposed on the comparative baselines. Besides, as EvidenceRD considers both the evidence and the social context behind a tweet, it not only offers enhanced explainability through its multi-perspective evidence presented in natural language but also demonstrates greater robustness and transferability across different scenarios. An additional efficiency analysis demonstrates that these enhanced characteristics are cost-effective in terms of both financial and computational expenses. To summarize, compared to existing studies, this work theoretically presents a generally effective and efficient paradigm for utilizing LLMs in rumor detection and demonstrates how integrating accessible social context can effectively detect rumors. Practically, the proposed method can accurately detect rumors before they emerge publicly, with notable improvements on several metrics, and aid human decision-making from multiple dimensions.
{"title":"Do not wait: Preemptive rumor detection with cooperative LLMs and accessible social context","authors":"Junyi Chen, Leyuan Liu, Fan Zhou","doi":"10.1016/j.ipm.2024.103995","DOIUrl":"10.1016/j.ipm.2024.103995","url":null,"abstract":"<div><div>Concerning the dissemination of rumors via tweets, current methods for detecting rumors – both content-based and graph-based – do not adequately address the need for preemptive action to suppress rumors before public exposure. Although the advancement of large language models (LLMs) indicates a positive trend, their application in rumor detection remains either overly simplistic or excessively complex. Motivated by these, we put forward EvidenceRD, which employs an efficient yet effective cooperative strategy of three types of LLMs to mine informative evidence that augments content context behind tweets warranting a fact check. This is then integrated with a credibility network, an automatically generated social context based on the social homophily theory, to depict potential credibility relationships between authors of corresponding tweets. Extensive experiments across four public datasets confirm the efficacy of the proposed EvidenceRD. Specifically, EvidenceRD outperforms state-of-the-art baselines across various categories, achieving an improvement in general detection performance ranging from 3% to 16%. This superiority is achieved by exclusively utilizing pre-public information, a constraint not imposed on the comparative baselines. Besides, As EvidenceRD considers both evidence and the social context behind a tweet, it not only offers enhanced explainability through its multi-perspective evidence presented in natural language but also demonstrates greater robustness and transferability across different scenarios. Additional efficiency analysis demonstrates that the above-enhanced characteristics brought by EvidenceRD are cost-effective in terms of both financial and computational expenses. To summarize, compared to existing studies, this work theoretically presents a generally effective and efficient paradigm for utilizing LLMs in the context of rumor detection and demonstrates how integrating accessible social context can effectively detect rumors. Practically, the proposed method can accurately detect rumors before they emerge publicly with several notable metrics improvements and aid human decision-making on them from multiple dimensions.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 3","pages":"Article 103995"},"PeriodicalIF":7.4,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generating counterfactual negative samples for image-text matching
Information Processing & Management, 62(3), Article 103990
Pub Date: 2024-12-05 | DOI: 10.1016/j.ipm.2024.103990
Xinqi Su , Dan Song , Wenhui Li , Tongwei Ren , An-An Liu
The method of image-text matching typically employs a hard triplet loss as its optimization objective to learn coarse correspondences based on object co-occurrence statistics. However, due to insufficiently sampled negative instances, these coarse correspondences not only lead the model to learn biases in semantic co-occurrence but also obscure its understanding of crucial semantic concepts and significant semantic contextual dependencies. In this study, we propose the Generating Feature-level and Relation-level Counterfactual Negative Samples method (GFRN) for image-text matching. This method utilizes prior knowledge and gradients to mask key regions or words to generate feature-level counterfactual negative samples, or disrupts their important contextual dependencies through Bernoulli distributions and self-supervised learning to generate relation-level counterfactual negative samples with sufficient information. Subsequently, we employ these counterfactual samples to construct contrastive triplet losses to enhance the training of the image-text matching model. Consequently, the model's ability to understand crucial semantic concepts and complex dependency relationships is significantly enhanced, and semantic biases are greatly reduced. Compared to state-of-the-art methods, the proposed GFRN improves rSum by 3.9% on Flickr30K, 2.0% on MSCOCO1K, and 4.8% on MSCOCO5K, with significant improvements in R@1 across all datasets.
{"title":"Generating counterfactual negative samples for image-text matching","authors":"Xinqi Su , Dan Song , Wenhui Li , Tongwei Ren , An-An Liu","doi":"10.1016/j.ipm.2024.103990","DOIUrl":"10.1016/j.ipm.2024.103990","url":null,"abstract":"<div><div>The method of image-text matching typically employs hard triplet loss as its optimization objective to learn coarse correspondences based on object co-occurrence statistics. However, due to insufficiently sampled negative instances, this coarse correspondences not only leads to the model learning biases in semantic co-occurrence but also obscures the model’s understanding of crucial semantic and significant semantic contextual dependencies. In this study, we propose the Generating Feature-level and Relation-level Counterfactual Negative Samples method (GFRN) for image-text matching. This method utilizes prior knowledge and gradients to mask key regions or words to generate feature-level counterfactual negative samples, or disrupts their important contextual dependencies through Bernoulli distributions and self-supervised learning to generate relation-level counterfactual negative samples with sufficient information. Subsequently, we employ these counterfactual samples to construct contrastive triplet losses to enhance the training of the image-text matching model. Consequently, the model’s ability to understand crucial semantic concepts and complex dependency relationships is significantly enhanced, and semantic biases are greatly reduced. Compared to state-of-the-art methods, the proposed GFRN improves rSum by 3.9% on Flickr30K, 2.0% on MSCOCO1K, and 4.8% on MSCOCO5K, with significant improvements in R@1 across all datasets.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 3","pages":"Article 103990"},"PeriodicalIF":7.4,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GAM: A scalable and efficient multi-chain data sharing scheme
Information Processing & Management, 62(3), Article 104004
Pub Date: 2024-12-05 | DOI: 10.1016/j.ipm.2024.104004
Zihan Wu, Yuzhen Wang, Liangmin Wang
Multi-chain data sharing refers to cross-chain data exchange among multiple blockchains. However, existing multi-chain data sharing schemes rely on direct blockchain-to-blockchain connections to establish links among multiple chains, which leads to poor scalability and low efficiency as the number of connected blockchains increases. To address these problems, we propose GAM (Group Authorization-based Multi-chain Data Sharing) for scalable and efficient multi-chain data sharing. GAM enhances scalability by organizing users from different chains into authorized virtual groups, enabling trusted data sharing within them. To further improve efficiency, the data sharing authorization process is executed on-chain, while data transfer is based on off-chain storage. We provide a formal analysis of GAM in multi-chain scenarios and implement a proof-of-concept prototype using Hyperledger Fabric. Experimental results demonstrate that GAM is effective in reducing the execution time of multi-chain data sharing while maintaining high transaction throughput and minimal end-to-end delay.
{"title":"GAM: A scalable and efficient multi-chain data sharing scheme","authors":"Zihan Wu, Yuzhen Wang, Liangmin Wang","doi":"10.1016/j.ipm.2024.104004","DOIUrl":"10.1016/j.ipm.2024.104004","url":null,"abstract":"<div><div>Multi-chain data sharing refers to cross-chain data exchange among multiple blockchains. However, existing multi-chain data sharing schemes rely on direct blockchain-to-blockchain connections to establish links among multiple chains. This leads to poor scalability and low efficiency as the number of connected blockchains increases. To address these problems, we propose GAM (Group Authorization-based Multi-chain Data Sharing) for scalable and efficient multi-chain data sharing. GAM enhances scalability by organizing users from different chains into authorized virtual groups, enabling trusted data sharing within it. To further improve efficiency, the data sharing authorization process is executed on-chain, while data transfer is based on off-chain storage. We provide a formal analysis of GAM in multi-chain scenarios and implement a proof-of-concept prototype using Hyperledger Fabric. Experimental results demonstrate that GAM is effective in reducing the execution time of multi-chain data sharing while maintaining high transaction throughput and minimal end-to-end delay.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 3","pages":"Article 104004"},"PeriodicalIF":7.4,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Blockchain empowered dynamic access control for secure data sharing in collaborative emergency management
Information Processing & Management, 62(3), Article 103960
Pub Date: 2024-12-03 | DOI: 10.1016/j.ipm.2024.103960
Qi Wang, Yi Liu
In the realm of emergency management involving multi-party collaboration, securely sharing data among multiple entities poses a significant challenge. Traditional centralized data platforms struggle with fine-grained access control for data from different sources and are susceptible to single-point failures and the risk of forged authorization. This paper delves into the prospective application of blockchain technology for decentralized data governance. We propose a dynamic access control approach for multi-party collaborative emergency response, which combines blockchain and attribute-based access control to enable dynamic access control in multi-party data sharing scenarios. A data sharing system for collaborative emergency management built with Hyperledger Fabric has been developed and subjected to a series of evaluation tests. The experimental results reveal that our blockchain-based system not only significantly enhances data security and privacy by leveraging decentralized control but also improves the efficiency and reliability of data sharing in emergency situations. Specifically, our findings demonstrate the system's ability to dynamically adapt access control policies to changing emergency scenarios, effectively mitigating the risks associated with single-point failures and unauthorized access. These outcomes underscore the viability and reliability of our approach, providing a robust framework for secure, scalable, and flexible data sharing in the demanding field of emergency management.
{"title":"Blockchain empowered dynamic access control for secure data sharing in collaborative emergency management","authors":"Qi Wang, Yi Liu","doi":"10.1016/j.ipm.2024.103960","DOIUrl":"10.1016/j.ipm.2024.103960","url":null,"abstract":"<div><div>In the realm of emergency management involving multi-party collaboration, securely sharing data among multiple entities poses a significant challenge. Traditional centralized data platforms struggle with fine-grained access control for data from different sources and are susceptible to single-point failures and risks of forged authorization. This paper delves into the prospective application of blockchain technology for decentralized data governance. We propose a dynamic access control technology for multi-party collaborative emergency response, which combines blockchain and attribute-based access control technologies to enable dynamic access control in multi-party data sharing scenarios. A data sharing system for collaborative emergency management built with Hyperledger Fabric has been developed and subjected to a series of evaluation tests. The experimental results reveal that our blockchain-based system not only significantly enhances data security and privacy by leveraging decentralized control but also improves the efficiency and reliability of data sharing in emergency situations. Specifically, our findings demonstrate the system's ability to dynamically adapt access control policies based on changing emergency scenarios, effectively mitigating risks associated with single-point failures and unauthorized access. These outcomes underscore the viability and reliability of our approach, providing a robust framework for secure, scalable, and flexible data sharing in the demanding field of emergency management.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 3","pages":"Article 103960"},"PeriodicalIF":7.4,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143138860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IBACodec: End-to-end speech codec with intra-inter broad attention
Information Processing & Management, 62(3), Article 103979
Pub Date: 2024-12-03 | DOI: 10.1016/j.ipm.2024.103979
Xiaonan Yang , Jinjie Zhou , Deshan Yang, Yunwei Wan, Limin Pan, Senlin Luo
Speech compression attempts to yield a compact bitstream that can represent a speech signal with minimal distortion by eliminating redundant information, which is increasingly challenging as the bitrate decreases. However, existing neural speech codecs do not fully exploit the information from previous speech sequences, and learning encoded features blindly leads to the ineffective removal of redundant information, resulting in suboptimal reconstruction quality. In this work, we propose an end-to-end speech codec with intra-inter broad attention, named IBACodec, that efficiently compresses speech across different types of datasets, including LibriTTS, LJSpeech, and more. By designing an intra-inter broad transformer that integrates multi-head attention networks and LSTM, our model captures broad attention with direct context awareness between the intra- and inter-frames of speech. Furthermore, we present a dual-branch conformer for channel-wise modeling to effectively eliminate redundant information. In subjective evaluations using speech at a 24 kHz sampling rate, IBACodec at 6.3 kbps is comparable to SoundStream at 9 kbps and better than Opus at 9 kbps, with about 30% fewer bits. Objective experimental results show that IBACodec outperforms state-of-the-art codecs across a wide range of bitrates, with average ViSQOL, LLR, and CEP improvements of up to 4.97%, 38.94%, and 25.39%, respectively.
{"title":"IBACodec: End-to-end speech codec with intra-inter broad attention","authors":"Xiaonan Yang , Jinjie Zhou , Deshan Yang, Yunwei Wan, Limin Pan, Senlin Luo","doi":"10.1016/j.ipm.2024.103979","DOIUrl":"10.1016/j.ipm.2024.103979","url":null,"abstract":"<div><div>Speech compression attempts to yield a compact bitstream that can represent a speech signal with minimal distortion by eliminating redundant information, which is increasingly challenging as the bitrate decreases. However, existing neural speech codecs do not fully exploit the information from previous speech sequences, and learning encoded features blindly leads to the ineffective removal of redundant information, resulting in suboptimal reconstruction quality. In this work, we propose an end-to-end speech codec with intra-inter broad attention, named IBACodec, that efficiently compresses speech across different types of datasets, including LibriTTS, LJSpeech, and more. By designing an intra-inter broad transformer that integrates multi-head attention networks and LSTM, our model captures broad attention with direct context awareness between the intra- and inter-frames of speech. Furthermore, we present a dual-branch conformer for channel-wise modeling to effectively eliminate redundant information. In subjective evaluations using speech at a 24 kHz sampling rate, IBACodec at 6.3 kbps is comparable to SoundStream at 9 kbps and better than Opus at 9 kbps, with about 30 % fewer bits. Objective experimental results show that IBACodec outperforms state-of-the-art codecs across a wide range of bitrates, with an average ViSQOL, LLR, and CEP improvement of up to 4.97 %, 38.94 %, and 25.39 %, respectively.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 3","pages":"Article 103979"},"PeriodicalIF":7.4,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143139470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Are large language models qualified reviewers in originality evaluation?
Information Processing & Management, 62(3), Article 103973
Pub Date: 2024-12-03 | DOI: 10.1016/j.ipm.2024.103973
Shengzhi Huang , Yong Huang , Yinpeng Liu , Zhuoran Luo , Wei Lu
Large language models (LLMs) are a new generation of conversational language models with impressive semantic comprehension, text generation, and knowledge inference capabilities. LLMs are significantly influencing the development of science by assisting researchers in analyzing, understanding, and grasping original knowledge in scientific papers. This study investigates LLMs' potential as qualified reviewers for originality evaluation in zero-shot learning, utilizing a unique, manually crafted prompt. Using biomedical papers as the data source, we constructed two evaluation datasets based on Nobel Prize papers and the disruptive index. The evaluation performance of multiple LLMs of different types and scales on these datasets was scrutinized through analysis of the originality score (OS), originality type (OT), and originality description (OD), all generated by the LLM. Our results show that LLMs can, to some extent, discern papers with distinct originality levels via OS; however, they appear to be overly lenient reviewers. In the LLMs' evaluation mechanism, the five distinct OTs reflecting varied research contributions do not manifest independently, but together they positively influence OS. Of all the LLMs analyzed, GPT-4 stood out as able to produce the most readable ODs, effectively explaining the inference process for both OS and OT from the perspectives of completeness, logicality, and regularity.
{"title":"Are large language models qualified reviewers in originality evaluation?","authors":"Shengzhi Huang , Yong Huang , Yinpeng Liu , Zhuoran Luo , Wei Lu","doi":"10.1016/j.ipm.2024.103973","DOIUrl":"10.1016/j.ipm.2024.103973","url":null,"abstract":"<div><div>Large language models (LLMs) are a new generation of conversational language model with impressive semantic comprehension, text generation, and knowledge inference capabilities. LLMs are significantly influencing the development of science by assisting researchers in analyzing, understanding, and grasping original knowledge in scientific papers. This study investigates LLMs’ potential as qualified reviewers in originality evaluation in zero-shot learning, utilizing a unique, manually crafted prompt. Using biomedical papers as the data source, we constructed two evaluation datasets based on Nobel Prize papers and disruptive index. The evaluation performance of multiple LLMs of different types and scales on the datasets was scrutinized through the analysis of originality score (OS), originality type (OT), and originality description (OD), all of which were generated by the LLM. Our results show that LLMs can to some extent discern papers with distinct originality level via OS; however, they appear to be overly lenient reviewers. In LLMs’ evaluation mechanism, five distinct OTs reflecting varied research contributions do not manifest independently, but together they positively influence OS. Of all the LLMs analyzed, GPT-4 stood out as able to produce the most readable ODs, effectively explaining the inference process for both OS and OT from the perspectives of completeness, logicality, and regularity.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 3","pages":"Article 103973"},"PeriodicalIF":7.4,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143139471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient extraction method of journal-article table data for data-driven applications
Information Processing & Management, 62(3), Article 104006
Pub Date: 2024-12-02 | DOI: 10.1016/j.ipm.2024.104006
Jianxin Deng , Gang Liu , Ling Wang , Jiawei Liang , Bolin Dai
To improve the accuracy and automation of table extraction from journal articles, we present an efficient method for automatically extracting data from tables in PDF-based journal articles using table texts and border features. All characters and lines in each article are obtained from the text stream of the target PDF file. The table area is then located via filtering rules and an algorithm designed around the obtained table features, such as text size, border length, and the absolute location of elements. Furthermore, an improved hierarchical clustering algorithm is designed to restore the logical structure of the table, combining single-linkage clustering and agglomerative nesting based on border constraints. By incorporating text block layout features, it restores the entire process of character merging, text block clustering, and cell clustering. Finally, by constructing a table structure that restores the correspondence between the header and body, the content is output with the desired correct structure. Table area detection accuracy, logical-structure and content extraction accuracy, information loss rate, extraction efficiency, and comprehensive performance were used to quantify performance. In an extraction experiment on a dataset comprising 500 academic articles with 1157 tables, the weighted average F1 for table detection reached 0.963, and the F1 values for logical-structure restoration and content accuracy reached 0.856 and 0.889, respectively. Compared to Tabula, ABBYY FineReader, and TabbyPDF, this method exhibited the highest efficiency, minimal information loss, and the best overall performance. The proposed method enables rapid, large-scale acquisition of table data from PDF-based journal articles.
{"title":"An efficient extraction method of journal-article table data for data-driven applications","authors":"Jianxin Deng , Gang Liu , Ling Wang , Jiawei Liang , Bolin Dai","doi":"10.1016/j.ipm.2024.104006","DOIUrl":"10.1016/j.ipm.2024.104006","url":null,"abstract":"<div><div>To improve the accuracy and automation of table extraction from journal articles, we present an efficient method for automatically extracting data from tables in PDF-based journal articles using table texts and border features. All characters and lines in each article are obtained from the text stream of the target PDF file. The table area is then located via the filtering rules and algorithm designed utilizing the obtained features of the table, such as text size, border length, and absolute location of elements. Furthermore, an improved hierarchical clustering algorithm is designed to restore the logical structure of the table, which includes single-linkage clustering and agglomerative nesting based on border constraints. By combining text block layout features, it restores the entire process of character merging, text block clustering, and cell clustering. Finally, by constructing a table structure to restore the correspondence between the header and body, the content output with the desired correct structure is achieved. The table area detection accuracy, logic and content extraction accuracy, information loss rate, extraction efficiency and comprehensive performance were utilized to quantify the performance. Through the extraction experiment of a dataset comprising 500 academic articles with 1157 tables, it indicated the weighted average <em>F</em>1 for table detection achieved 0.963, and the <em>F</em>1 values for logical-structure restoration and content accuracy reached 0.856 and 0.889, respectively. Compared to Tabula, ABBYY FineReader, and TabbyPDF, this method exhibited the highest efficiency, minimal information loss, and best overall performance. The proposed method enables rapid and large-scale acquisition of table data from PDF-based journal articles.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 3","pages":"Article 104006"},"PeriodicalIF":7.4,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143139472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Few-shot multi-hop reasoning via reinforcement learning and path search strategy over temporal knowledge graphs
Information Processing & Management, 62(3), Article 104001
Pub Date: 2024-12-01 | DOI: 10.1016/j.ipm.2024.104001
Luyi Bai, Han Zhang, Xuanxuan An, Lin Zhu
Multi-hop reasoning on knowledge graphs is an important way to complete a knowledge graph. However, existing multi-hop reasoning methods often perform poorly in few-shot scenarios and primarily focus on static knowledge graphs, neglecting to model the dynamic changes of events over time in Temporal Knowledge Graphs (TKGs). Therefore, in this paper, we consider the few-shot multi-hop reasoning task on TKGs and propose a few-shot multi-hop reasoning model for TKGs (TFSM), which uses a reinforcement learning framework to improve model interpretability and introduces the one-hop neighbors of the task entity to account for the impact of previous events on the representation of the current task entity. To reduce the cost of searching complex nodes, our model adopts a path-search-based strategy and prunes the search space by considering the correlation between existing paths and the current state. Compared to the baseline methods, our model achieved 5-shot few-shot temporal knowledge graph (FTKG) performance improvements of 1.0% to 18.9% on ICEWS18-few, 0.6% to 22.9% on ICEWS14-few, and 0.7% to 10.5% on GDELT-few. Extensive experiments show that TFSM outperforms existing models on most metrics on the commonly used benchmark datasets ICEWS18-few, ICEWS14-few, and GDELT-few. Furthermore, ablation experiments demonstrated the effectiveness of each part of our model. In addition, we demonstrate the interpretability of the model by performing path analysis with the path-search-based strategy.
{"title":"Few-shot multi-hop reasoning via reinforcement learning and path search strategy over temporal knowledge graphs","authors":"Luyi Bai, Han Zhang, Xuanxuan An, Lin Zhu","doi":"10.1016/j.ipm.2024.104001","DOIUrl":"10.1016/j.ipm.2024.104001","url":null,"abstract":"<div><div>Multi-hop reasoning on knowledge graphs is an important way to complete the knowledge graph. However, existing multi-hop reasoning methods often perform poorly in few-shot scenarios and primarily focus on static knowledge graphs, neglecting to model the dynamic changes of events over time in Temporal Knowledge Graphs (TKGs). Therefore, in this paper, we consider the few-shot multi-hop reasoning task on TKGs and propose a few-shot multi-hop reasoning model for TKGs (TFSM), which uses a reinforcement learning framework to improve model interpretability and introduces the one-hop neighbors of the task entity to consider the impact of previous events on the representation of current task entity. In order to reduce the cost of searching complex nodes, our model adopts a strategy based on path search and prunes the search space by considering the correlation between existing paths and the current state. Compared to the baseline method, our model achieved 5-shot Few-shot Temporal Knowledge Graph (FTKG) performance improvements of 1.0% ∼ 18.9% on ICEWS18-few, 0.6% ∼ 22.9% on ICEWS14-few, and 0.7% ∼ 10.5% on GDELT-few. Extensive experiments show that TFSM outperforms existing models on most metrics on the commonly used benchmark datasets ICEWS18-few, ICEWS14-few, and GDELT-few. Furthermore, ablation experiments demonstrated the effectiveness of each part of our model. In addition, we demonstrate the interpretability of the model by performing path analysis with a path search-based strategy.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 3","pages":"Article 104001"},"PeriodicalIF":7.4,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142756610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Basis is also explanation: Interpretable Legal Judgment Reasoning prompted by multi-source knowledge
Information Processing & Management, 62(3), Article 103996
Pub Date: 2024-11-29 | DOI: 10.1016/j.ipm.2024.103996
Shangyuan Li , Shiman Zhao , Zhuoran Zhang , Zihao Fang , Wei Chen , Tengjiao Wang
The task of Legal Judgment Prediction (LJP) aims to forecast case outcomes by analyzing fact descriptions, playing a pivotal role in enhancing judicial system efficiency and fairness. Existing LJP methods primarily focus on improving representations of fact descriptions to enhance judgment performance. However, these methods typically depend on superficial case information and neglect the underlying legal basis, resulting in a lack of in-depth reasoning and interpretability in the judgment of long-tail or confusing cases. Recognizing that the basis for judgments in real-world legal contexts encompasses both factual logic and related legal knowledge, we introduce an interpretable legal judgment reasoning framework prompted by multi-source knowledge. The essence of this framework is to transform the implicit factual logic of cases and external legal knowledge into an explicit basis for judgment, aiming to enhance not only the accuracy of judgment predictions but also the interpretability of the reasoning process. Specifically, we design a chain prompt reasoning module that guides a large language model to elucidate the factual logic basis through incremental reasoning, aligning the model's prior knowledge with task-oriented knowledge in the process. To match this fact-based information with a legal knowledge basis, we propose a contrastive knowledge fusing module that injects external statute knowledge into the fact description embedding. It pushes apart similar knowledge in the semantic space during the encoding of the external knowledge base without manual annotation, thus improving judgment prediction performance on long-tail and confusing cases. Experimental results on two real datasets indicate that our framework significantly outperforms existing LJP baseline methods in accuracy and interpretability, achieving new state-of-the-art performance. In addition, tests on specially constructed long-tail and confusing case datasets demonstrate that the proposed framework possesses improved generalization for predicting these complex cases.
{"title":"Basis is also explanation: Interpretable Legal Judgment Reasoning prompted by multi-source knowledge","authors":"Shangyuan Li , Shiman Zhao , Zhuoran Zhang , Zihao Fang , Wei Chen , Tengjiao Wang","doi":"10.1016/j.ipm.2024.103996","DOIUrl":"10.1016/j.ipm.2024.103996","url":null,"abstract":"<div><div>The task of Legal Judgment Prediction (LJP) aims to forecast case outcomes by analyzing fact descriptions, playing a pivotal role in enhancing judicial system efficiency and fairness. Existing LJP methods primarily focus on improving representations of fact descriptions to enhance judgment performance. However, these methods typically depend on the superficial case information and neglect the underlying legal basis, resulting in a lack of in-depth reasoning and interpretability in the judgment process of long-tail or confusing cases. Recognizing that the basis for judgments in real-world legal contexts encompasses both factual logic and related legal knowledge, we introduce the interpretable legal judgment reasoning framework with multi-source knowledge prompted. The essence of this framework is to transform the implicit factual logic of cases and external legal knowledge into explicit basis for judgment, aiming to enhance not only the accuracy of judgment predictions but also the interpretability of the reasoning process. Specifically, we design a chain prompt reasoning module that guides a large language model to elucidate factual logic basis through incremental reasoning, aligning the model prior knowledge with task-oriented knowledge in the process. To match the above fact-based information with legal knowledge basis, we propose a contrastive knowledge fusing module to inject external statutes knowledge into the fact description embedding. It pushes away the distance of similar knowledge in the semantic space during the encoding of external knowledge base without manual annotation, thus improving the judgment prediction performance of long-tail and confusing cases. Experimental results on two real datasets indicate that our framework significantly outperforms existing LJP baseline methods in accuracy and interpretability, achieving new state-of-the-art performance. In addition, tests on specially constructed long-tail and confusing case datasets demonstrate that the proposed framework possesses improved generalization abilities for predicting these complex cases.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 3","pages":"Article 103996"},"PeriodicalIF":7.4,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142744584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}