Entity Resolution (ER) is a critical task in data cleaning and integration that has traditionally focused on structured relational tables with aligned schemas. Real-world applications, however, often involve diverse data formats, which has led to Generalized Entity Resolution, a setting that covers structured, semi-structured, and unstructured data. Prompt-based methods have shown promise for entity resolution, but they are sensitive to prompt design and unstable across heterogeneous data formats. To address these challenges, we propose CrossER, a novel framework that integrates cross-attention, contrastive learning, and data augmentation. CrossER uses a cross-attention module to dynamically align attributes across heterogeneous data sources, enabling accurate entity resolution. To enhance robustness, contrastive learning builds discriminative feature representations, and data augmentation introduces variability that improves adaptability to noisy and complex datasets. Experimental results on multiple real-world datasets show that CrossER significantly outperforms state-of-the-art Generalized Entity Resolution methods in F1 score while maintaining computational efficiency. Furthermore, CrossER depends only minimally on the choice of pre-trained language model and achieves higher recall than baseline methods, especially on challenging heterogeneous datasets.
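To make the idea of aligning attributes through cross-attention concrete, the following is a minimal sketch of generic scaled dot-product cross-attention between the attribute embeddings of two records. It is not the CrossER implementation: the function names (`softmax`, `cross_attention`), the attribute names, and the hand-made three-dimensional vectors are all assumptions chosen for illustration, and a real system would presumably obtain embeddings from a pre-trained language model and learn projection matrices for queries, keys, and values.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query vector (an attribute of record A) attends over the
    key/value vectors (attributes of record B). Returns the attended
    outputs and the attention weights (one row per query).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, computed per dimension.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, values))
                        for i in range(len(values[0]))])
    return outputs, weights

# Hypothetical toy embeddings: record A has attributes (title, price),
# record B has attributes (name, cost, brand) under a different schema.
record_a = [[1.0, 0.0, 0.0],   # title
            [0.0, 1.0, 0.0]]   # price
record_b = [[0.9, 0.1, 0.0],   # name
            [0.1, 0.9, 0.0],   # cost
            [0.0, 0.0, 1.0]]   # brand

outputs, weights = cross_attention(record_a, record_b, record_b)
for name, row in zip(["title", "price"], weights):
    print(name, [round(x, 3) for x in row])
```

In this toy setup the largest weight in each row falls on the attribute of record B whose embedding is closest to the query, which is the sense in which attention weights can act as a soft alignment between mismatched schemas.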
