A recommender or recommendation system is a subclass of information filtering systems that seeks to predict the “rating” or “preference” that a user would assign to an item. Although many collaborative filtering (CF) approaches based on neural matrix factorization (NMF) have been successful, recommendation systems still have significant room for improvement. The primary challenge in recommender systems is to extract high-quality user–item interaction information from sparse data. However, most studies have relied on additional review text or metadata rather than fully exploiting the high-order relationships between users and items. In this paper, we propose a novel model, the Cross Neighborhood Attention Network (CNAN), which addresses this problem through high-order neighborhood selection and neighborhood attention networks that learn user–item interactions efficiently. CNAN performs rating prediction using only user–item interaction information from the user–item ratings matrix, without additional information such as review text or metadata. We evaluated the proposed model on five datasets that include review text and three datasets that include metadata. CNAN improved performance by up to 7.59% over the model using review text and by up to 1.99% over the model using metadata. These results indicate that integrating higher-order neighborhood information through neighborhood selection and attention allows CNAN to achieve higher prediction performance through structural improvements alone, without any additional information.
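The abstract names two ingredients, high-order neighborhood selection and neighborhood attention, without giving their exact form. To make the idea concrete, the sketch below shows one generic way to realize each from a ratings matrix alone. It is an illustration under stated assumptions, not the CNAN architecture: the hypothetical `high_order_neighbors` selects users two hops away in the bipartite user–item rating graph (user → rated item → other rater) and ranks them by the number of co-rated items, and the hypothetical `attend` applies standard scaled dot-product attention of a user vector over its neighbors' vectors.

```python
import math
from collections import Counter


def high_order_neighbors(ratings, user, k):
    """Hypothetical selection rule (not CNAN's): return up to k users reachable
    via user -> item -> user paths, ranked by number of co-rated items."""
    # Invert the ratings matrix: item -> set of users who rated it.
    raters = {}
    for u, items in ratings.items():
        for i in items:
            raters.setdefault(i, set()).add(u)
    # Count how many items each other user shares with the target user.
    counts = Counter()
    for i in ratings.get(user, {}):
        for v in raters[i]:
            if v != user:
                counts[v] += 1
    # Rank by co-rated count (descending), breaking ties by user id.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [v for v, _ in ranked[:k]]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over a non-empty neighbor set.
    Returns (attention weights, weighted sum of the value vectors)."""
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return weights, out


if __name__ == "__main__":
    # Toy ratings matrix stored sparsely as user -> {item: rating}.
    ratings = {
        "u1": {"a": 5, "b": 3},
        "u2": {"a": 4, "b": 2},
        "u3": {"b": 1, "c": 4},
        "u4": {"c": 5},
    }
    print(high_order_neighbors(ratings, "u1", 2))  # ['u2', 'u3']
    weights, pooled = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    print(weights, pooled)
```

In the toy example, `u2` shares two rated items with `u1` and `u3` shares one, while `u4` shares none and is never selected. Restricting attention to a selected neighbor set keeps the attention step small even when the full user population is large; the embeddings fed to `attend` would, in a real model, be learned rather than fixed as here.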