Rapidly evolving and highly heterogeneous IoT node devices produce high-dimensional, discrete, and temporally dynamic network traffic feature data. The resulting sparsity of the data distribution, together with concept drift, can severely degrade the effectiveness of traditional deep learning-based intrusion detection models. To address these issues, we propose CAEAID, an incremental contrastive learning-based intrusion detection framework for IoT networks. First, to tackle the high-dimensional sparse distribution of traffic, we construct a contrastive autoencoder that learns low-dimensional latent representations of IoT traffic features by minimizing the distance between similar samples while maximizing the distance between dissimilar samples. We then identify abnormal traffic based on distance. The contrastive autoencoder sharpens the boundaries between traffic categories and alleviates the challenges posed by high-dimensional sparse spaces. In parallel, we apply an improved extreme value theory method to fit IoT traffic features and adaptively set thresholds for detecting extreme discrete anomalous traffic, which serves as auxiliary analysis. Second, to handle concept drift, CAEAID builds a pseudo-labeled dataset based on detection consistency, enabling incremental learning and periodic model updates for adaptive detection. Experimental results show that, compared with other advanced methods, CAEAID improves accuracy on the IoTID20 and CICIDS2018 datasets by at least 1.15% and 1.72%, respectively. The framework also achieves superior precision, recall, and F1-score.
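As an illustration of the contrastive objective and the distance-based detection step described above, the following is a minimal sketch, not the CAEAID implementation. It assumes a generic margin-based pairwise contrastive loss and a nearest-centroid distance rule; the function names, the `margin` parameter, and the `threshold` parameter are hypothetical and are not specified in the text.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_pair_loss(z1, z2, same_class, margin=1.0):
    """Generic pairwise contrastive loss (an assumption, not the paper's exact loss).

    Similar pairs are penalized by their squared distance (pulled together);
    dissimilar pairs are penalized only when closer than `margin` (pushed apart).
    """
    d = euclidean(z1, z2)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2

def centroid(points):
    """Mean vector of a list of equally sized latent vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def is_anomalous(z, benign_centroid, threshold):
    """Flag a sample whose latent vector lies farther than `threshold`
    from the centroid of benign traffic (hypothetical decision rule)."""
    return euclidean(z, benign_centroid) > threshold
```

In this sketch the threshold is a fixed number supplied by the caller; the adaptive thresholds that the framework derives from extreme value theory are not modeled here.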