Accurate localization of picking points in unstructured environments is crucial for intelligent picking of ripe citrus with a harvesting robot. However, citrus pedicels are small and resemble background objects in color, which makes it challenging to detect and localize the picking point of citrus fruits. This work presents a novel approach for detecting and localizing citrus picking points using binocular vision. First, the convolutional block attention module (CBAM) is integrated into the backbone network of Mask R-CNN to strengthen feature extraction for citrus pedicels, and the soft non-maximum suppression (Soft-NMS) strategy is used in the region proposal network to enhance pedicel detection performance. Second, to accurately associate each citrus fruit with the best detected pedicel, a maximum discrimination criterion is proposed that integrates the confidence score of the detected pedicel and the degree of positional connectivity between the pedicel and the fruit. Finally, to reduce matching errors and improve computational efficiency, a rapid and robust matching method based on normalized cross-correlation is applied to search for the picking point within the line segment between the left and right images. The experimental results show that the precision, recall, and F1-score for pedicel detection are 95.04%, 88.11%, and 91.44%, respectively, which are improvements of 13.00%, 7.84%, and 10.30% over the original Mask R-CNN. The mean absolute error (MAE) for localizing the citrus picking point is 8.63 mm and the mean relative error (MRE) is 2.76%. The MRE is reduced by at least 1.2% compared with the stereo matching methods belief propagation (BP), semi-global block matching (SGBM), and block matching (BM). This study provides an effective method for the precise detection and localization of citrus picking points for a harvesting robot.
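The matching step can be illustrated with a minimal sketch of a normalized cross-correlation search restricted to a line segment. This is not the authors' implementation. It assumes rectified grayscale images stored as nested lists, uses the zero-mean variant of normalized cross-correlation, and searches a hypothetical disparity range `[d_min, d_max]` along a single image row. The function names, window size, and the pinhole depth relation Z = f·B/d are illustrative assumptions.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A flat patch has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def patch(img, row, col, half):
    """Flatten the (2*half+1) x (2*half+1) window centred at (row, col)."""
    return [
        img[r][c]
        for r in range(row - half, row + half + 1)
        for c in range(col - half, col + half + 1)
    ]


def match_on_segment(left, right, row, col, half, d_min, d_max):
    """Find the disparity in [d_min, d_max] that maximizes NCC on one row.

    Assumes rectified images, so the match for left pixel (row, col)
    lies at (row, col - d) in the right image.
    Returns (best_disparity, best_score); best_disparity is None if no
    candidate window fits inside the right image.
    """
    template = patch(left, row, col, half)
    width = len(right[0])
    best_d, best_score = None, -2.0  # NCC scores lie in [-1, 1]
    for d in range(d_min, d_max + 1):
        c = col - d
        if c - half < 0 or c + half >= width:
            continue  # candidate window would leave the image
        score = ncc(template, patch(right, row, c, half))
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Pinhole stereo depth in mm: Z = f * B / d (assumed camera model)."""
    return focal_px * baseline_mm / disparity_px
```

Limiting the search to a short segment rather than the full row cuts the number of candidate windows and removes distant false matches, which is the motivation the abstract gives for this step.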