In recent years, substantial advances have been achieved in vision-language integration and image segmentation, particularly through pre-trained models such as BERT and the Vision Transformer (ViT). Within open-vocabulary instance segmentation (OVIS), accurately identifying an instance's positional information is critical, as it directly influences the precision of subsequent segmentation. However, many existing methods rely on supplementary networks to generate pseudo-labels, such as multiple anchor boxes containing object positional information. While these pseudo-labels help vision-language models recognize the absolute position of objects, they often compromise the efficiency and performance of the overall OVIS pipeline. In this study, we introduce Automated Pixel-level OVIS (APOVIS), a novel framework for enhancing OVIS. Our approach automatically generates pixel-level annotations by combining the image-text matching capability of pre-trained vision-language models with a foundational segmentation model that accepts multiple prompts (e.g., points or anchor boxes) to guide segmentation. Specifically, our method first uses a pre-trained vision-language model to match instances within image-text pairs and identify their relative positions. Next, we use activation maps to visualize these instances, extract their location information, and generate pseudo-label prompts that direct the segmentation process. These pseudo-labels then guide the segmentation model to perform pixel-level segmentation, improving both the accuracy and the generalizability of object segmentation across images. Extensive experiments demonstrate that our model significantly outperforms current state-of-the-art models in object detection accuracy and pixel-level instance segmentation on the COCO dataset. The generalizability of our approach is further validated through image-text pair inference tasks on the Open Images, Pascal VOC 2012, Pascal Context, and ADE20K datasets. The code will be available at https://github.com/ijetma/APOVIS.
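To make the pipeline described above concrete, the following is a minimal sketch, not the actual APOVIS implementation. It assumes two hypothetical interfaces: `vl_model.activation_map(image, phrase)`, which returns an HxW relevance map from a pre-trained vision-language matcher, and `segmenter.segment(image, box=...)`, which is a promptable segmentation model. Only the step that converts an activation map into a pseudo-label box prompt is implemented concretely; the function and parameter names are illustrative, not taken from the paper's code.

```python
import numpy as np


def activation_map_to_box(act_map: np.ndarray, threshold: float = 0.5):
    """Derive a pseudo-label anchor box from an instance activation map.

    The map is min-max normalized and thresholded, and the tight bounding
    box of the surviving pixels is returned as (x_min, y_min, x_max, y_max).
    """
    norm = (act_map - act_map.min()) / (act_map.max() - act_map.min() + 1e-8)
    ys, xs = np.nonzero(norm >= threshold)
    if len(xs) == 0:  # no activation above threshold for this phrase
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def segment_instances(image, caption_phrases, vl_model, segmenter):
    """Sketch of the overall flow: for each phrase matched in the caption,
    turn its activation map into a box prompt and let the promptable
    segmentation model produce a pixel-level mask."""
    results = []
    for phrase in caption_phrases:
        act_map = vl_model.activation_map(image, phrase)  # hypothetical call
        box = activation_map_to_box(act_map)
        if box is None:
            continue
        mask = segmenter.segment(image, box=box)  # hypothetical call
        results.append({"phrase": phrase, "box": box, "mask": mask})
    return results
```

The key design point this sketch illustrates is that no supplementary pseudo-label network is trained: box prompts are derived directly from the vision-language model's activation maps and handed to an off-the-shelf promptable segmenter.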