The rapid growth of industrial automation has seen robotic arms replace much manual labor in tasks such as casting, processing, packaging, and gripping on production lines. The Internet of Things (IoT) enables machines to transmit data over networks, and combining it with artificial intelligence can yield smarter systems with higher operational efficiency and quality. However, artificial intelligence models must be optimized for each application. This paper proposes a You Only Look Once–uniform experimental design (YOLO–UED) model for gripping tasks performed by an IoT-based robotic arm. The YOLO–UED model combines the YOLOv4 model with UED to optimize the model architecture, improving performance across applications. Because visual inspection with robotic arms demands substantial computational resources, pairing each robotic arm with a high-performance computing device would markedly increase costs. This study therefore proposes an IoT framework that transmits the images captured by the robotic arm to a computing server for object recognition, reducing costs and providing scalability and flexibility in handling computational workloads. The proposed method raised the model's mean average precision to 95%. The YOLO–UED model achieved a 7–10% improvement over the YOLOv4 model in target recognition accuracy and attained a 90% success rate in gripping tasks performed on objects placed at various angles.