Fruit uploading and packaging are labor-intensive and time-consuming steps in the postharvest industry that involve continuous pick-and-place manipulation. We aim to replace this manual work with robotic grasping. For robotic grasping of fragile fruit, however, the main difficulty is reducing early-stage bruising while maintaining grasping reliability. This study addresses that problem, with the goal of reliable and damage-free robotic grasping of fragile fruit. Inspired by the structure and feeding behavior of the Asian elephant trunk, we designed a bionic pneumatic soft gripper and proposed a multimodal grasping strategy. Like the Asian elephant trunk, the gripper has two individual parts, a fingertip-like process and an enveloping structure, each controlled by a purpose-designed trapezoidal air chamber. The gripper imitates enveloping grasping behavior, which gives a larger contact area, a lower contact force, and a larger pull-off force. A visuo-tactile multimodal grasping strategy was integrated into the robotic grasping system: the visual modality was developed for positioning and grasp pose estimation, and the tactile modality was employed for grasp pose confirmation and closed-loop grasping force control. In the experiment on the enveloping gripper, the maximum contact force and the pull-off force reached a good balance at 0.7083 N and 7.959 N, respectively. With the proposed multimodal grasping strategy, the grasping success rate increased by 4.23 % to 96.70 %. For closed-loop control of the grasping force, the average steady-state error and maximum overshoot were 0.0856 N and 26.43 %, respectively. The Spatial Frequency Domain Imaging (SFDI) experiment demonstrated the effectiveness of the enveloping gripper in reducing early-stage bruising.
To some extent, the designed enveloping gripper combined with the proposed multimodal strategy achieves reliable and damage-free grasping of fragile fruit, which is promising for the fruit postharvest industry.
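
The two closed-loop figures quoted above, steady-state error and maximum overshoot, can be computed from a recorded grasping-force trace. The following is a minimal sketch, not the authors' evaluation code. It assumes that the steady-state value is the mean of the last `tail` samples, that overshoot is expressed as a percentage of the target force, and that the function name `step_metrics`, the parameter `tail`, and the sample trace are all hypothetical.

```python
def step_metrics(force, target, tail=20):
    """Return (steady_state_error_N, max_overshoot_percent) for one force trace.

    Assumptions (not taken from the paper):
    - `force` is a sequence of measured grasping forces in newtons,
      sampled after a step change of the setpoint to `target` (> 0).
    - The steady-state value is the mean of the last `tail` samples.
    - Overshoot is measured relative to the target force.
    """
    if target <= 0:
        raise ValueError("target force must be positive")
    if len(force) < tail:
        raise ValueError("trace is shorter than the steady-state window")

    steady_value = sum(force[-tail:]) / tail
    steady_state_error = abs(target - steady_value)

    peak = max(force)
    # Clamp at zero so a response that never exceeds the target reports 0 %.
    overshoot_percent = max(0.0, (peak - target) / target * 100.0)
    return steady_state_error, overshoot_percent


# Synthetic trace for illustration only: rises, peaks at 1.25 N, settles at 1.02 N.
trace = [0.0, 0.5, 1.0, 1.25, 1.1, 1.0] + [1.02] * 20
sse, overshoot = step_metrics(trace, target=1.0)
print(f"steady-state error: {sse:.4f} N, max overshoot: {overshoot:.2f} %")
```

Averaging these two values over repeated grasps would give figures comparable in kind to the averages reported in the abstract, provided the definitions match those used in the experiments.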