Light field salient object detection (LF SOD) aims to segment visually salient objects from their surroundings. However, existing methods still struggle to achieve accurate detection, especially in complex scenes. Recently, the Segment Anything Model (SAM) has excelled at various vision tasks thanks to its strong object segmentation ability and generalization capability, making it well suited to the LF SOD challenge. In this paper, we adapt SAM for accurate LF SOD. Specifically, we propose a network named SAMNet with two adaptation designs. First, to enhance the perception of salient objects, we design a task-oriented multi-scale convolution adapter (MSCA) that is integrated into SAM's image encoder; all image-encoder parameters except those of the MSCA are frozen to balance detection accuracy and computational cost. Second, to effectively exploit the rich scene information in LF data, we design a data-oriented cross-modal fusion module (CMFM) that fuses SAM features from different modalities. Comprehensive experiments on four benchmark datasets demonstrate the effectiveness of SAMNet over current state-of-the-art methods; in particular, SAMNet achieves the highest F-measures of 0.945, 0.819, 0.868, and 0.898 on the four datasets, respectively. To the best of our knowledge, this is the first work to adapt a vision foundation model to LF SOD.
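The abstract only names the two adaptation components, so the following is a minimal PyTorch sketch of the general pattern they describe: a bottleneck adapter with parallel multi-scale depthwise convolutions inserted into a frozen ViT encoder, and a cross-modal fusion of two modality features. All class names, shapes, and design choices here (bottleneck width, kernel sizes, cross-attention as the fusion operator) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class MSCA(nn.Module):
    """Sketch of a multi-scale convolution adapter (details assumed).

    Down-projects ViT tokens, mixes them spatially with parallel depthwise
    convolutions at several kernel sizes, then up-projects and adds the
    result residually, so only these few parameters need training.
    """

    def __init__(self, dim, bottleneck=64, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.convs = nn.ModuleList([
            nn.Conv2d(bottleneck, bottleneck, k, padding=k // 2, groups=bottleneck)
            for k in kernel_sizes
        ])
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):                        # x: (B, H, W, C) tokens from a ViT block
        z = self.act(self.down(x))               # (B, H, W, bottleneck)
        z = z.permute(0, 3, 1, 2)                # to NCHW for convolution
        z = sum(conv(z) for conv in self.convs)  # multi-scale spatial mixing
        z = z.permute(0, 2, 3, 1)                # back to (B, H, W, bottleneck)
        return x + self.up(self.act(z))          # residual adapter update


class CMFM(nn.Module):
    """Sketch of a cross-modal fusion module: one modality's SAM features
    (e.g. the all-focus RGB image) attend to the other's (e.g. focal-stack
    or depth features); cross-attention is one plausible choice."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, f_rgb, f_aux):             # both: (B, N, C) token sequences
        fused, _ = self.attn(query=f_rgb, key=f_aux, value=f_aux)
        return self.norm(f_rgb + fused)          # residual fusion of the modalities


if __name__ == "__main__":
    tokens = torch.randn(2, 16, 16, 256)         # dummy ViT tokens (B, H, W, C)
    adapted = MSCA(256)(tokens)
    f1 = adapted.flatten(1, 2)                   # (B, N, C)
    f2 = torch.randn_like(f1)                    # stand-in for a second modality
    fused = CMFM(256)(f1, f2)
    print(adapted.shape, fused.shape)

    # In SAMNet (assumed usage), the pretrained encoder would be frozen and
    # only the inserted adapters trained, e.g.:
    #   for p in sam.image_encoder.parameters():
    #       p.requires_grad = False
```

The residual, bottlenecked form is what lets the frozen encoder keep its pretrained behavior while the small trainable branch specializes it for saliency, which matches the accuracy/compute trade-off the abstract describes.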