This paper introduces a novel approach to designing safe optimal controllers that avoid conflicts between safety and performance over a large domain of the system’s operation. Designing computationally tractable feedback controllers that guarantee safety of an arbitrary prescribed set is, in general, not possible. The best one can do in this case is to maximize the volume of the region, contained in the safe set, on which both safety and optimality are respected. To this end, our key contribution is the construction of a safe optimal domain of attraction (DoA) that ensures optimal convergence of the system’s trajectories to the origin without violating safety. To accomplish this, we leverage the relaxed Hamilton–Jacobi–Bellman (HJB) equation, which allows us to learn the most permissive control barrier certificates (CBCs), with a maximum-volume conflict-free set, by solving a tractable optimization problem. To improve computational efficiency, we present a sum-of-squares (SOS)-based algorithm that decomposes this optimization problem into smaller SOS programs solved at each iteration. To remove the need for knowledge of the system model in solving these SOS optimizations, an SOS-based off-policy reinforcement learning (RL) method is presented. This off-policy learning approach enables the evaluation of a target policy distinct from the behavior policy used for data collection, ensuring safe exploration under mild assumptions. Finally, simulation results are provided to demonstrate the efficacy of the proposed method.
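For concreteness, one common form of the relaxed HJB inequality, written here under the assumption of control-affine dynamics $\dot{x} = f(x) + g(x)u$ and an infinite-horizon cost with state penalty $Q(x)$ and control penalty $u^{\top} R\, u$ (a setting the abstract itself does not fix), is
\[
\min_{u}\Big[\nabla V(x)^{\top}\big(f(x)+g(x)u\big) + Q(x) + u^{\top} R\, u\Big] \;\le\; 0,
\qquad
u(x) \;=\; -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V(x),
\]
where $u(x)$ is the minimizing policy. Any positive-definite $V$ with $V(0)=0$ satisfying this inequality upper-bounds the optimal cost and is non-increasing along closed-loop trajectories under $u(x)$; when $V$ is polynomial and the policy is held fixed, the inequality is linear in the coefficients of $V$ and can be certified with SOS constraints, which is one way such certificate searches can be posed as tractable, iteratively solved SOS programs.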