Optimal adaptive treatment strategies (ATSs) can be estimated in several ways, including dynamic weighted ordinary least squares (dWOLS). This approach is doubly robust: it requires modeling both the treatment and the response, but only one of those models needs to be correctly specified for the estimator to be consistent. For estimating an average treatment effect, doubly robust methods have been shown to combine better with machine learning methods than alternatives. However, the use of machine learning within dWOLS has not yet been investigated. Using simulation studies, we evaluate and compare the performance of the dWOLS estimator when the treatment probability is estimated using either machine learning algorithms or a logistic regression model. We further investigate the use of an adaptive