In this paper, we formulate the problem of optimal Wildfire Suppression as an infinite horizon Decision Process (DP) problem where an agent (e.g., a robotic firefighter) decides which areas to intervene to extinguish the fire. The dynamics of the wildfire is modeled by a cellular automaton whose state at time k is defined as a bi-dimensional grid $$x:k$$ where each cell in this grid describes the state of a rectangular geographic region of the wildland. The proposed algorithm, which is based on a non-parametric reinforcement learning (RL) methodology, computes optimized control policies that determine the agent’s actions that minimize a cost function that aims to preserve most of the cells with trees. From a given state $$x:k$$, the proposed algorithms employs rollout to take advantage of heuristic solutions to approximate, in polynomial-time, the future cost function. Two different heuristics approaches were applied: A corrective-based model that only takes into account surrounding burning cells, and a predictive strategy that calculates a coefficient-based metric over nearby trees and empty cells. We implemented a parallel sampler using CUDA to simulate the trajectories that rollout generates. This parallel implementation allows us to increase the number of lookahead steps without incurring in large computing times. Our experimental results show that the rollout strategy outperforms the base heuristics and that effectively suppresses wildfires.