© 2016 Elsevier Ltd. In this paper, we present a novel approach for computing the Pareto frontier in Multi-Objective Markov Chains Problems (MOMCPs) that integrates a regularized penalty method for poly-linear functions. In addition, we present a method that makes the Pareto frontier more useful as a decision support system: it selects the ideal multi-objective option given certain bounds. We restrict our problem to a class of finite, ergodic and controllable Markov chains. The regularized penalty approach is based on Tikhonov's regularization method and employs a projection-gradient approach to find the strong Pareto policies along the Pareto frontier. Unlike previous regularized methods, where the regularization parameter needs to be large enough and therefore modifies (sometimes significantly) the original functional, our approach balances the value of the functional using a penalization term (μ) and the regularization parameter (δ) at the same time, improving the computation of the strong Pareto policies. The idea is to optimize the parameters μ and δ such that the functional conserves its original shape. We set initial values for these parameters and then decrease them until each policy approximates a strong Pareto policy. In this sense, we define exactly how the parameters μ and δ tend to zero and we prove the convergence of the gradient regularized penalty algorithm. Moreover, our policy-gradient multi-objective algorithms exploit a gradient-based approach so that the corresponding image in the objective space yields a Pareto frontier consisting only of strong Pareto policies. We experimentally validate the method with a numerical example of a real alternative solution to the vehicle routing planning problem, aimed at increasing security in the transportation of cash and valuables. The decision-making process explored in this work corresponds to the most frequent computational intelligence models applied in practice within the Artificial Intelligence research area.
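
The abstract describes a projection-gradient scheme in which the penalization term μ and the regularization parameter δ are decreased together. The following is a minimal Python sketch of that general idea on a toy problem, not the paper's algorithm. Several things are assumptions made here for illustration: the functional form μ·f(x) + ½(a·x − b)² + (δ/2)‖x‖², the weighted-sum scalarization of two linear objectives, the single linear equality standing in for the Markov chain constraints, the geometric schedules for μ and δ, and all function and parameter names.

```python
def project_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # the last j that passes gives the threshold
    return [max(x - theta, 0.0) for x in v]


def regularized_penalty_gradient(costs, weights, a, b, mu=0.5, delta=0.5,
                                 stages=20, inner_steps=100, step=1.0):
    """Projected gradient on mu*f(x) + 0.5*(a.x - b)^2 + 0.5*delta*|x|^2.

    Hypothetical toy setup: f is a weighted sum of linear objectives,
    x lives on the probability simplex, and a.x = b is the only constraint.
    The step should stay below 2 / (|a|^2 + delta) for the iteration to be stable.
    """
    n = len(a)
    # Weighted-sum scalarization of the objectives (an assumption of this sketch).
    c = [sum(w * cost[i] for w, cost in zip(weights, costs)) for i in range(n)]
    x = [1.0 / n] * n  # start from the uniform point of the simplex
    for _ in range(stages):
        for _ in range(inner_steps):
            residual = sum(a_i * x_i for a_i, x_i in zip(a, x)) - b
            grad = [mu * c[i] + residual * a[i] + delta * x[i] for i in range(n)]
            x = project_simplex([x[i] - step * grad[i] for i in range(n)])
        # Illustrative schedule: both parameters shrink, delta faster than mu.
        mu *= 0.5
        delta *= 0.25
    return x


costs = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]  # two linear objectives
policy = regularized_penalty_gradient(costs, weights=[0.5, 0.5],
                                      a=[1.0, 0.0, 0.0], b=0.5)
print([round(p, 3) for p in policy])
```

In this toy instance the constraint fixes the first component at 0.5, and the remaining mass should move to the cheaper of the other two components, so the iterate is expected to approach (0.5, 0.5, 0). Repeating the run for different `weights` would trace different points of the frontier; how the paper itself selects points and schedules μ and δ is not specified in the abstract.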