The algorithm will pass each immediate next state to a function which will return some value. The action/ future state which gets the max value is selected as the next move.

Thus the goal is to maximize some function.

Things that would influence agent's movement