The algorithm passes each immediate successor state to an evaluation function, which returns a numeric value. The action whose successor state gets the highest value is selected as the next move.
The goal is therefore to design a function whose maximum corresponds to the best move.
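
As a minimal sketch of that selection step (not the project's actual code), the snippet below picks the action with the highest-valued successor. The names `select_action`, `successors`, and `evaluate` are hypothetical, and the states are plain numbers purely for illustration.

```python
from typing import Callable, Dict, TypeVar

State = TypeVar("State")


def select_action(
    successors: Dict[str, State],
    evaluate: Callable[[State], float],
) -> str:
    """Return the action whose successor state gets the highest value.

    `successors` maps each legal action to the state it leads to.
    Both names are assumptions made for this sketch.
    """
    # max() returns the first maximal key, so ties go to the action
    # that appears earliest in the dictionary.
    return max(successors, key=lambda action: evaluate(successors[action]))


# Toy usage: each "state" is just a number and the evaluation is the identity.
toy_successors = {"North": 3, "South": 7, "East": 5}
best = select_action(toy_successors, evaluate=lambda s: float(s))
```

Ties are broken by dictionary order here; a real agent might prefer to break ties randomly so it does not get stuck repeating the same move.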
Things that influence the agent's movement:
- Score
- Distance from food
  - [x] The smaller the distance to food, the stronger the incentive to move in that direction
  - [x] More reward if `len(food)` decreases
- Distance from ghosts (implemented even though not required)
  - [x] The farther away from ghosts, the better
  - [ ] Punish for being too close to ghosts?
- Distance from power pellet (not implemented as not required)
  - [ ] Reward for moving close to the power pellet
- Scared time of ghosts (not implemented as not required)
  - [ ] Ignore distance from ghosts while this is non-zero
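
The checked items above could be combined roughly as follows. This is a sketch under stated assumptions, not the implementation used here: the function name `evaluate_state`, the use of Manhattan distance, and all three weights are illustrative choices, and the unchecked items (power pellets, scared timers, close-ghost penalty) are left out.

```python
from typing import List, Tuple

Position = Tuple[int, int]


def manhattan(a: Position, b: Position) -> int:
    """Manhattan distance between two grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def evaluate_state(
    score: float,
    pacman: Position,
    food: List[Position],
    ghosts: List[Position],
) -> float:
    """Combine score, food distance, food count, and ghost distance.

    All weights below are arbitrary assumptions chosen for illustration.
    """
    value = score

    if food:
        nearest_food = min(manhattan(pacman, f) for f in food)
        # Closer food gives a larger bonus; +1 avoids division by zero.
        value += 10.0 / (nearest_food + 1)

    # Fewer remaining pellets means a smaller penalty, i.e. more reward.
    value -= 4.0 * len(food)

    if ghosts:
        nearest_ghost = min(manhattan(pacman, g) for g in ghosts)
        # The farther the nearest ghost, the better.
        value += 0.5 * nearest_ghost

    return value


# Toy comparison: same score, food one step away versus three steps away.
near = evaluate_state(0.0, (0, 0), food=[(0, 1)], ghosts=[])
far = evaluate_state(0.0, (0, 0), food=[(0, 3)], ghosts=[])
```

The reciprocal on the food distance keeps that term bounded, while the ghost term grows linearly with distance; in practice the ghost term would probably need a cap so that fleeing ghosts does not outweigh eating food.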