Suppose there exists a Virtual Learning Environment in which an agent plays the role of the teacher. Over time, the agent moves through different states and, in each state, decides which action to take to move to the next state. Some actions are better than others. The sequence of transitions ends in a final (goal) state, which yields the largest benefit for the agent. The best course of action is the one that reaches the goal state with the maximum available return. The system is formalized as a Markov Decision Process, and the Q-Learning algorithm is applied to find a criterion (policy) that optimises the agent's behavior.
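The abstract does not specify the states, actions, or rewards of the Virtual Learning Environment. The following is therefore only a minimal sketch of tabular Q-Learning on a hypothetical toy MDP. It assumes five states in a line, two actions (left, right), and a reward of 1 only on reaching the goal state. The names `step`, `q_learning`, `greedy_policy` and the parameter values `alpha`, `gamma`, `epsilon` are illustrative assumptions, not taken from the source. The sketch applies the standard update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$ and then reads off the greedy policy.

```python
import random

# Hypothetical toy environment (an assumption, not from the source):
# states 0..n_states-1 on a line, action 0 = left, action 1 = right,
# the last state is the goal and ends the episode.
def step(state, action, goal):
    """Return (next_state, reward) for a deterministic chain MDP."""
    nxt = min(state + 1, goal) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == goal else 0.0  # benefit only on reaching the goal
    return nxt, reward


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-Learning with epsilon-greedy exploration (illustrative parameters)."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: random action
            else:
                best = max(q[state])
                # exploit: greedy action, ties broken at random
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            nxt, reward = step(state, action, goal)
            # The goal state is terminal, so it contributes no future return.
            future = 0.0 if nxt == goal else max(q[nxt])
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Best action in every non-goal state."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(len(q) - 1)]


q = q_learning()
print(greedy_policy(q))  # → [1, 1, 1, 1]
```

In this toy setting the learned policy moves right in every state, which is the shortest path to the goal. Discounting by `gamma` makes shorter paths yield a larger return, so the agent learns this path rather than merely eventually reaching the goal.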
This work is licensed under a Creative Commons Attribution 4.0 International License.