Cs188AI Wiki

General Info

What

POMDP stands for partially observable Markov decision process. It is a Markov decision process in which the agent cannot directly observe the state; instead, it maintains a belief, a probability distribution over the possible states.

Notes

  • POMDPs can be solved with the same techniques used to solve MDPs, applied over belief states instead of states. The catch is that the belief space is enormous: it is continuous, since every probability distribution over states is a distinct belief.

Solving POMDPs

[Image: truncated expectimax over belief states — "Solving POMDPs"]

To solve these problems, we employ a truncated (depth-limited) expectimax to compute approximate values of actions from the current belief. The example on the right shows the gist of the computation. We rerun this computation every time the agent takes an action. Obviously, this is not optimal, but it gives us a VPI agent: one that reasons about the value of information and can reasonably manage beliefs and actions.
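As a sketch of the idea, the following Python computes truncated expectimax over beliefs for a tiny hand-made two-state POMDP. Everything here (the states `A`/`B`, the actions, the 0.8-accurate sensor, the rewards) is an illustrative assumption, not part of the course material; the point is the structure: a Bayes-filter belief update, an expectation over observations, and a depth limit that cuts the recursion off.

```python
# Hypothetical two-state POMDP; all model parameters are illustrative assumptions.
STATES = ("A", "B")
ACTIONS = ("stay", "switch")
OBS = ("oA", "oB")  # noisy readings of the underlying state

def transition(s, a):
    # Deterministic for simplicity: 'switch' flips the state.
    if a == "switch":
        return "B" if s == "A" else "A"
    return s

def obs_prob(o, s):
    # Noisy sensor: 80% chance of observing the true state.
    return 0.8 if o[1] == s else 0.2

def reward(s, a):
    # Being in state 'A' is worth 1; everything else is worth 0.
    return 1.0 if s == "A" else 0.0

def update_belief(b, a, o):
    # Bayes filter: push the belief through the transition model,
    # then weight by the observation likelihood and renormalize.
    new_b = {s2: 0.0 for s2 in STATES}
    for s, p in b.items():
        new_b[transition(s, a)] += p
    for s2 in STATES:
        new_b[s2] *= obs_prob(o, s2)
    z = sum(new_b.values())
    return {s2: (p / z if z > 0 else 1.0 / len(STATES)) for s2, p in new_b.items()}

def value(b, depth):
    # Max node of the expectimax: pick the best action from this belief.
    if depth == 0:
        return 0.0  # truncation: stop estimating here
    return max(q_value(b, a, depth) for a in ACTIONS)

def q_value(b, a, depth):
    # Expected immediate reward under the belief...
    r = sum(p * reward(s, a) for s, p in b.items())
    # ...plus a chance node: expectation over possible observations.
    future = 0.0
    for o in OBS:
        # P(o | b, a): probability of seeing o after taking a from belief b.
        p_o = sum(p * obs_prob(o, transition(s, a)) for s, p in b.items())
        if p_o > 0:
            future += p_o * value(update_belief(b, a, o), depth - 1)
    return r + future

# Rerun the (depth-2) computation from the current belief to pick an action.
belief = {"A": 0.5, "B": 0.5}
best = max(ACTIONS, key=lambda a: q_value(belief, a, 2))
```

Even in this toy model the per-step cost grows as (actions × observations) to the depth, which is why the depth limit (and rerunning the search after every real action) is the whole trick.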
