Exploring Maze Navigation with Partially Observable Markov Decision Processes in Python
The provided code defines and simulates a Partially Observable Markov Decision Process (POMDP) within a simple maze environment.
Here’s a step-by-step explanation of each component of the code:
Step 1: Define the MazePOMDP Class
This class initializes the maze environment, including states, actions, observations, and observation noise.
```python
class MazePOMDP:
    def __init__(self, maze_size, observation_noise):
        self.maze_size = maze_size
        self.states = [(x, y) for x in range(maze_size) for y in range(maze_size)]
        self.actions = ["up", "down", "left", "right"]
        self.observations = [(x, y) for x in range(maze_size) for y in range(maze_size)]
        self.observation_noise = observation_noise
```
Step 2: Implement Transition Logic
Defines how the agent moves in the maze based on its action, adjusting for maze boundaries.
```python
def transition(self, state, action):
    x, y = state
    if action == "up":
        return (max(x - 1, 0), y)
    elif action == "down":
        return (min(x + 1, self.maze_size - 1), y)
    elif action == "left":
        return (x, max(y - 1, 0))
    elif action == "right":
        return (x, min(y + 1, self.maze_size - 1))
```
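As a quick sanity check of the boundary clamping, the same movement rule can be exercised as a standalone function (a 5×5 maze is assumed, matching the rest of the article):

```python
# Standalone copy of the article's transition rule, used only to
# demonstrate that moves off the grid are clamped to the edge.
def transition(state, action, maze_size=5):
    x, y = state
    if action == "up":
        return (max(x - 1, 0), y)
    if action == "down":
        return (min(x + 1, maze_size - 1), y)
    if action == "left":
        return (x, max(y - 1, 0))
    return (x, min(y + 1, maze_size - 1))

print(transition((0, 0), "up"))     # clamped at the top edge -> (0, 0)
print(transition((0, 0), "down"))   # -> (1, 0)
print(transition((4, 4), "right"))  # clamped at the right edge -> (4, 4)
```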
Step 3: Observation Function
Simulates receiving an observation which may or may not be noisy.
```python
def observation(self, state, action, next_state):
    if random.random() < self.observation_noise:
        return random.choice(self.observations)  # noisy: a random position
    else:
        return next_state  # the agent's true position
```
With probability 1 − observation_noise the agent observes its true position; otherwise a random position is observed.
Step 4: Reward Function
Calculates rewards based on the agent’s state post-action.
```python
def reward(self, state, action):
    if state == (self.maze_size - 1, self.maze_size - 1):  # goal
        return 10
    elif state in [(1, 1), (2, 2), (3, 3)]:  # obstacles
        return -5
    else:
        return -1
```
Step 5: Print Maze Function
Visualizes the maze with the agent, goal, obstacles, and empty spaces.
```python
def print_maze(agent_position, maze_size):
    for i in range(maze_size):
        for j in range(maze_size):
            if (i, j) == agent_position:
                print("A", end=" ")
            elif (i, j) == (maze_size - 1, maze_size - 1):
                print("G", end=" ")
            elif (i, j) in [(1, 1), (2, 2), (3, 3)]:
                print("X", end=" ")
            else:
                print(".", end=" ")
        print()
```
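For example, rendering the starting position of a 5×5 maze with this logic produces a grid like the one below (this variant joins each row into a string rather than printing cell by cell, but the layout is the same):

```python
def print_maze(agent_position, maze_size):
    # Same rendering logic as the article's function, row-by-row.
    for i in range(maze_size):
        row = []
        for j in range(maze_size):
            if (i, j) == agent_position:
                row.append("A")  # agent
            elif (i, j) == (maze_size - 1, maze_size - 1):
                row.append("G")  # goal
            elif (i, j) in [(1, 1), (2, 2), (3, 3)]:
                row.append("X")  # obstacle
            else:
                row.append(".")  # empty space
        print(" ".join(row))

print_maze((0, 0), 5)
# A . . . .
# . X . . .
# . . X . .
# . . . X .
# . . . . G
```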
Step 6: Main Function
Executes the simulation, choosing actions randomly and updating beliefs based on observations.
```python
def main():
    maze_size = 5
    observation_noise = 0.2
    pomdp = MazePOMDP(maze_size, observation_noise)
    num_simulations = 10
    belief = (0, 0)
    for i in range(num_simulations):
        action = random.choice(pomdp.actions)
        next_state = pomdp.transition(belief, action)
        observation = pomdp.observation(belief, action, next_state)
        reward = pomdp.reward(next_state, action)
        print_maze(next_state, maze_size)
        belief = observation
```
Step 7: Run the Program
Ensures the program runs only if executed as the main module.
```python
if __name__ == "__main__":
    main()
```
```python
import random

class MazePOMDP:
    def __init__(self, maze_size, observation_noise):
        self.maze_size = maze_size
        self.states = [(x, y) for x in range(maze_size) for y in range(maze_size)]
        self.actions = ["up", "down", "left", "right"]
        self.observations = [(x, y) for x in range(maze_size) for y in range(maze_size)]  # all possible positions
        self.observation_noise = observation_noise

    def transition(self, state, action):
        x, y = state
        if action == "up":
            return (max(x - 1, 0), y)
        elif action == "down":
            return (min(x + 1, self.maze_size - 1), y)
        elif action == "left":
            return (x, max(y - 1, 0))
        elif action == "right":
            return (x, min(y + 1, self.maze_size - 1))

    def observation(self, state, action, next_state):
        if random.random() < self.observation_noise:
            return random.choice(self.observations)  # noisy: random position observed
        else:
            return next_state  # true position observed

    def reward(self, state, action):
        if state == (self.maze_size - 1, self.maze_size - 1):  # goal state
            return 10
        elif state in [(1, 1), (2, 2), (3, 3)]:  # obstacles
            return -5
        else:
            return -1

def print_maze(agent_position, maze_size):
    for i in range(maze_size):
        for j in range(maze_size):
            if (i, j) == agent_position:
                print("A", end=" ")  # agent
            elif (i, j) == (maze_size - 1, maze_size - 1):
                print("G", end=" ")  # goal
            elif (i, j) in [(1, 1), (2, 2), (3, 3)]:
                print("X", end=" ")  # obstacle
            else:
                print(".", end=" ")  # empty space
        print()

def main():
    maze_size = 5
    observation_noise = 0.2  # probability of a corrupted observation
    pomdp = MazePOMDP(maze_size, observation_noise)
    num_simulations = 10
    belief = (0, 0)  # initial belief (assume starting from (0, 0))
    for i in range(num_simulations):
        action = random.choice(pomdp.actions)  # random action selection
        next_state = pomdp.transition(belief, action)
        observation = pomdp.observation(belief, action, next_state)
        reward = pomdp.reward(next_state, action)
        print("Step:", i + 1)
        print("Action:", action)
        print("Next State:", next_state)
        print("Observation:", observation)
        print("Reward:", reward)
        print_maze(next_state, maze_size)
        print()
        belief = observation  # update belief to the observed position

if __name__ == "__main__":
    main()
```
Output:
The output shows a ten-step simulation of maze navigation under the POMDP. At each step the agent tries to reach the goal ('G') in the 5×5 grid while acting on limited and sometimes inaccurate observations of its own position. The printed actions, state changes, rewards, and maze renderings illustrate the dynamics of decision-making under uncertainty in this framework.
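Note that the simulation above collapses the belief to the single most recent observation. In a full POMDP treatment, the belief is a probability distribution over all states, updated with Bayes' rule after each action and observation. A minimal sketch of such a discrete belief update, assuming the same deterministic transitions and the same observation model as the code (true position with probability 1 − noise, otherwise a uniformly random cell):

```python
MAZE_SIZE = 5
NOISE = 0.2
STATES = [(x, y) for x in range(MAZE_SIZE) for y in range(MAZE_SIZE)]

def transition(state, action):
    # Same deterministic movement rule as the maze code above.
    x, y = state
    if action == "up":
        return (max(x - 1, 0), y)
    if action == "down":
        return (min(x + 1, MAZE_SIZE - 1), y)
    if action == "left":
        return (x, max(y - 1, 0))
    return (x, min(y + 1, MAZE_SIZE - 1))

def obs_prob(observed, true_state):
    # P(observation | true state): correct with prob 1 - NOISE,
    # plus the chance of hitting the true cell at random.
    p_random = NOISE / len(STATES)
    return (1 - NOISE) + p_random if observed == true_state else p_random

def belief_update(belief, action, observed):
    # Prediction step: push probability mass through the transition.
    predicted = {s: 0.0 for s in STATES}
    for s, p in belief.items():
        predicted[transition(s, action)] += p
    # Correction step: weight by observation likelihood and normalize.
    posterior = {s: p * obs_prob(observed, s) for s, p in predicted.items()}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

# Start certain at (0, 0), move down, observe (1, 0).
belief = {s: (1.0 if s == (0, 0) else 0.0) for s in STATES}
belief = belief_update(belief, "down", (1, 0))
print(max(belief, key=belief.get))  # most likely state -> (1, 0)
```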
Partially Observable Markov Decision Process (POMDP) in AI
A Partially Observable Markov Decision Process (POMDP) is a mathematical framework for decision-making under uncertainty, where the decision-maker lacks complete knowledge of the environment's current state and instead receives noisy or partial observations. POMDPs have broad applicability in domains such as robotics, healthcare, and finance.
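Formally, a POMDP is usually written as a tuple (standard textbook notation, summarized here for reference):

```latex
\langle S, A, T, R, \Omega, O, \gamma \rangle
```

where $S$ is the set of states, $A$ the set of actions, $T(s' \mid s, a)$ the state-transition probability, $R(s, a)$ the reward function, $\Omega$ the set of observations, $O(o \mid s', a)$ the probability of observing $o$ after taking action $a$ and landing in $s'$, and $\gamma \in [0, 1)$ the discount factor. Because the true state is hidden, the agent maintains a belief $b(s)$, a probability distribution over $S$, updated by Bayes' rule after each action–observation pair.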
This article provides an in-depth overview of Partially Observable Markov Decision Processes (POMDPs), their components, mathematical framework, solving strategies, and practical application in maze navigation using Python.
Table of Contents
- What is Partially Observable Markov Decision Process (POMDP)?
- Mathematical Framework of Partially Observable Markov Decision Process
- Markov Decision Process vs POMDP
- Strategies for Solving Partially Observable Markov Decision Processes
- Exploring Maze Navigation with Partially Observable Markov Decision Processes in Python
- Conclusion
Prerequisites
- Probability theory: Probability theory is applied to POMDPs to model the uncertainty surrounding the observations made by the agent and the changes in state within the environment.
- Markov processes: A Markov process, sometimes referred to as a Markov chain, is a stochastic model that depicts how a system changes over time. It assumes that the system’s future state is solely dependent on its current state and not on the preceding set of events.
- Decision theory: Decision theory offers a framework for making decisions under uncertainty, taking into account the trade-offs between various actions and their possible outcomes.
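To illustrate the Markov property mentioned above, the next state of a chain can be sampled using only the current state and a transition matrix. The two-state weather chain below is an invented toy example, not part of the maze code:

```python
import random

# Toy weather chain: P(next | current) depends only on the current state,
# not on any earlier history.
P = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(state):
    # Sample the next state from the row of the transition matrix.
    next_states, probs = zip(*P[state].items())
    return random.choices(next_states, weights=probs)[0]

random.seed(0)
state = "sunny"
for _ in range(5):
    state = step(state)
print(state)  # "sunny" or "rainy"; only the last state matters for the next step
```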