This is a marimo notebook that walks through Chapter 2 of Sutton & Barto: the multi-armed bandit problem, epsilon-greedy, UCB, and gradient bandits.