This presentation is part of Minisymposium “MS-55 Recent advances on large-scale Bayesian optimal experimental design”
organized by: Peng Chen (The University of Texas at Austin), Omar Ghattas (The University of Texas at Austin), Youssef Marzouk (Massachusetts Institute of Technology)
We present a reinforcement learning approach to find the optimal design for a finite sequence of experiments. In contrast to previous efforts targeting a dynamic program, we directly parameterize a policy and improve it using gradient descent. Such policy-oriented techniques are advantageous for fast evaluation, as they avoid online Bayesian inference, and can be more efficient and accurate than value-function approximations. We demonstrate the overall design method in a simple time-dependent system.
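As a rough illustration of the idea of parameterizing a policy directly and improving it with gradient ascent on an information-gain objective, the following is a minimal sketch only, not the authors' implementation. It assumes a toy linear-Gaussian model y = theta*d + noise, a Gaussian policy whose mean is linear in the stage index and a running data summary, a simple prior-contrastive (sPCE-style) Monte Carlo reward, and a REINFORCE-style score-function gradient; all names, constants, and hyperparameters below are hypothetical.

```python
# Minimal sketch (illustrative assumptions throughout): policy-gradient
# sequential design for a finite sequence of experiments with a toy
# linear-Gaussian model y = theta * d + noise.
import numpy as np

rng = np.random.default_rng(0)
N_EXP = 2          # length of the finite experiment sequence (assumed)
SIGMA_OBS = 0.5    # observation noise std (assumed)
POLICY_STD = 0.3   # exploration std of the stochastic policy (assumed)

def run_episode(phi, n_contrast=64):
    """Roll out one design episode.

    Returns a Monte Carlo reward (a prior-contrastive lower bound on the
    total expected information gain) and the score d/dphi log pi_phi(designs).
    """
    theta = rng.standard_normal()                # "true" parameter ~ N(0, 1) prior
    thetas_c = rng.standard_normal(n_contrast)   # contrastive prior samples
    summary = 0.0                                # running data summary fed to the policy
    score = np.zeros_like(phi)
    loglik, loglik_c = 0.0, np.zeros(n_contrast)
    for stage in range(N_EXP):
        features = np.array([1.0, stage, summary])
        mu = phi @ features                      # linear policy mean
        d = mu + POLICY_STD * rng.standard_normal()
        score += (d - mu) / POLICY_STD**2 * features   # Gaussian score function
        y = theta * d + SIGMA_OBS * rng.standard_normal()
        loglik += -0.5 * ((y - theta * d) / SIGMA_OBS) ** 2
        loglik_c += -0.5 * ((y - thetas_c * d) / SIGMA_OBS) ** 2
        summary += y * d                         # sufficient statistic for this toy model
    all_ll = np.concatenate(([loglik], loglik_c))
    reward = loglik - (np.logaddexp.reduce(all_ll) - np.log(all_ll.size))
    return reward, score

phi = np.zeros(3)                                # policy parameters
lr, batch = 0.05, 128
for it in range(200):
    rewards, scores = zip(*(run_episode(phi) for _ in range(batch)))
    rewards = np.array(rewards)
    baseline = rewards.mean()                    # variance-reduction baseline
    grad = np.mean([(r - baseline) * s for r, s in zip(rewards, scores)], axis=0)
    phi += lr * grad                             # gradient ascent on the EIG bound
print("learned policy parameters:", phi)
```

Once trained, evaluating the policy at deployment time only requires a forward pass through `phi @ features` at each stage, which is what makes the policy-oriented approach fast: no posterior needs to be computed online between experiments.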