6. Final discussion
This work has presented a novel probabilistic approach, built upon an optimally controlled stochastic system, for on-line monitoring of agent behavior under uncertainty, conceived in terms of active inference and optimal action selection. Checking whether an autonomous agent's behavior fulfills expectations is key to guaranteeing the safety and performance of a growing number of autonomous applications, such as driverless cars, drones, and biomedical systems. The main difficulty in behavior monitoring is generating prior beliefs under the uncertainty the agent faces in its environment. In this work, the desired behavior is modeled by a prior Gaussian distribution over state transitions, in order to verify whether a given agent control policy respects its specification.

The desired optimal behavior is obtained analytically using a class of linearly solvable Markov decision processes. Through an exponential transformation, the Bellman equation for such problems becomes linear, despite the nonlinearity of the stochastic dynamical models, which enables the use of efficient numerical methods. The availability of an optimal control policy makes it possible to simulate the desired behavior over time and compare it with the current system performance in order to detect deviations from the desired behavior.

To support on-line monitoring, a robust metric based on surprise and twin Gaussian processes is introduced to characterize the progressive degradation of the agent's behavior by quantifying the distance between the implementation and the prior beliefs. A distinctive advantage of computing surprise with Gaussian processes is that the divergence from prior beliefs can be estimated not only from the expected value of state transitions but also from the corresponding prediction uncertainty for optimal action selection.
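The linearizing effect of the exponential transformation can be illustrated on a small discrete problem. The sketch below is an illustrative assumption, not the implementation used in this work: it posits a hypothetical five-state chain with an absorbing goal state, applies the standard desirability transformation z(x) = exp(-v(x)), under which the Bellman equation reduces to the linear fixed point z = exp(-q) ⊙ (P z), and solves it by simple iteration.

```python
import numpy as np

# Hypothetical 5-state chain; state 4 is an absorbing goal state.
n = 5
q = np.array([1.0, 1.0, 1.0, 1.0, 0.0])  # per-state costs, zero at the goal

# Passive (uncontrolled) dynamics: a random walk absorbed at the goal.
P = np.zeros((n, n))
for i in range(n - 1):
    P[i, max(i - 1, 0)] += 0.5
    P[i, min(i + 1, n - 1)] += 0.5
P[n - 1, n - 1] = 1.0

# Exponential transform: z(x) = exp(-v(x)) turns the nonlinear Bellman
# equation into the linear fixed point  z = exp(-q) * (P @ z),
# solvable by plain iteration (an efficient linear-algebra method).
z = np.ones(n)
for _ in range(500):
    z = np.exp(-q) * (P @ z)
    z[n - 1] = 1.0  # boundary condition at the absorbing goal

v = -np.log(z)  # optimal cost-to-go

# The optimal controlled transitions reweight the passive dynamics by
# the desirability of the successor state:  u*(x'|x) ∝ P(x'|x) z(x').
U = P * z[None, :]
U /= U.sum(axis=1, keepdims=True)
```

The resulting controlled kernel U is exactly what the monitoring scheme can simulate forward in time as the reference for the desired behavior: it shifts probability mass toward high-desirability successors relative to the passive dynamics P.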
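The point that surprise should account for prediction uncertainty, and not only the expected value, can be made concrete with a simplified score. The function below is a hedged sketch, not the twin-Gaussian-process metric of this work: it scores a predicted univariate Gaussian transition against the prior belief with a Kullback–Leibler divergence, which grows both when the predicted mean drifts from the prior mean and when the predicted variance disagrees with the prior variance.

```python
import numpy as np

def gaussian_surprise(mu_prior, var_prior, mu_pred, var_pred):
    """KL(pred || prior) between two univariate Gaussians, used as a
    surprise score. It is zero when the prediction matches the prior
    belief exactly, and it penalizes both mean deviation and a
    mismatch in prediction uncertainty (illustrative simplification)."""
    return 0.5 * (np.log(var_prior / var_pred)
                  + (var_pred + (mu_pred - mu_prior) ** 2) / var_prior
                  - 1.0)
```

Tracking such a score along a trajectory gives a scalar signal whose growth flags progressive divergence between the running implementation and the prior beliefs about state transitions.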