Recurrent neural networks (RNNs) (Hochreiter & Schmidhuber, 1997) are an effective representation of policies for partially observable Markov Decision Processes (POMDPs)(Kaelbling et al., 1998). However, a major drawback of RNN-based policies is the difficulty to formally verify behavioural specifications, e.g. with regard to reachability and expected cost. In previous work, we proposed to insert a quantized bottleneck network (QBN) to the RNN that learns a mapping from the latent memory states to quantized vectors, which enables the extraction of a finite-state controller (FSC) representation of the RNN (Carr et al., 2021). This FSC, together with a POMDP model description, constitutes a policy-induced Discrete-Time Markov Chain (DTMC) that allows us to use efficient formal verification methods. For the scenarios in which the FSC fails to satisfy the behavioural specification, the verification method generates diagnostic information in the form of critical examples. These critical examples can be used to re-train the RNN and extract an updated FSC.
In this project, we are interested to investigate the synthesis of policies with formally verified satisfaction of behavioural specifications in uncertain POMDPs (uPOMDPs) (Suilen et al., 2020; Cubuktepe et al., 2021), where the uncertainty is expressed by polynomial parameterizations of the transition and/or observation probabilities. The uPOMDP describes a set of possible POMDPs that can be used to express imperfect knowledge of the environment, e.g. because the POMDP is an abstraction of real-world dynamics. Our goal is to learn and extract an FSC that has the best worst-case performance among all possible instantiations of the uPOMDP.
- Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Comput., 9(8), 1735–1780.
- Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and Acting in Partially Observable Stochastic Domains. Artif. Intell., 101(1-2), 99–134.
- Carr, S., Jansen, N., & Topcu, U. (2021). Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes. J. Artif. Intell. Res., 72, 819–847.
- Suilen, M., Jansen, N., Cubuktepe, M., & Topcu, U. (2020). Robust Policy Synthesis for Uncertain POMDPs via Convex Optimization. IJCAI, 4113–4120.
- Cubuktepe, M., Jansen, N., Junges, S., Marandi, A., Suilen, M., & Topcu, U. (2021). Robust Finite-State Controllers for Uncertain POMDPs. AAAI, 11792–11800.