Explanations for Neural Networks by Neural Networks

Understanding the function learned by a neural network is crucial in many domains, e.g., to detect a model’s adaption to concept drift in online learning.

Existing global surrogate model approaches generate explanations by maximizing the fidelity between the neural network and a surrogate model on a sample-basis, which can be very time-consuming. Therefore, these approaches are not applicable in scenarios where timely or frequent explanations are required. In this paper, we introduce a real-time approach for generating a symbolic representation of the function learned by a neural network. Our idea is to generate explanations via another neural network (called the Interpretation Network, or I-Net), which maps network parameters to a symbolic representation of the network function. We show that the training of an I-Net for a family of functions can be performed up-front and subsequent generation of an explanation only requires querying the I-Net once, which is computationally very efficient and does not require training data. We empirically evaluate our approach for the case of low-order polynomials as explanations, and show that it achieves competitive results for various data and function complexities. To the best of our knowledge, this is the first approach that attempts to learn mapping from neural networks to symbolic representations.

Full Paper