* first version: Jul 14, 2016 last update: Sep 17, 2019 *

This tutorial explains how to implement layer-wise relevance propagation (LRP) easily and efficiently, as described in the overview paper:

G. Montavon, A. Binder, S. Lapuschkin, W. Samek, K.-R. Müller

Layer-wise Relevance Propagation: An Overview

in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, Springer LNCS, vol. 11700, 2019

We consider two models: (1) a simple plain deep rectifier network trained on the MNIST handwritten digits data, and (2) the VGG-16 network trained on ImageNet, applicable to general image classification.

*Note:* If you are instead looking for ready-to-use software, have a look at the software section of this website. If you want to try relevance propagation without installing software, check out our interactive demos. For the original paper on LRP, see instead:

S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, W. Samek

On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation

PloS ONE 10 (7), e0130140, 2015

For this tutorial, you need to install Python, Numpy, PyTorch, Matplotlib, and OpenCV. Then, you should download tutorial.zip and extract it in your working directory. It contains data, model parameters, and some additional functions (in the file `utils.py`).

We first load 12 exemplary MNIST test digits.

In [1]:

```
import utils
X,T = utils.loaddata()
%matplotlib inline
utils.digit(X.reshape(1,12,28,28).transpose(0,2,1,3).reshape(28,12*28),9,0.75)
```

Each digit is stored as a 784-dimensional vector of pixel values, where "-1.0" corresponds to black and "+1.0" corresponds to white.

These digits are fed to a fully connected neural network of layer sizes 784-300-100-10, with ReLU activations at each hidden layer. The architecture is depicted in the figure below.
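To make the 784-300-100-10 architecture concrete, the forward mapping can be sketched with randomly initialized parameters of the same shapes (a hypothetical illustration; the actual trained parameters are loaded below via `utils.loadparams()`):

```python
import numpy

# Hypothetical random parameters with the 784-300-100-10 shapes
# (the tutorial instead loads trained parameters via utils.loadparams()).
rng = numpy.random.RandomState(0)
sizes = [784, 300, 100, 10]
W = [rng.randn(m, n) * 0.01 for m, n in zip(sizes[:-1], sizes[1:])]
B = [numpy.zeros(n) for n in sizes[1:]]

x = rng.rand(5, 784)  # a batch of 5 dummy "digits"
for w, b in zip(W, B):
    x = numpy.maximum(0, x.dot(w) + b)  # linear transformation + ReLU
print(x.shape)  # (5, 10): one score per class
```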

The network we consider achieves an error of 1.6%, which is a typical performance for a neural network without particular structure or regularization. The function `utils.loadparams()` retrieves its parameters for us.

In [2]:

```
W,B = utils.loadparams()
L = len(W)
```

The input is then propagated through the network by alternating linear transformations and ReLU activations, and the activations at each layer are collected:

In [3]:

```
import numpy

A = [X]+[None]*L
for l in range(L):
    A[l+1] = numpy.maximum(0,A[l].dot(W[l])+B[l])
```

We display the first three digits along with their class scores in the output layer:

In [4]:

```
for i in range(3):
    utils.digit(X[i].reshape(28,28),0.75,0.75)
    p = A[L][i]
    print(" ".join(['[%1d] %.1f'%(d,p[d]) for d in range(10)]))
```

As expected, the highest score systematically corresponds to the correct digit.

We now implement the layer-wise relevance propagation (LRP) procedure from the top to the bottom of the network. As a first step, we create a list to store relevance scores at each layer. The top layer relevance scores are set to the top-layer activations, which we multiply by a label indicator in order to retain only the evidence for the actual class.

In [5]:

```
R = [None]*L + [A[L]*(T[:,None]==numpy.arange(10))]
```
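The label indicator `T[:,None]==numpy.arange(10)` relies on numpy broadcasting to build a one-hot mask. A small self-contained illustration (with two hypothetical labels):

```python
import numpy

labels = numpy.array([3, 0])                  # two example labels
mask = (labels[:, None] == numpy.arange(10))  # broadcasts to a (2, 10) one-hot mask
print(mask.astype(int))
```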

The LRP-0, LRP-\(\epsilon\), and LRP-\(\gamma\) rules described in the LRP overview paper (Section 10.2.1) for propagating relevance on the lower layers are special cases of the more general propagation rule

\[ R_j = \sum_k \frac{a_j \rho(w_{jk})}{\epsilon + \sum_{0,j} a_j \rho(w_{jk})} R_k \]

(cf. Section 10.2.2), where \(\rho\) is a function that transforms the weights, and \(\epsilon\) is a small positive increment. We define below two helper functions that perform the weight transformation and the incrementation. In practice, we would like to apply different rules at different layers (cf. Section 10.3). Therefore, we also give the layer index "`l`" as an argument to these functions.

In [6]:

```
def rho(w,l): return w + [None,0.1,0.0,0.0][l] * numpy.maximum(0,w)
def incr(z,l): return z + [None,0.0,0.1,0.0][l] * (z**2).mean()**.5+1e-9
```

In particular, these functions and the layer they receive as a parameter let us reduce the general rule to LRP-0 for the top-layer, to LRP-\(\epsilon\) with \(\epsilon = 0.1\,\text{std}\) for the layer just below, and to LRP-\(\gamma\) with \(\gamma=0.1\) for the layer before. We now come to the practical implementation of this general rule. It can be decomposed as a sequence of four computations:

\begin{align*} \forall_k:&~z_k = {\textstyle \epsilon + \sum_{0,j}} a_j \rho(w_{jk}) & (\text{step }1)\\ \forall_k:&~s_k = R_k / z_k \qquad & (\text{step }2)\\ \forall_j:&~c_j = {\textstyle \sum_k} \rho(w_{jk}) s_k \qquad & (\text{step }3)\\ \forall_j:&~R_j = a_j \cdot c_j \qquad & (\text{step }4) \end{align*}

The layer-wise relevance propagation procedure then consists of iterating over the layers in reverse order, from the top layer down to the first layer, and applying this sequence of computations at each of them.

In [7]:

```
for l in range(1,L)[::-1]:
    w = rho(W[l],l)
    b = rho(B[l],l)

    z = incr(A[l].dot(w)+b,l)  # step 1
    s = R[l+1] / z             # step 2
    c = s.dot(w.T)             # step 3
    R[l] = A[l]*c              # step 4
```
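A useful sanity check for such an implementation is relevance conservation: with \(\epsilon = 0\), no bias, and \(\rho\) the identity, each layer redistributes relevance exactly, i.e. \(\sum_j R_j = \sum_k R_k\). A self-contained sketch on a random layer (a hypothetical toy layer, not the tutorial's network):

```python
import numpy

rng = numpy.random.RandomState(0)
a = rng.rand(1, 6) + 0.1  # positive input activations
w = rng.randn(6, 4)       # random weights (no bias, rho = identity)
Rk = rng.rand(1, 4)       # relevance arriving from the layer above

z = a.dot(w)              # step 1 (epsilon = 0)
s = Rk / z                # step 2
c = s.dot(w.T)            # step 3
Rj = a * c                # step 4

print(numpy.isclose(Rj.sum(), Rk.sum()))  # conservation holds
```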

Note that the loop above stops one layer before reaching the pixels. To propagate relevance scores until the pixels, we need to apply an alternate propagation rule that properly handles pixel values received as input (cf. Section 10.3.2). In particular, we apply for this layer the \(z^\mathcal{B}\)-rule given by:

\[ R_i = \sum_j \frac{a_i w_{ij} - l_i w_{ij}^+ - h_i w_{ij}^-}{\sum_{i} a_i w_{ij} - l_i w_{ij}^+ - h_i w_{ij}^-} R_j \]

In this rule, \(l_i\) and \(h_i\) are the lower and upper bounds of pixel values, i.e. "-1" and "+1", and \((\cdot)^+\) and \((\cdot)^-\) are shortcut notations for \(\max(0,\cdot)\) and \(\min(0,\cdot)\). The \(z^\mathcal{B}\)-rule can again be implemented with a four-step procedure similar to the one used in the layers above. Here, we need to create two copies of the weights, and also create arrays of pixel values set to \(l_i\) and \(h_i\) respectively:

In [8]:

```
w = W[0]
wp = numpy.maximum(0,w)
wm = numpy.minimum(0,w)
lb = A[0]*0-1
hb = A[0]*0+1
z = A[0].dot(w)-lb.dot(wp)-hb.dot(wm)+1e-9 # step 1
s = R[1]/z # step 2
c,cp,cm = s.dot(w.T),s.dot(wp.T),s.dot(wm.T) # step 3
R[0] = A[0]*c-lb*cp-hb*cm # step 4
```
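Here as well, relevance is conserved: summing the numerator of the \(z^\mathcal{B}\)-rule over \(i\) reproduces the denominator. This can be verified on a small random layer (a hypothetical toy example, independent of the MNIST network):

```python
import numpy

rng = numpy.random.RandomState(1)
a = rng.uniform(-1, 1, (1, 6))    # pixel activations in [-1, 1]
w = rng.randn(6, 4)
wp, wm = numpy.maximum(0, w), numpy.minimum(0, w)
lb, hb = a * 0 - 1, a * 0 + 1     # lower and upper pixel bounds
Rj = rng.rand(1, 4)

z = a.dot(w) - lb.dot(wp) - hb.dot(wm)            # step 1
s = Rj / z                                        # step 2
c, cp, cm = s.dot(w.T), s.dot(wp.T), s.dot(wm.T)  # step 3
Ri = a * c - lb * cp - hb * cm                    # step 4

print(numpy.isclose(Ri.sum(), Rj.sum()))  # conservation holds
```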

The relevance scores obtained in the pixel layer can now be rendered as heatmaps below the original digits:

In [9]:

```
utils.digit(X.reshape(1,12,28,28).transpose(0,2,1,3).reshape(28,12*28),9,0.75)
utils.heatmap(R[0].reshape(1,12,28,28).transpose(0,2,1,3).reshape(28,12*28),9,0.75)
```

In the example above, LRP rules could be easily expressed in terms of matrix-vector operations. In practice, state-of-the-art neural networks such as VGG-16 make use of more complex layers such as convolutions and pooling. In this case, LRP rules are more conveniently implemented by casting the operations of the four-step procedure above as *forward* and *gradient* evaluations on these layers. These operations are readily available in neural network frameworks such as PyTorch and TensorFlow, and can therefore be reused for the purpose of implementing LRP. Here, we take the VGG-16 pretrained network for image classification. For this network, we consider the task of explaining the evidence for the class "castle" it has found in the following image:

The image is first loaded in the notebook.

In [10]:

```
import cv2
img = numpy.array(cv2.imread('castle.jpg'))[...,::-1]/255.0
```

The image is then converted to a PyTorch tensor and normalized with the VGG-16 preprocessing mean and standard deviation:

In [11]:

```
import torch
mean = torch.Tensor([0.485, 0.456, 0.406]).reshape(1,-1,1,1)
std = torch.Tensor([0.229, 0.224, 0.225]).reshape(1,-1,1,1)
X = (torch.FloatTensor(img[numpy.newaxis].transpose([0,3,1,2])*1) - mean) / std
```

We then load the pretrained VGG-16 model and extract its sequence of layers, where `utils.toconv` converts the dense layers of the classifier into equivalent convolutional layers:

In [12]:

```
import torchvision
model = torchvision.models.vgg16(pretrained=True); model.eval()
layers = list(model._modules['features']) + utils.toconv(list(model._modules['classifier']))
L = len(layers)
```
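The function `utils.toconv` is provided with the tutorial code; the idea behind such a dense-to-convolution conversion can be sketched as follows (a hypothetical minimal version, not the tutorial's actual implementation). A `Linear` layer applied to flattened feature maps is equivalent to a `Conv2d` whose kernel covers the full spatial extent, here 1×1 for simplicity:

```python
import torch

# Hypothetical sketch of a dense-to-convolution conversion: a Linear(m, n)
# layer acting on a flattened 1x1 feature map is equivalent to Conv2d(m, n, 1)
# with reshaped weights.
torch.manual_seed(0)
dense = torch.nn.Linear(8, 4)
conv = torch.nn.Conv2d(8, 4, kernel_size=1)
conv.weight.data = dense.weight.data.reshape(4, 8, 1, 1)
conv.bias.data = dense.bias.data

x = torch.rand(1, 8)  # a flattened feature vector
y_dense = dense(x)
y_conv = conv(x.reshape(1, 8, 1, 1)).flatten()
print(torch.allclose(y_dense.flatten(), y_conv, atol=1e-6))  # True
```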

The input can then be propagated in the network and the activations at each layer are collected:

In [13]:

```
A = [X]+[None]*L
for l in range(L): A[l+1] = layers[l].forward(A[l])
```

The ten classes with the highest scores can then be retrieved and printed:

In [14]:

```
scores = numpy.array(A[-1].data.view(-1))
ind = numpy.argsort(-scores)
for i in ind[:10]:
    print('%20s (%3d): %6.3f'%(utils.imgclasses[i][:20],i,scores[i]))
```

We observe that the output neuron "castle" (index 483) has the highest score. This is expected due to the presence of a castle in the image. Note that other building-related classes are also assigned a high score, as well as classes corresponding to other objects present in the image (e.g. street sign and traffic light).

The following code iterates from the top layer to the first layer in reverse order and applies propagation rules at each layer. Top-layer activations are first multiplied by the mask to retain only the predicted evidence for the class "castle".

In [15]:

```
T = torch.FloatTensor((1.0*(numpy.arange(1000)==483).reshape([1,1000,1,1])))
R = [None]*L + [(A[-1]*T).data]
```

This evidence can then be propagated backward in the network by applying propagation rules at each layer.

**Convolution layers:** Observing that convolutions are special types of linear layers, we can use the same propagation rules as in the MNIST example, and a similar four-step procedure for applying these rules. Steps 2 and 4 are simple element-wise computations. Step 1 can be implemented as a forward computation in the layer, where we have preliminary transformed the layer parameters, and where we apply the increment function afterwards. As shown in the LRP overview paper, Step 3 can instead be computed as a gradient in the space of input activations:

\[ c_j = \big[\nabla~\big({\textstyle \sum_k}~z_k(\boldsymbol{a}) \cdot s_k\big)\big]_j \]

where \(s_k\) is treated as constant.
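For a linear layer, this gradient computation reduces to the transposed-weight multiplication used in the MNIST example. A minimal check, using `torch.nn.Linear` as a stand-in for a convolution (which is also a linear map):

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 3, bias=False)  # stand-in linear layer
a = torch.rand(1, 4, requires_grad=True)   # input activations
s = torch.rand(1, 3)                       # s_k, treated as constant

z = layer(a)                               # z_k(a), the forward computation
(z * s).sum().backward()                   # gradient of sum_k z_k(a) * s_k
c = a.grad                                 # c_j obtained by autograd

c_manual = s.matmul(layer.weight)          # the explicit sum_k w_jk * s_k
print(torch.allclose(c, c_manual))         # True: both computations agree
```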

**Pooling layers:** It is suggested in Section 10.3.2 of the paper to treat max-pooling layers as average pooling layers in the backward pass. Observing that average pooling is also a special linear layer, the same propagation rules as for the convolutional layers become applicable.
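Linearity of average pooling can be verified directly, e.g. by checking additivity on random inputs:

```python
import torch

# Average pooling is linear, which is what makes the four-step LRP
# procedure applicable to it; a quick check of additivity:
torch.manual_seed(0)
pool = torch.nn.AvgPool2d(2)
x1, x2 = torch.rand(1, 3, 4, 4), torch.rand(1, 3, 4, 4)
print(torch.allclose(pool(x1 + x2), pool(x1) + pool(x2)))  # True
```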

In the following code, we iterate the propagation procedure from the top layer towards the lower layers. Whenever we meet a max-pooling layer, we convert it into an average pooling layer. The functions `rho` and `incr` are set differently at each layer, following the strategy of Section 10.3.

In [16]:

```
for l in range(1,L)[::-1]:
    A[l] = (A[l].data).requires_grad_(True)

    if isinstance(layers[l],torch.nn.MaxPool2d): layers[l] = torch.nn.AvgPool2d(2)

    if isinstance(layers[l],torch.nn.Conv2d) or isinstance(layers[l],torch.nn.AvgPool2d):

        if l <= 16:       rho = lambda p: p + 0.25*p.clamp(min=0); incr = lambda z: z+1e-9
        if 17 <= l <= 30: rho = lambda p: p;                       incr = lambda z: z+1e-9+0.25*((z**2).mean()**.5).data
        if l >= 31:       rho = lambda p: p;                       incr = lambda z: z+1e-9

        z = incr(utils.newlayer(layers[l],rho).forward(A[l]))  # step 1
        s = (R[l+1]/z).data                                    # step 2
        (z*s).sum().backward(); c = A[l].grad                  # step 3
        R[l] = (A[l]*c).data                                   # step 4

    else:

        R[l] = R[l+1]
```

The relevance maps obtained at several intermediate layers can be visualized as heatmaps:

In [17]:

```
for i,l in enumerate([31,21,11,1]):
    utils.heatmap(numpy.array(R[l][0]).sum(axis=0),0.5*i+1.5,0.5*i+1.5)
```

As in the MNIST example, the first layer receives pixel values as input, and we therefore apply the \(z^\mathcal{B}\)-rule to propagate relevance onto the pixels, with bounds given by the normalized pixel range:

In [18]:

```
A[0] = (A[0].data).requires_grad_(True)
lb = (A[0].data*0+(0-mean)/std).requires_grad_(True)
hb = (A[0].data*0+(1-mean)/std).requires_grad_(True)
z = layers[0].forward(A[0]) + 1e-9 # step 1 (a)
z -= utils.newlayer(layers[0],lambda p: p.clamp(min=0)).forward(lb) # step 1 (b)
z -= utils.newlayer(layers[0],lambda p: p.clamp(max=0)).forward(hb) # step 1 (c)
s = (R[1]/z).data # step 2
(z*s).sum().backward(); c,cp,cm = A[0].grad,lb.grad,hb.grad # step 3
R[0] = (A[0]*c+lb*cp+hb*cm).data # step 4
```

The resulting pixel-wise heatmap for the class "castle" is finally displayed:

In [19]:

```
utils.heatmap(numpy.array(R[0][0]).sum(axis=0),3.5,3.5)
```