## Problem

As an example, assume a simple net that includes an LSTM module.

```python
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(1, 1, 1)  # input element size: 1, hidden state size: 1, num_layers: 1
        ...
```

According to the docs, the weights and biases can be accessed via `weight_ih_l[k]`, `weight_hh_l[k]`, `bias_ih_l[k]`, and `bias_hh_l[k]`. For this example, I want to set the `b_if` value to one, so I could do something like this.
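For context, the PyTorch docs state that `bias_ih_l0` stores the four gate biases concatenated as `(b_ii | b_if | b_ig | b_io)`, each of length `hidden_size`; that layout is what the quarter-based index arithmetic below relies on. A minimal sketch of locating the `b_if` slice:

```python
import torch

lstm = torch.nn.LSTM(1, 1, 1)  # input size 1, hidden size 1, one layer
hidden_size = lstm.hidden_size

# bias_ih_l0 has shape (4 * hidden_size,), laid out as (b_ii | b_if | b_ig | b_io)
b_if_start = 1 * hidden_size  # b_if is the second quarter of the vector
b_if_end = 2 * hidden_size
b_if = lstm.bias_ih_l0[b_if_start:b_if_end]
print(lstm.bias_ih_l0.shape, b_if.shape)
```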

```python
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(1, 1, 1)  # input element size: 1, hidden state size: 1, num_layers: 1
        print(self.lstm.bias_ih_l0)  # printing for demonstration.
        # output: tensor([-0.4163, -0.0641, -0.3475,  0.5244], requires_grad=True)
        hidden_size = 1
        b_if_start_index = int(4 * hidden_size * 0.25)
        b_if_end_index = int(4 * hidden_size * 0.5)
        self.lstm.bias_ih_l0[b_if_start_index:b_if_end_index] = 1
        print(self.lstm.bias_ih_l0)
        # output: tensor([-0.4163,  1.0000, -0.3475,  0.5244], grad_fn=<CopySlices>)
        ...
```

We can verify that the `b_if` value has been manually set to 1 as intended.

However, when this net is trained with an optimizer, it raises an error:

```
ValueError: can't optimize a non-leaf Tensor
```

As it turns out, the approach used above turns `self.lstm.bias_ih_l0` into a non-leaf tensor. This can be confirmed like this:

```python
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(1, 1, 1)  # input element size: 1, hidden state size: 1, num_layers: 1
        print(self.lstm.bias_ih_l0)
        # output: tensor([-0.4163, -0.0641, -0.3475,  0.5244], requires_grad=True)
        print(self.lstm.bias_ih_l0.is_leaf)  # output: True
        hidden_size = 1
        b_if_start_index = int(4 * hidden_size * 0.25)
        b_if_end_index = int(4 * hidden_size * 0.5)
        self.lstm.bias_ih_l0[b_if_start_index:b_if_end_index] = 1
        print(self.lstm.bias_ih_l0)
        # output: tensor([-0.4163,  1.0000, -0.3475,  0.5244], grad_fn=<CopySlices>)
        print(self.lstm.bias_ih_l0.is_leaf)  # output: False
```

As you can see, before the manual change of the `b_if` value, the `self.lstm.bias_ih_l0` tensor is a leaf tensor, but after the operation it no longer is.
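As background (a general autograd fact, not specific to LSTM): a leaf tensor is one created directly by the user rather than produced by a tracked operation, and PyTorch optimizers only accept leaves. A minimal sketch:

```python
import torch

a = torch.ones(3, requires_grad=True)  # created directly by the user -> leaf
b = a * 2                              # produced by an operation -> non-leaf, has a grad_fn
print(a.is_leaf, b.is_leaf)  # True False
```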

## Solution

To avoid the error, the manual bias value change should be done like this.

```python
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(1, 1, 1)  # input element size: 1, hidden state size: 1, num_layers: 1
        print(self.lstm.bias_ih_l0)
        # output: tensor([-0.4163, -0.0641, -0.3475,  0.5244], requires_grad=True)
        print(self.lstm.bias_ih_l0.is_leaf)  # output: True
        hidden_size = 1
        b_if_start_index = int(4 * hidden_size * 0.25)
        b_if_end_index = int(4 * hidden_size * 0.5)
        bias_nparr = self.lstm.bias_ih_l0.detach().numpy()
        bias_nparr[b_if_start_index:b_if_end_index] = 1
        print(self.lstm.bias_ih_l0)
        # output: tensor([-0.4163,  1.0000, -0.3475,  0.5244], requires_grad=True)
        print(self.lstm.bias_ih_l0.is_leaf)  # output: True
```

The difference is that the bias tensor is first **detached** and then, by applying `numpy()`, the code gains access to the tensor **values** only. After that, we change only the `b_if` section to 1. This way, the error does not show up during the optimization procedure either.

Detaching is required because we need a view of the bias tensor without `requires_grad` set to True: calling `numpy()` on a tensor with `requires_grad=True` raises an error. See the PyTorch documentation on `detach` for details.

`detach` returns a new tensor that shares the same data storage as the original, which is why changes made to the `bias_nparr` variable are automatically reflected in `self.lstm.bias_ih_l0`.
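As a side note, an alternative pattern (not used in the post above, but a common idiom) is to do the in-place assignment under `torch.no_grad()`: operations inside that context are not recorded by autograd, so no `grad_fn` is attached and the parameter stays a leaf. A minimal sketch:

```python
import torch

lstm = torch.nn.LSTM(1, 1, 1)  # input size 1, hidden size 1, one layer
hidden_size = 1

# the in-place slice assignment is not tracked inside no_grad(),
# so bias_ih_l0 keeps requires_grad=True and remains a leaf tensor
with torch.no_grad():
    lstm.bias_ih_l0[hidden_size:2 * hidden_size] = 1.0

print(lstm.bias_ih_l0.is_leaf)  # True
```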
