Problem

An LSTM module will be used as an example. Assume a simple net that includes an LSTM module:

import torch 

class Net(torch.nn.Module):

    def __init__(self):
        super().__init__()

        self.lstm = torch.nn.LSTM(1,1,1) # input element size:1, hidden state size: 1, num_layers = 1

        ...

According to the docs, the weights and biases can be accessed as weight_ih_l[k], weight_hh_l[k], bias_ih_l[k], and bias_hh_l[k], where bias_ih_l[k] is the concatenation of the four gate biases (b_ii|b_if|b_ig|b_io). For this example, I want to set b_if, the input-hidden forget-gate bias, to one. So, I could do something like this:

class Net(torch.nn.Module):

    def __init__(self):
        super().__init__()

        self.lstm = torch.nn.LSTM(1,1,1) # input element size:1, hidden state size: 1, num_layers = 1

        print(self.lstm.bias_ih_l0) # printing for demonstration. output: tensor([-0.4163, -0.0641, -0.3475,  0.5244], requires_grad=True)

        hidden_size = 1
        b_if_start_index = int(4*hidden_size * 0.25)
        b_if_end_index = int(4*hidden_size * 0.5)
        self.lstm.bias_ih_l0[b_if_start_index:b_if_end_index] = 1

        print(self.lstm.bias_ih_l0)  # output: tensor([-0.4163,  1.0000, -0.3475,  0.5244], grad_fn=<CopySlices>)

        ...

We can verify that the b_if value has been manually set to 1 as intended.
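For reference, PyTorch packs the four input-hidden gate biases into a single tensor in the order (b_ii | b_if | b_ig | b_io), so the slice indices used above generalize to any hidden size. A quick standalone sketch:

```python
import torch

hidden_size = 3
lstm = torch.nn.LSTM(input_size=2, hidden_size=hidden_size, num_layers=1)

# bias_ih_l0 concatenates the four gate biases: (b_ii | b_if | b_ig | b_io)
print(lstm.bias_ih_l0.shape)  # torch.Size([12])

# the forget-gate bias b_if occupies the second quarter of the tensor
b_if = lstm.bias_ih_l0[hidden_size : 2 * hidden_size]
print(b_if.shape)  # torch.Size([3])
```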

However, when this net is trained with an optimizer, it raises an error:

ValueError: can't optimize a non-leaf Tensor

As it turns out, the approach used above turns self.lstm.bias_ih_l0 into a non-leaf tensor. This can be confirmed like this:

class Net(torch.nn.Module):

    def __init__(self):
        super().__init__()

        self.lstm = torch.nn.LSTM(1,1,1) # input element size:1, hidden state size: 1, num_layers = 1

        print(self.lstm.bias_ih_l0) # printing for demonstration. output: tensor([-0.4163, -0.0641, -0.3475,  0.5244], requires_grad=True)
        print(self.lstm.bias_ih_l0.is_leaf) # output: True

        hidden_size = 1
        b_if_start_index = int(4*hidden_size * 0.25)
        b_if_end_index = int(4*hidden_size * 0.5)
        self.lstm.bias_ih_l0[b_if_start_index:b_if_end_index] = 1

        print(self.lstm.bias_ih_l0)  # output: tensor([-0.4163,  1.0000, -0.3475,  0.5244], grad_fn=<CopySlices>)
        print(self.lstm.bias_ih_l0.is_leaf) # output: False

As you can see, before the manual change the self.lstm.bias_ih_l0 tensor is a leaf tensor, but after the in-place slice assignment it no longer is.

Solution

To avoid the error, the manual bias change should be done like this:

class Net(torch.nn.Module):

    def __init__(self):
        super().__init__()

        self.lstm = torch.nn.LSTM(1,1,1) # input element size:1, hidden state size: 1, num_layers = 1

        print(self.lstm.bias_ih_l0) # printing for demonstration. output: tensor([-0.4163, -0.0641, -0.3475,  0.5244], requires_grad=True)
        print(self.lstm.bias_ih_l0.is_leaf) # output: True

        hidden_size = 1
        b_if_start_index = int(4*hidden_size * 0.25)
        b_if_end_index = int(4*hidden_size * 0.5)

        bias_nparr = self.lstm.bias_ih_l0.detach().numpy()

        bias_nparr[b_if_start_index:b_if_end_index] = 1

        print(self.lstm.bias_ih_l0)  # output: tensor([-0.4163,  1.0000, -0.3475,  0.5244], requires_grad=True)
        print(self.lstm.bias_ih_l0.is_leaf) # output: True

The difference is that the bias tensor is first detached, and numpy() then exposes its values as a NumPy array. We then set only the b_if section to 1. This way, the error does not show up during the optimization procedure, because self.lstm.bias_ih_l0 remains a leaf tensor.

Detaching is required because numpy() cannot be called on a tensor with requires_grad=True; doing so raises a RuntimeError. detach() returns a view of the bias tensor that is excluded from gradient tracking, so numpy() works on it. Here is a link to the doc on detach.
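This restriction is easy to reproduce in isolation, with a throwaway tensor rather than the LSTM bias:

```python
import torch

t = torch.ones(4, requires_grad=True)

try:
    t.numpy()  # not allowed on a tensor that requires grad
except RuntimeError as e:
    print(type(e).__name__)  # RuntimeError

arr = t.detach().numpy()  # works: the detached tensor is outside autograd
print(arr)  # [1. 1. 1. 1.]
```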

detach() does not copy the data: the returned tensor is a new tensor object that points to the same underlying storage, which is why changes made through the bias_nparr array are automatically reflected in self.lstm.bias_ih_l0.
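This shared-storage behavior can also be verified directly, again with a small standalone tensor:

```python
import torch

t = torch.zeros(3, requires_grad=True)
d = t.detach()

# detach() returns a new tensor object, but both share the same storage
print(d.data_ptr() == t.data_ptr())  # True

arr = d.numpy()  # a NumPy view over that same memory
arr[0] = 5.0

print(t)          # tensor([5., 0., 0.], requires_grad=True)
print(t.is_leaf)  # True -- still safe to hand to an optimizer
```

Note that wrapping the in-place slice assignment in a torch.no_grad() block achieves the same result: inside no_grad, the assignment is not recorded by autograd, so the bias stays a leaf tensor.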

