I trained a model on GPU 1 and saved a checkpoint. However, when I try to load that checkpoint on GPU 0, it fails.

The following is my checkpoint loading code:

import torch

net = Net()
load_data = torch.load(ckpt_path)  # checkpoint was saved from GPU 1
net.load_state_dict(load_data)
net.cuda(device)  # device = GPU 0

This gives me the following error:

Traceback (most recent call last):
  File "", line 127, in <module>
  File "/data/chadrick/venv/tf113/lib/python3.7/site-packages/torch/", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/data/chadrick/venv/tf113/lib/python3.7/site-packages/torch/", line 702, in _legacy_load
    result = unpickler.load()
  File "/data/chadrick/venv/tf113/lib/python3.7/site-packages/torch/", line 665, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/data/chadrick/venv/tf113/lib/python3.7/site-packages/torch/", line 156, in default_restore_location
    result = fn(storage, location)
  File "/data/chadrick/venv/tf113/lib/python3.7/site-packages/torch/", line 136, in _cuda_deserialize
    return storage_type(obj.size())
  File "/data/chadrick/venv/tf113/lib/python3.7/site-packages/torch/cuda/", line 480, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory


The fix is to map the checkpoint's tensors to the CPU when loading, so that they are not deserialized back onto the GPU they were saved from (which may be busy or unavailable). The fixed loading code looks like this:

import torch

net = Net()
load_data = torch.load(ckpt_path, map_location='cpu')
net.load_state_dict(load_data)
net.cuda(device)  # device = GPU 0


The problematic behavior described above is also explained in the official PyTorch docs for `torch.load`: by default, tensors are deserialized onto the same device they were saved from, which is why loading a GPU 1 checkpoint tries to allocate memory on GPU 1.
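To make the `map_location` behavior concrete, here is a minimal, self-contained sketch that runs on CPU only (the file name `demo_ckpt.pt` is just an example). The commented-out variants show the other documented forms of `map_location` for remapping to a specific GPU, which would require that device to exist:

```python
import torch

# Save a tiny checkpoint, then load it with map_location.
ckpt_path = "demo_ckpt.pt"
torch.save({"w": torch.ones(3)}, ckpt_path)

# 1) Map every storage to CPU -- safest, works no matter which GPU saved it.
state = torch.load(ckpt_path, map_location="cpu")

# 2) Map directly onto a specific target device (needs that GPU present):
# state = torch.load(ckpt_path, map_location="cuda:0")

# 3) Dict form: remap storages saved on GPU 1 onto GPU 0:
# state = torch.load(ckpt_path, map_location={"cuda:1": "cuda:0"})

print(state["w"].device)  # cpu
```

After loading onto the CPU, calling `net.load_state_dict(state)` followed by `net.cuda(device)` moves the weights to whichever GPU you actually want.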
