Infinite loop when iterating through a PyTorch DataLoader

Hi,

I have a small CNN model that works fine on my PC's CPU.
I needed more computational power, so I moved all my data and code to Datalore.

Unfortunately, when I initialize my model in a Datalore sheet, it runs until it reaches the lines where it has to iterate over the PyTorch DataLoader and then enters an infinite loop.
This means it can't pull image samples for training, the model gets stuck, and I have to restart the kernel.
This bug does not occur when I’m using my own PC with the same code for training.
I also tried to iterate over the DataLoader outside of the model, using this line of code:

images, labels = next(iter(train_dataloader))

and this line of code reproduces the bug described above.
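
For clarity, here is a minimal, self-contained sketch of that check (the synthetic TensorDataset below just stands in for my real dataset, which reads images from Workspace Files):

import torch
from torch.utils.data import DataLoader, TensorDataset

# placeholder data: 100 fake RGB images with integer labels
train_dataset = TensorDataset(torch.randn(100, 3, 64, 64),
                              torch.randint(0, 10, (100,)))
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# with my real dataset, this call never returns on Datalore
images, labels = next(iter(train_dataloader))
print(images.shape, labels.shape)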

I've searched the web for a solution but haven't found anything.

Any answer would be greatly appreciated.
Thanks!

Hello Jonathan,

Sorry, I missed the topic on the forum and replied to this question by email.

It might be useful for other users, so I will post the answer here as well.


In this case the issue is caused by slow WebDAV performance. As a temporary workaround, you can copy your data to the computational agent (the machine). To do so, execute the following commands:

  1. Create a temporary folder:
!mkdir /tmp/data
  2. Copy your data from Workspace Files into it (this might take a while):
!cp -r /data/workspace_files/data/. /tmp/data
  3. Make sure the files were copied (list the directory):
!ls /tmp/data
  4. Use /tmp/data in place of /data/workspace_files/data (see the sketch below).
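
For example, if your data is a folder of images organized by class (an assumption here; adjust the Dataset class, transforms, and batch size to your own setup), the only change in the training code is the path passed to the dataset:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

# before: datasets.ImageFolder("/data/workspace_files/data", transform=transform)
train_dataset = datasets.ImageFolder("/tmp/data", transform=transform)
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# the same one-batch check as in the question should now return quickly
images, labels = next(iter(train_dataloader))
print(images.shape, labels.shape)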

Please note that all data stored on the machine will be lost once the computation is stopped.

We are still investigating ways to improve WebDAV performance.