Notebooks are being run on arbitrary agents

Guney90 · June 14, 2023, 2:30pm

I set up a Datalore instance with an external server, and we have two external agents.

My agents-config.yaml looks like:

```yaml
docker:
  network: datalore-agents-network
  dataloreHost: datalore
  instances:
    - id: basic-agent
      default: true
      label: "docker-base"
      description: "docker-base"
      image: docker.io/jetbrains/datalore-agent:2023.2
external:
  instances:
    - id: machine-1
      label: "machine-1"
      description: "Some description"
      image: jetbrains/datalore-agent:2023.3
      command: "podman"
      additionalOptions: "-e NVIDIA_VISIBLE_DEVICES=all"
    - id: machine-2
      label: "machine-2"
      description: "Some description"
      image: jetbrains/datalore-agent:2023.3
      command: "podman"
      additionalOptions: "-e NVIDIA_VISIBLE_DEVICES=all"
```

I have my agent bundles set up: in each one, conf/buildAgent.properties has been edited to point to my external server, and its name value matches the label and id above.

When I create a new notebook and select, say, machine-1, it creates the container on an arbitrary one of the two agents, and creating another notebook picks whichever machine is most idle. So it behaves more like a load balancer than a way to target specific agents.

I checked all the documentation and configs, but I couldn’t find any way to map agents so that specific notebooks run on specific machines by selecting the agent from the agent list.

aprilfire · June 28, 2023, 8:42am

Hello,

You’re right: the current implementation doesn’t allow you to assign specific machines to particular external instance types. Instead, computations run on a random available agent.
This may be improved later, but there are no particular plans for it yet.

Best regards,
Stepan Tarasevich
JetBrains

Guney90 · July 4, 2023, 6:48am

Hi @aprilfire,

Then suppose you have two agents, one with a GPU and one without, and you want to run code that needs GPU support. How will Datalore hand that task over to the GPU machine if it distributes tasks to the most idle machine?

aprilfire · July 4, 2023, 7:49am

Hi @Guney90,

For external agents, unfortunately, it’s currently assumed that all the external hardware is identical, and there’s no proper way to select the GPU machine out of the two. With exactly two machines, you can force Datalore to assign a particular agent to a notebook: for example, run two empty notebooks, stop the one that got the GPU agent, and then run the notebook you need. This quickly becomes too cumbersome if you have more machines, though.
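To see why that ordering works, here is a minimal Python sketch of the reasoning. It rests on assumptions, not on Datalore internals: each external agent runs one notebook at a time, a new notebook goes to a randomly chosen free agent, and both machines start idle. `AgentPool` and `run_on_gpu` are hypothetical names for illustration, not Datalore APIs.

```python
import random
from dataclasses import dataclass, field


@dataclass
class AgentPool:
    """Hypothetical model: external agents are treated as interchangeable,
    and each new notebook is placed on a randomly chosen free agent."""
    agents: list[str]
    busy: dict[str, str] = field(default_factory=dict)  # agent id -> notebook name

    def start(self, notebook: str) -> str:
        free = [a for a in self.agents if a not in self.busy]
        if not free:
            raise RuntimeError("no free external agent")
        agent = random.choice(free)  # the requested label plays no role here
        self.busy[agent] = notebook
        return agent

    def stop(self, notebook: str) -> None:
        for agent, nb in list(self.busy.items()):
            if nb == notebook:
                del self.busy[agent]


def run_on_gpu(pool: AgentPool, gpu_agent: str, notebook: str) -> str:
    # Step 1: start one empty notebook per agent so every machine is occupied.
    for i in range(len(pool.agents)):
        pool.start(f"empty-{i + 1}")
    # Step 2: stop whichever empty notebook landed on the GPU machine.
    pool.stop(pool.busy[gpu_agent])
    # Step 3: the GPU agent is now the only free one, so the real notebook lands there.
    return pool.start(notebook)


pool = AgentPool(["machine-1", "machine-2"])
print(run_on_gpu(pool, gpu_agent="machine-2", notebook="training"))
```

Under these assumptions the last call always lands on the GPU machine, at the cost of keeping an empty notebook running on every other agent, which is why the trick does not scale beyond a couple of machines.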

As I said before, this might (and most likely will) be revised in the future, but we can’t give any estimate at the moment.

Best regards,
Stepan Tarasevich
JetBrains