TELEMAC-MASCARET Forum: Not enough slots available in the system

TOPIC: Not enough slots available in the system

Not enough slots available in the system 3 years 11 months ago #38240

o.gourgue
OFFLINE
Expert Boarder
Posts: 155
Thank you received: 11

I am running Telemac (version 8.2.0) as follows on a supercomputer (each node has 28 processors):

telemac2d.py t2d_input.cas --ncsize=28

And it works just fine.

If I try to run on more than 1 node:

telemac2d.py t2d_input.cas --ncsize=56

I get the following error:

There are not enough slots available in the system to satisfy the 56
slots that were requested by the application:

/project/geomorph/ogourgue/TIGER/Runs/TIGER_000-2/t2d_input.cas_2021-04-12-16h14min53s/out_user_fortran

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:

1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.

I am looking at it with the IT team in charge of that supercomputer and they suspect the problem comes from Telemac. Personally, I have never experienced that with Telemac on other supercomputers. But maybe someone has? If so, please share your solution!

Attached is my configuration file.

Attachments:

systel.scc.cfg (1KB)
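
The slot arithmetic described in the error message can be sketched in Python. This is only an illustration under stated assumptions: the host names node01 and node02 are made up, the hostfile actually used for this run is not shown in the thread, and the parser handles only the "slots=N" clause from item 1 of the message.

```python
def count_hostfile_slots(hostfile_text, default_slots):
    """Sum the slots declared in an Open MPI style hostfile.

    default_slots stands in for the per-host core count that Open MPI
    would use when a line has no "slots=N" clause (assumption: every
    host has the same number of cores).
    """
    total = 0
    for raw in hostfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        slots = default_slots
        for field in line.split()[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        total += slots
    return total


# Hypothetical hostfiles for a machine with 28 cores per node.
one_node = "node01 slots=28\n"
two_nodes = "node01 slots=28\nnode02 slots=28\n"

print(count_hostfile_slots(one_node, default_slots=28))   # 28
print(count_hostfile_slots(two_nodes, default_slots=28))  # 56
```

If the hostfile handed to mpirun lists only one 28-core node, a request for 56 processes exceeds the 28 available slots, which would produce a message like the one above.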

Not enough slots available in the system 3 years 11 months ago #38248

o.gourgue

Does anyone have an idea how to solve this problem? We have run some tests on the supercomputer, and the problem does not come from Open MPI (other applications run just fine).

Not enough slots available in the system 3 years 11 months ago #38249

yugi
OFFLINE
openTELEMAC Guru
Posts: 851
Thank you received: 244

Hi,

Can you try removing "-machinefile MPI_HOSTFILE" from mpi_cmdexec? Is the supercomputer using a scheduler (such as Slurm) to submit scripts? If so, you might need a configuration like gaia.intel.dyn from systel.edf.cfg. If you have documentation or anything else about the supercomputer, I can help you create a configuration that matches it.
The following user(s) said Thank You: o.gourgue, phmusiedlak

Not enough slots available in the system 3 years 11 months ago #38250

o.gourgue

Removing "-machinefile MPI_HOSTFILE" was the solution. The supercomputer uses TORQUE. Thanks!
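
The fix amounts to dropping one option from the mpi_cmdexec template in the systel configuration file. Here is a minimal Python sketch of that change, assuming a template of the form shown in BEFORE. The actual line in systel.scc.cfg is not quoted in the thread, so the template, the <ncsize> and <exename> placeholders, and the helper names are illustrative assumptions; out_user_fortran is the executable name from the error message.

```python
# Assumed form of the original mpi_cmdexec template (not quoted in the thread).
BEFORE = "mpirun -machinefile MPI_HOSTFILE -np <ncsize> <exename>"


def drop_machinefile(template):
    """Return the template without the -machinefile option and its argument."""
    kept = []
    skip_next = False
    for token in template.split():
        if skip_next:              # this token is the argument of -machinefile
            skip_next = False
            continue
        if token == "-machinefile":
            skip_next = True
            continue
        kept.append(token)
    return " ".join(kept)


def expand(template, ncsize, exename):
    """Fill in the placeholders the way a launcher script might (illustrative only)."""
    return template.replace("<ncsize>", str(ncsize)).replace("<exename>", exename)


AFTER = drop_machinefile(BEFORE)
print(AFTER)                                  # mpirun -np <ncsize> <exename>
print(expand(AFTER, 56, "out_user_fortran"))  # mpirun -np 56 out_user_fortran
```

With the machinefile option gone, Open MPI would presumably take its slot count from the resource manager (item 3 in the error message), here TORQUE, rather than from the hostfile.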