
TOPIC: V8P1R0 parallel issue

V8P1R0 parallel issue 4 years 11 months ago #35045

  • Chen
Hello,

I installed V8P1R0 with Python 3, but parallel runs do not proceed and print this message:

DIFFERENT NUMBER OF PARALLEL PROCESSORS:
DECLARED BEFORE (CASE FOR COUPLING ?):
TELEMAC-2D : 19
VALUE 1 IS KEPT


The simulation then runs on only 1 processor. The same thing happens with the example cases. My guess is that something is hard-coded in the parallel part of TELEMAC and is causing this.

Thanks
The administrator has disabled public write access.

V8P1R0 parallel issue 4 years 11 months ago #35046

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
Hi,

Could you post the command you used as well as your systel.cfg?
Nothing is hard-coded in TELEMAC for the number of processors.
There are 10 types of people in the world: those who understand binary, and those who don't.

V8P1R0 parallel issue 4 years 11 months ago #35047

  • Chen
Hello,

The config file is attached. I have been using the same configuration with previous versions without any problem.

Thanks


Dongchen

V8P1R0 parallel issue 4 years 11 months ago #35050

  • yugi
And what command did you type?

V8P1R0 parallel issue 4 years 10 months ago #35245

  • Yunhao Song
  • Senior Boarder
  • Posts: 118
  • Thank you received: 9
Hi Yoann,

I got the same error when testing an example (with PARALLEL PROCESSORS = 4) after compiling v8p1r0 with Python 3 on a cluster. The command I used to run it is:
python /username/TELEMAC/v8p1r0/scripts/python3/telemac3d.py t3d.cas -c intel
When I added --ncsize=4, Slurm returned the error below, even though my account is permitted 80 cores:
srun: error: Unable to create step for job 6537452: More processors requested than permitted

Maybe something went wrong during compilation. My systel.cfg and pysource.sh are attached. Thanks in advance.

Yunhao

V8P1R0 parallel issue 4 years 10 months ago #35246

  • yugi
You can have a look at the options --ncnodes and --nctile, which specify the number of nodes and the number of cores per node, respectively.
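As a rough illustration of how a node count and a per-node core count relate to the total number of processes, here is a minimal Python sketch. It assumes the total is simply nodes multiplied by cores per node; the function name `total_cores` is hypothetical and is not part of the TELEMAC scripts.

```python
def total_cores(ncnode: int, nctile: int) -> int:
    """Total number of processes for ncnode nodes with nctile cores each.

    Assumption: total = nodes * cores per node (not taken from TELEMAC code).
    """
    if ncnode < 1 or nctile < 1:
        raise ValueError("node count and cores per node must both be positive")
    return ncnode * nctile


# One node with four cores corresponds to a 4-process run.
print(total_cores(1, 4))
# Two nodes with eight cores each correspond to a 16-process run.
print(total_cores(2, 8))
```

Under this assumption, one node with four cores per node should correspond to the same total as --ncsize=4.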

Could you post the generated HPC_STDIN as well?

V8P1R0 parallel issue 4 years 10 months ago #35248

  • Yunhao Song
I tried --ncnodes=1 with --nctile=4, but the error was the same. Please find the HPC_STDIN in the attachment.

I also noticed that the heading is always TRUNK, both when running config.py and when launching the example (see the screenshot), even though my systel.cfg already specifies the root and version.

V8P1R0 parallel issue 4 years 9 months ago #35283

  • yugi
Hi,

About TRUNK in the header: that is a bug.

As for your other issue: first, you need to replace the hard-coded 16 in the hpc_stdin definition in your systel.cfg:
hpc_stdin: #!/bin/bash
   #SBATCH -p hpxg
   #SBATCH -n <ncsize>
   source <root>/configs/pysource.intel.sh
   module load anaconda/3.7 intel/parallelstudio/2019
   <mpi_cmdexec>
   exit
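To show why `<ncsize>` matters in this template, here is a minimal Python sketch of placeholder substitution. It is not the TELEMAC launcher code: the `fill_template` helper, the shortened template, and the `mpiexec` command line (including `./my_executable`) are all hypothetical and only stand in for whatever the real scripts generate.

```python
def fill_template(template: str, values: dict) -> str:
    """Replace each <key> placeholder in the template with its value."""
    for key, value in values.items():
        template = template.replace(f"<{key}>", str(value))
    return template


# Shortened, illustrative version of an hpc_stdin template.
HPC_STDIN_TEMPLATE = """#!/bin/bash
#SBATCH -p hpxg
#SBATCH -n <ncsize>
<mpi_cmdexec>
exit
"""

script = fill_template(
    HPC_STDIN_TEMPLATE,
    {
        "ncsize": 4,
        # Hypothetical command; the real one comes from mpi_cmdexec in systel.cfg.
        "mpi_cmdexec": "mpiexec -n 4 ./my_executable",
    },
)
print(script)
```

With the placeholder in place, the number of cores requested from Slurm follows the number of processes asked for on the command line, instead of staying fixed at 16.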

Then the issue might be with using srun instead of mpiexec in your mpi_cmdexec. It depends on what is recommended on your cluster. Sometimes srun launches sequential jobs.

Also, I would recommend removing PARALLEL PROCESSORS from your steering file.
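If you want to strip the keyword from several steering files, a small Python sketch like the following could help. It assumes the keyword sits on its own line in the form `PARALLEL PROCESSORS = 4` or `PARALLEL PROCESSORS : 4`; lines carrying several keywords, or the French spelling of the keyword, are not handled. The function name is hypothetical.

```python
import re

# Matches a line that starts with the keyword followed by "=" or ":".
KEYWORD = re.compile(r"^\s*PARALLEL PROCESSORS\s*[=:]", re.IGNORECASE)


def drop_parallel_processors(cas_text: str) -> str:
    """Return the steering file text without PARALLEL PROCESSORS lines."""
    kept = [line for line in cas_text.splitlines() if not KEYWORD.match(line)]
    return "\n".join(kept) + "\n"


sample = "TITLE = 'demo'\nPARALLEL PROCESSORS = 4\nTIME STEP = 1.0\n"
cleaned = drop_parallel_processors(sample)
print(cleaned)
```

The process count then comes only from the command line (for example --ncsize), so the two values cannot disagree.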

V8P1R0 parallel issue 4 years 9 months ago #35284

  • Yunhao Song
Hi Yoann,

Thank you for the suggestions. I'll try them and let you know whether it works on the HPC.
BTW, the hard-coded 16 is the maximum number of cores permitted for free HPC users here. It worked well in the previous TELEMAC version using Python 2.7.

Best,
Yunhao

V8P1R0 parallel issue 4 years 9 months ago #35285

  • yugi
OK for the hard-coded 16.
But that means that if you use fewer than 16 cores, you will still reserve 16.