
TOPIC: Parallel - compiled fine, but cannot run

Parallel - compiled fine, but cannot run 11 years 1 week ago #10959

  • 716469
  • 716469's Avatar
  • OFFLINE
  • Expert Boarder
  • Posts: 303
  • Thank you received: 6
Dear All,

I have been running Telemac in scalar mode for some time, but now I need to run my cases on a cluster in parallel. I can run on the cluster in serial (scalar) mode but not in parallel. Here is some info:

Telemac version - v6p2
Platform - Linux 64-bit
Library - OpenMPI_gnu/1.6.4
Language - Python 2.6.6
Compiler - gfortran
Metis - 5.1.0

I compiled it with the ubugfopenmpi configuration. The validation case I am using for testing is 036_wave, which runs fine on the cluster in scalar mode. When I try to run it in parallel, it goes to the job queue, changes status to R (running), and then disappears from the queue with the errors attached to this post. I am using 2 processors.

I have also attached the config.cfg file and my PBS script. I also tried to execute the following command, as I had seen in another post: /...../.../opentelemac/v6p2/parallel/parallel_v6p2/ubugfopenmpi/partel < PARTEL.PAR. I got some listing output and, at the end, "Segmentation fault (core dumped)".


Please advise what I need to correct, as I have definitely done something wrong. Thanks a million.

Kind Regards!

Violet
The administrator has disabled public write access.

Parallel - compiled fine, but cannot run 11 years 1 week ago #10960

  • c.coulet
  • c.coulet's Avatar
  • OFFLINE
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi Violeta

I think the main problem is that you cannot run your script this way.
In effect, you are running something like:
mpirun runcode
So mpirun tries to launch a parallel execution of runcode, and runcode in turn tries to run Telemac in parallel with its own call to mpirun ...

In my opinion, the easiest solution (which is probably not as easy as you would like) is to try to use the hpc option in the configuration file. You will find two examples for V6P3 at the end of the systel.edf.cfg file.

In this case, the Telemac launching process manages the partitioning step itself and calls the job manager only for the execution step (which is the actual parallel run).
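
As an illustration only, the hpc entries in the configuration file look roughly like this (a sketch modeled on the V6P3 systel.edf.cfg examples; the exact keyword names and placeholder spellings should be checked against that file):

```ini
# Hedged sketch of an hpc section, based on the V6P3 systel.edf.cfg examples.
# <jobname>, <ncsize>, <queue>, <mpi_cmdexec> and <hpc_stdin> are placeholders
# that the Telemac launcher substitutes at run time; check systel.edf.cfg for
# the authoritative keywords.
hpc_stdin: #!/bin/bash
   #SBATCH --job-name=<jobname>
   #SBATCH --ntasks=<ncsize>
   #SBATCH --partition=<queue>
   <mpi_cmdexec>
   exit
hpc_cmdexec: sbatch < <hpc_stdin>
```

The script between hpc_stdin and hpc_cmdexec is written out to a file and submitted to the scheduler; on a PBS cluster the #SBATCH directives and the sbatch command would be replaced by their PBS equivalents.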

Hope this helps
Christophe

Parallel - compiled fine, but cannot run 11 years 1 week ago #10961

  • 716469
  • 716469's Avatar
  • OFFLINE
  • Expert Boarder
  • Posts: 303
  • Thank you received: 6
Thank you again, Christophe. I will do it now and hopefully I won't get any errors there.

Violeta

Parallel - compiled fine, but cannot run 11 years 1 week ago #10962

  • 716469
  • 716469's Avatar
  • OFFLINE
  • Expert Boarder
  • Posts: 303
  • Thank you received: 6
Hello Christophe,

Sorry for misspelling your name last time. I added the hpc options to the config file and amended mpi_cmdexec slightly. You were right, it is not easy, especially for someone like me who lacks IT skills. I am not sure whether I did it correctly; could you take a look, please? It is probably wrong, as I got the same errors when trying to run the wave case again. Thanks.

Kind Regards!

Violeta

Parallel - compiled fine, but cannot run 11 years 1 week ago #10963

  • c.coulet
  • c.coulet's Avatar
  • OFFLINE
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Violeta

SBATCH directives belong to the Slurm job scheduler, so I'm not sure you can use them on your cluster, since you use PBS.
I think you should adapt the script to PBS.
On my cluster, for example, I have:
#$ -N <jobname> instead of #SBATCH --job-name=<jobname>
#$ -pe ompi <ncsize> instead of #SBATCH --ntasks=<ncsize>
#$ -q <queue> instead of #SBATCH --partition=<queue>
...

and my command line is: hpc_cmdexec: qsub < <hpc_stdin>
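
Putting the mapping together for a PBS cluster, the submitted job script might look roughly like this (a sketch only; the #PBS resource syntax and the mpirun line are assumptions that must match your site's setup, and <jobname>, <ncsize>, <queue>, <wdir> and <exename> stand for the placeholders the launcher fills in):

```shell
#!/bin/bash
# Hypothetical PBS equivalent of the SBATCH example above.
#PBS -N <jobname>
#PBS -q <queue>
# Resource request syntax varies by PBS flavour (nodes/ppn vs. select/mpiprocs);
# check your cluster's documentation.
#PBS -l nodes=1:ppn=<ncsize>
# Move to the working directory prepared by the Telemac launcher.
cd <wdir>
# The actual parallel run.
mpirun -np <ncsize> <exename>
```

Such a script would then be submitted with the hpc_cmdexec command above (qsub < <hpc_stdin>).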

After configuring your system, the command line should also be adapted, with the following syntax:
runcode.py telemac2d -c ubugfopenmpi --hpc --ncsize=xx --jobname=xx --queue=xx cas

The adaptation step is not easy and depends on your configuration.
Good luck
Christophe