
TOPIC: Running Telemac-3D with a Python script on a cluster

Running Telemac-3D with a Python script on a cluster 12 years 2 months ago #5534

  • sumit
Dear All,

I was wondering if it is possible to configure the Python script (runcode.py) to submit jobs on a Linux cluster. The cluster I am using runs CentOS with Oracle Grid Engine for job submission.

Any and all help will be greatly appreciated.

Best regards,
Sumit

Running Telemac-3D with a Python script on a cluster 12 years 2 months ago #5571

  • sebourban
  • Administrator, Principal Scientist
Yes, it should be possible -- but the preferred approach is to modify the systel.cfg and then make sure runcode.py does things appropriately.

At the moment, runcode.py supports a systel.cfg that includes "parallel mpi hpc" as options. We are currently using the systel.cfg to launch TELEMAC on HPC queueing systems such as BSUB, QSUB and PBS.

Let us know what you need (scripts, config files) and I am sure we can help.

For information, with BSUB:
[Configurations]
configs:    dab.tile9
[dab.tile9]
#
root:       /gpfs/ocf/ig_5895/shared/opentelemac/dab 
version:    v6p1
language:   2
modules:    update system
options:    parallel mpi hpc
#
mpi_hosts:    mg01
mpi_cmdexec: /gpfs/packages/openmpi/1.4.4/gcc/bin/mpiexec -wdir <wdir> -n <ncsize> <exename>
#
# <jobname> and <email> need to be provided on the TELEMAC command line #BSUB -u <email> \n #BSUB -N
hpc_stdin: #!/bin/bash
   #BSUB -n <ncsize>
   #BSUB -J <jobname>
   #BSUB -o <sortiefile>
   #BSUB -e <exename>.%J.err
   #BSUB -R "span[ptile=9]"
   <mpi_cmdexec>
   exit
#
hpc_cmdexec:   chmod 755 <hpc_stdin>; bsub -q encore < <hpc_stdin>
#
cmd_obj:    gfortran -c -O3 -ffixed-line-length-132 -fconvert=big-endian -frecord-marker=4 <mods> <incs> <f95name>
cmd_lib:    ar cru <libname> <objs>
cmd_exe:    mpif90 -fconvert=big-endian -frecord-marker=4 -v -lm -lz -o <exename> <objs> <libs>
#
mods_all:   -I <config>
#
incs_parallel:      -I /gpfs/packages/openmpi/1.4.4/gcc/include/
libs_parallel:      /gpfs/ocf/ig_5895/shared/opentelemac/libs/libmetis.a
libs_all:           /gpfs/packages/openmpi/1.4.4/gcc/lib/libmpi.so
#
sfx_zip:    .gztar
sfx_lib:    .lib
sfx_obj:    .o
sfx_mod:    .mod
sfx_exe:
#
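For illustration only, once runcode.py has substituted the placeholders above (<ncsize>, <jobname>, <sortiefile>, <exename>, <wdir> and <mpi_cmdexec>), the generated BSUB stdin would look roughly like the sketch below; the job name, file names and working directory here are made-up examples, not values from a real run:

#!/bin/bash
#BSUB -n 9
#BSUB -J myjob
#BSUB -o myjob.sortie
#BSUB -e myjob.exe.%J.err
#BSUB -R "span[ptile=9]"
# <mpi_cmdexec> expands to the mpiexec line defined in the configuration
/gpfs/packages/openmpi/1.4.4/gcc/bin/mpiexec -wdir /path/to/tmp_workdir -n 9 ./myjob.exe
exit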

Another example with QSUB:
[Configurations]
configs: eticsm
[eticsm]
#
root:       /gpfs/rrcfd/public/apps/opentelemac/eticsm
version:    v6p2
language:   2
modules:    update system
options:    parallel mpi hpc
#
mpi_cmdexec: mpiexec -wdir <wdir> -n <ncsize> <exename>
#
par_cmdexec:   <config>/partel_prelim; python <root>/pytel/utils/partitioning.py
#
# <jobname> and <email> need to be provided on the TELEMAC command line #BSUB -u <email> \n #BSUB -N
hpc_stdin: #!/bin/bash
   #$ -cwd                        # working directory is current directory
   #$ -V                          # forward your current environment to the execution environment
   #$ -pe mpi-12x1 96             # no of cores requested
   #$ -S /bin/bash                # shell it will be executed in
   #$ -j y                        # merge stderr and stdout
   cat $PE_HOSTFILE | awk '{print $1, " slots=9"}' > machinefile.$JOB_ID
   cat machinefile.$JOB_ID
   <mpi_cmdexec>
   exit
#
hpc_cmdexec:   chmod 755 <hpc_stdin>; qsub < <hpc_stdin>
#
cmd_obj:    ifort -c -O3 -convert big_endian -132 <mods> <incs> <f95name>
cmd_lib:    ar cru <libname> <objs>
cmd_exe:    mpif90 -convert big_endian -lm -lz -o <exename> <objs> <libs>
#
mods_all:   -I <config>
#
incs_parallel:      -I /usr/mpi/intel/openmpi-1.4.2/include/
libs_parallel:      /gpfs/rrcfd/public/apps/opentelemac/lib/libmetis.a
libs_all:           /usr/mpi/intel/openmpi-1.4.2/lib64/libmpi.so
#
sfx_zip:    .zip
sfx_lib:    .lib
sfx_obj:    .o
sfx_mod:    .mod
sfx_exe:
#

The essential keys are options, hpc_stdin and hpc_cmdexec.

Hope this helps.

Sébastien.

Running Telemac-3D with a Python script on a cluster 12 years 2 months ago #5577

  • sumit
Hello Sébastien,

Thanks a lot for the reply. Where does the name of the .cas file go? Is it supplied on the command line when submitting the script?

Thanks,
Sumit

Running Telemac-3D with a Python script on a cluster 12 years 2 months ago #5573

  • sebourban
  • Administrator, Principal Scientist
Note that these types of configuration will be explained in detail during the upcoming pre-conference workshop at HR Wallingford.

There are still a few places available for both the workshop and the conference.

Sébastien.

Running Telemac-3D with a Python script on a cluster 12 years 2 months ago #5580

  • sebourban
  • Administrator, Principal Scientist
Indeed, the .cas file name is supplied on the command line.

Please also note the new options -w, --split, --run and --merge, which allow you to split the input, run the simulation and merge the results as independent steps.

Hope this helps.

Sébastien.

Running Telemac-3D with a Python script on a cluster 12 years 2 months ago #5582

  • sumit
Could I get the shell script for QSUB?

Running Telemac-3D with a Python script on a cluster 12 years 2 months ago #5583

  • sebourban
  • Administrator, Principal Scientist
You already have the script -- it is runcode.py.

The qsub call only needs to be set up through the systel.cfg, using the options key (add hpc), the hpc_stdin key (your stdin bash script) and the hpc_cmdexec key (with chmod 755 <hpc_stdin>; qsub < <hpc_stdin>), where <hpc_stdin> is replaced by the content of the hpc_stdin key ... as shown in my previous post.
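As a minimal sketch, assuming the Grid Engine setup from the QSUB example earlier in this thread (the parallel environment name and core count are site-specific and will differ on your cluster), the hpc-related part of the systel.cfg reduces to:

options:    parallel mpi hpc
#
hpc_stdin: #!/bin/bash
   #$ -cwd                        # working directory is current directory
   #$ -V                          # forward your current environment
   #$ -pe mpi-12x1 96             # parallel environment and number of cores (site-specific)
   #$ -S /bin/bash                # shell it will be executed in
   #$ -j y                        # merge stderr and stdout
   <mpi_cmdexec>
   exit
#
hpc_cmdexec:   chmod 755 <hpc_stdin>; qsub < <hpc_stdin>

In other words, the content of the hpc_stdin key is written out as a script, the placeholders are substituted, and the hpc_cmdexec line is then executed to submit that script to the queue.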

Hope this helps.

Sébastien.

Running Telemac-3D with a Python script on a cluster 12 years 2 months ago #5591

  • sumit
Thanks a lot, Sébastien, this surely helps. I have not completely solved the problem yet, but I am working on it. Just to confirm once more: to submit jobs on the cluster, I will issue the command python runcode.py with all the requisite arguments, right?

Best regards,
Sumit

Running Telemac-3D with a Python script on a cluster 12 years 2 months ago #5592

  • sebourban
  • Administrator, Principal Scientist
Yes -- also note that you can just run telemac2d.py (or telemac3d.py) instead of runcode.py telemac2d (or runcode.py telemac3d, respectively).

So, if your config file is OK:
telemac2d.py -w ./setworkdir --split --ncsize 9 casfile
(--ncsize allows you to reset the number of PROCESSORS in the CAS file -- if your CAS file is already OK, you do not need the option --ncsize)
telemac2d.py -w ./setworkdir --run --ncsize 9 casfile
(this will compile the executable and launch the job on the queue)
telemac2d.py -w ./setworkdir --merge --ncsize 9 casfile
(once you know your job has completed, this will re-assemble your results ... using -w ./setworkdir allows you to control where these steps are referenced)
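Put together, a session on an Oracle Grid Engine cluster could look like the sketch below; the case file name is a placeholder, and qstat is the standard Grid Engine command for watching the queue (it is not part of the TELEMAC scripts):

telemac3d.py -w ./setworkdir --split --ncsize 9 mycase.cas   # partition the input files
telemac3d.py -w ./setworkdir --run --ncsize 9 mycase.cas     # compile the exe and submit the job
qstat -u $USER                                               # check the queue until the job has finished
telemac3d.py -w ./setworkdir --merge --ncsize 9 mycase.cas   # then re-assemble the results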

Hope this helps.
Sébastien.

Running Telemac-3D with a Python script on a cluster 12 years 2 months ago #5603

  • sumit
Hello Sébastien,

Sorry to bother you again, but I am still having some problems. I am attaching my config file; could you take a look and let me know if anything looks out of place?

On the command line I am using the following command:

python /home/sinha/opentelemac1/v6p1/pytel/runcode.py telemac3d -c ASAM -f /home/sinha/opentelemac1/v6p1/config/systel.cis-Sumit.cfg -s MBNBTry1.cas

I am fairly sure that I have compiled the code properly, but somehow the job is not getting picked up on the cluster. The message I get is as follows:

Loading Options and Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... parsing configuration file: /home/sinha/opentelemac1/v6p1/config/systel.cis-Sumit.cfg


Running MBNBTry1.cas with telemac3d under /gpfs0/home/sinha/opentelemac1/v6p1/MBNB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... reading module dictionary
/gpfs0/home/sinha/opentelemac1/v6p1/MBNB/MBNBTry1.cas
... running in English
copying: MBNBTry1.cas
copying: MBNB.cli
copying: MBNB.slf
copying: telemac3dv6p1.dico
copying: sumit.f
ifort -c -O3 -convert big_endian -132 -I /home/sinha/opentelemac1/v6p1/telemac3d/tel3d_v6p1/ASAM -I /home/sinha/opentelemac1/v6p1/sisyphe/sisyphe_v6p1/ASAM -I /home/sinha/opentelemac1/v6p1/tomawac/toma_v6p1/ASAM -I /home/sinha/opentelemac1/v6p1/bief/bief_v6p1/ASAM -I /home/sinha/opentelemac1/v6p1/telemac2d/tel2d_v6p1/ASAM -I /home/sinha/opentelemac1/v6p1/damocles/damo_v6p1/ASAM -I /home/sinha/opentelemac1/v6p1/mumpsvoid/mumpsvoid_v6p1/ASAM -I /home/sinha/opentelemac1/v6p1/parallel/parallel_v6p1/ASAM -I /home/sinha/opentelemac1/v6p1/special/special_v6p1/ASAM t3dfort.f
mpif90 -convert big-endian -lm -lz -o sumit t3dfort.o /home/sinha/opentelemac1/v6p1/telemac3d/tel3d_v6p1/ASAM/telemac3dv6p1.lib /home/sinha/opentelemac1/v6p1/telemac3d/tel3d_v6p1/ASAM/sisyphev6p1.lib /home/sinha/opentelemac1/v6p1/telemac3d/tel3d_v6p1/ASAM/tomawacv6p1.lib /home/sinha/opentelemac1/v6p1/telemac3d/tel3d_v6p1/ASAM/biefv6p1.lib /home/sinha/opentelemac1/v6p1/telemac3d/tel3d_v6p1/ASAM/telemac2dv6p1.lib /home/sinha/opentelemac1/v6p1/telemac3d/tel3d_v6p1/ASAM/damoclesv6p1.lib /home/sinha/opentelemac1/v6p1/telemac3d/tel3d_v6p1/ASAM/mumpsvoidv6p1.lib /home/sinha/opentelemac1/v6p1/telemac3d/tel3d_v6p1/ASAM/parallelv6p1.lib /home/sinha/opentelemac1/v6p1/telemac3d/tel3d_v6p1/ASAM/specialv6p1.lib /usr/local/openmpi/1.4.3/lib/libmpi.so
ifort: warning #10314: specifying -lm before object files may supercede the Intel(R) math library and affect performance
/usr/local/intel/11.1.075/libimf.so: warning: warning: feupdateenv is not implemented and will always fail
partitioning: T3DGEO
mpirun -machinefile "$TMPDIR/machines" -np $NSLOTS ./xhpl



I am going to talk to the cluster administrators and see if they can help me; meanwhile, any suggestions from you are appreciated.

Best regards,
Sumit