TOPIC: Installation on a Cluster HPC

Installation on a Cluster HPC 11 years 3 months ago #9963

  • VictorOpenOcean
Hello everyone,

I would like to know what steps are needed to install TELEMAC on an HPC cluster, beyond those for a single-machine installation.
I suspect you need to install all the usual libraries and tools (OpenMPI, METIS, gfortran, and Python), but the installation procedure says nothing about what to write in the configuration file or about the remaining steps.

Thank you in advance for any answers.
Victor.
The administrator has disabled public write access.

Installation on a Cluster HPC 11 years 3 months ago #9969

  • sebourban
  • Administrator
  • Principal Scientist
Hello,

There is not much more to do to run on a cluster: you just need to update your configuration. The repository contains a number of existing examples. Have a look at systel.edf.cfg and look for the configurations that define the key hpc_stdin.
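As a rough aid for that search, here is a minimal sketch that lists which sections of a systel-style configuration file define the hpc_stdin key. It assumes sections are introduced by a `[name]` header and keys are written `key: value` at the start of a line. The section names in the sample are hypothetical and are not taken from systel.edf.cfg.

```python
import re


def sections_with_key(cfg_text, key="hpc_stdin"):
    """Return the names of the [sections] in cfg_text that define `key`."""
    section_re = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$")
    key_re = re.compile(r"^" + re.escape(key) + r"\s*[:=]")
    current = None
    found = []
    for line in cfg_text.splitlines():
        match = section_re.match(line)
        if match:
            # A new section starts here; remember its name.
            current = match.group("name")
        elif current is not None and key_re.match(line) and current not in found:
            found.append(current)
    return found


# Hypothetical sample, for illustration only.
SAMPLE = """\
[general]
language: 2

[local.gfortran]
options: mpi
mpi_cmdexec: mpirun -np <ncsize> <exename>

[cluster.slurm]
options: mpi hpc
hpc_stdin: #!/bin/bash
  #SBATCH --ntasks=<ncsize>
  <py_runcode>
hpc_runcode: sbatch < <hpc_stdin>
"""

print(sections_with_key(SAMPLE))
```

On the sample this prints `['cluster.slurm']`. To scan a real file, read its text first and pass it to `sections_with_key`; the location of systel.edf.cfg depends on your installation.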

For more specific advice, you will have to tell us more about your setup: the HPC hardware, the queuing system, etc.

Hope this helps,
Sébastien.
PS: Please update your profile under "COMMUNITY"

Installation on a Cluster HPC 2 years 2 weeks ago #41521

Hi everybody,
I am trying to compile TELEMAC on the MESO@LR cluster.
The cluster runs Linux and has no graphical interface.
Jobs are launched through the SLURM job manager (command: sbatch run ....).

I have already installed these libraries with Pagure: aed2, hdf5, Metis (for parallel runs), ParMetis, mumps, openmpi, med, scalapack, lapack, blas...

I understand that I need to adapt the .sh and .cfg files to the cluster. The files are attached. I wanted to add the following HPC options to my config file:
### ___/ HPC /_____/
brief: Intel 16.0.4 compiler with open_mpi 1.6.5_tuned on the EDF athos cluster
#
language: 2
modules:  system
version:  trunk
#
sfx_zip:    .zip
sfx_lib:    .a
sfx_obj:    .o
sfx_mod:    .mod
sfx_exe:
#
#
val_root:   <root>/examples
#
val_rank:   all
#
mods_all:   -I <config>
 
#
options:    mpi hpc
par_cmdexec: srun -n 1 -N 1 <config>/partel < PARTEL.PAR >> <partel.log>
mpi_cmdexec: mpirun -np <ncsize> <exename>
#
hpc_stdin: #!/bin/bash
  #SBATCH --job-name=<jobname>
  #SBATCH --output=<jobname>-<time>.out
  #SBATCH --error=<jobname>-<time>.err
  #SBATCH --time=<walltime>
  #SBATCH --ntasks=<ncsize>
  #SBATCH --partition=<queue>
  ##SBATCH --exclude=cn[0000-0000,0000]
  #SBATCH --exclusive
  #SBATCH --nodes=<ncnode>
  #SBATCH --ntasks-per-node=<nctile>
  source <root>/configs/pysource.<configName>.sh
  <py_runcode>
#
hpc_runcode: cp HPC_STDIN ../;cd ../;sbatch < <hpc_stdin>
#
cmd_obj:    mpif90  -c -cpp -convert big_endian -O2 -DHAVE_MPI -DHAVE_MED -DHAVE_VTK <mods> <incs> <f95name>
cmd_lib:    ar cru <libname> <objs>
cmd_exe:    mpif90  -o <exename> <objs> <libs>
#
incs_all: -I $MEDHOME/include
libs_all: -lm -L$MEDHOME/lib -lmed -L$HDF5HOME/lib -lhdf5 -ldl -lstdc++ -lz
          -L$METISHOME/lib -lmetis
#
cmd_obj_c: gcc -c <srcName> -o <objName> 
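The `<...>` tokens in the hpc_stdin block above are placeholders that get filled in before the job script is submitted. The sketch below only illustrates that idea and is not TELEMAC's own substitution code. The function name `fill_template` and all the values are hypothetical. It also shows one consistent way to choose the numbers, with the task count equal to nodes times tasks per node.

```python
import re


def fill_template(template, values):
    """Replace each <name> token that has an entry in `values`; leave the others untouched."""
    def repl(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return re.sub(r"<(\w+)>", repl, template)


# Shortened copy of the SLURM header from the configuration above.
TEMPLATE = """\
#!/bin/bash
#SBATCH --job-name=<jobname>
#SBATCH --time=<walltime>
#SBATCH --ntasks=<ncsize>
#SBATCH --nodes=<ncnode>
#SBATCH --ntasks-per-node=<nctile>
<py_runcode>
"""

# Hypothetical values: 2 nodes with 14 tasks each.
values = {"jobname": "t2d_test", "walltime": "01:00:00", "ncnode": 2, "nctile": 14}
values["ncsize"] = values["ncnode"] * values["nctile"]

script = fill_template(TEMPLATE, values)
print(script)
```

Tokens without a value, such as `<py_runcode>` here, are left in place, which makes it easy to spot a placeholder that was never filled.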

But when I run config.py with these lines, an error occurs.

Without these lines, config.py works.
But when I then run compile_telemac.py, an error occurs:
Compiling from the tree top api plus dependents
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

        +> Compile / Assemble / Link
Warning: "/home/soulayrolm/privatemodules/aed2/gcc75/1.3.0" is not a directory
f951: internal compiler error: Aborted
Warning: "/home/soulayrolm/privatemodules/aed2/gcc75/1.3.0" is not a directory
f951: internal compiler error: Aborted
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Warning: "/home/soulayrolm/privatemodules/aed2/gcc75/1.3.0" is not a directory
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
f951: internal compiler error: Aborted
Traceback (most recent call last):
  File "/home/soulayrolm/telemac/v8p3r1/scripts/python3/compile_telemac.py", line 143, in <module>
    main()
  File "/home/soulayrolm/telemac/v8p3r1/scripts/python3/compile_telemac.py", line 129, in main
    compile_cmdf(options.ncsize, modules, options.verbose)
  File "/nfs/home/soulayrolm/telemac/v8p3r1/scripts/python3/compilation/compil_tools.py", line 1287, in compile_cmdf
Warning: "/home/soulayrolm/privatemodules/aed2/gcc75/1.3.0" is not a directory
f951: internal compiler error: Aborted
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
    '\n        +> failed: ' + file_name + '\n' + err)
utils.exceptions.TelemacException:
        +> failed: .../v8p3r1/sources/utils/special/plante.F
... The following command failed for the reason in the listing
gfortran -c -cpp -g -fbounds-check -Wall -fbacktrace -finit-real=nan -DHAVE_AED2 -DHAVE_MPI -DHAVE_MUMPS -DHAVE_MED -fconvert=big-endian -frecord-marker=4  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/utils/special  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/utils/damocles  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/utils/parallel  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/utils/hermes  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/utils/bief  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/nestor  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/utils/ad  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/tomawac  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/sisyphe  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/waqtel  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/khione  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/gaia  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/telemac2d  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/utils/gretel  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/utils/partel  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/telemac3d  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/artemis  -I /home/soulayrolm/telemac/v8p3r1/builds/ubugfmpich2/obj/api   -I /home/soulayrolm/softs/hdf5/openmpi110/gcc75/1.10.5/include -I /home/soulayrolm/softs/mumps/openmpi110/gcc75/5.2.1/include -I /home/soulayrolm/privatemodules/aed2/gcc75/1.3.0 -I /home/soulayrolm/softs/aed2/gcc75/1.3.0/include -I /home/soulayrolm/softs/med/openmpi110/gcc75/4.0.0/include /home/soulayrolm/telemac/v8p3r1/sources/utils/special/plante.F
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Warning: "/home/soulayrolm/privatemodules/aed2/gcc75/1.3.0" is not a directory
f951: internal compiler error: Aborted
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.

I think it comes from the ubugfmpich2 configuration, which does not match the characteristics of the cluster.
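The listing above also contains a repeated gfortran warning that one of the `-I` paths is not a directory. Whether that is related to the internal compiler error is not established here. As a minimal sketch, the following checks every `-I` path of a compile command against the file system; the helper name `missing_include_dirs` is hypothetical, and the demo uses a temporary directory rather than the real paths.

```python
import os
import shlex
import tempfile


def missing_include_dirs(command):
    """Return the -I paths in a compiler command line that are not existing directories."""
    tokens = shlex.split(command)
    missing = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        path = None
        if tok == "-I" and i + 1 < len(tokens):
            # Form "-I /some/dir": the path is the next token.
            path = tokens[i + 1]
            i += 1
        elif tok.startswith("-I") and len(tok) > 2:
            # Form "-I/some/dir": the path is glued to the flag.
            path = tok[2:]
        if path is not None and not os.path.isdir(path) and path not in missing:
            missing.append(path)
        i += 1
    return missing


# Demo: one include directory exists, the other does not.
with tempfile.TemporaryDirectory() as existing:
    absent = os.path.join(existing, "no_such_dir")
    cmd = "gfortran -c -cpp -I {} -I{} plante.F".format(
        shlex.quote(existing), shlex.quote(absent)
    )
    for path in missing_include_dirs(cmd):
        print("not a directory:", path)
```

Pasting the failing gfortran command from the listing into `missing_include_dirs` on the cluster would show which include paths need to be corrected in the configuration or in the module files.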

I am a beginner in this field. Could anyone point out the mistakes I am making and help me compile TELEMAC? Maybe a video conference would be the best way to get it working once and for all.

Thank you very much
Soulayrol M.
Moderators: borisb
