
TOPIC: installation and run on cluster

installation and run on cluster 9 years 2 months ago #18282

  • Gaeta
Hello,
I'm trying to install and run simulations on a very powerful CentOS 7.0 cluster (www.hpc.cineca.it/hardware/galileo) under a research project.
But I'm quite confused about how to compile and run the code in parallel mode.
I have already successfully run the serial version, but now I need to use the parallelization.
I found some advice on this useful forum, but I'm still confused.

I have attached two possible configuration files. Neither of them worked :(
Any suggestions? What about the batch script (PBS) and the number of processors?

Thanks in advance
Gabriella
Attachments:

installation and run on cluster 9 years 2 months ago #18284

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi Gaeta

There are two main steps in your request: compilation and run.

Let's start with the compilation.
Did you manage to complete the compilation step with one of these configuration files?

Regards
Christophe

installation and run on cluster 9 years 2 months ago #18285

  • Gaeta
Good morning,
yes, I successfully compiled the code with the configuration file called systel.cis-ubuntu_cluster.cfg (attached), and the code runs in serial, but I'm quite confused about the parallelization and about how to implement the batch script.
With the other configuration file (..._hpc.cfg), it did not work.
Any advice?

Thanks
Gabriella

installation and run on cluster 9 years 2 months ago #18286

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
Hi,

I would change a couple of things in your configuration file.
I suggest using the one below:
# _____                              _______________________________
# ____/ TELEMAC Project Definitions /______________________________/
#
[Configurations]
configs:    cinecagalileoopenmpi
[cinecagalileoopenmpi]
root:       /Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2
version:    v6p3
language:   2
modules:    update system
options:    parallel mpi hpc
#
sfx_zip:    .gztar
sfx_lib:    .a
sfx_obj:    .o
sfx_mod:    .mod
sfx_exe:
#
val_root:   <root>\examples
val_rank:   all
######
par_cmdexec:   <config>/partel < PARTEL.PAR >> <partel.log>
#
mpi_cmdexec:   mpiexec -machinefile MPI_HOSTFILE -n 2 <exename>
mpi_hosts:   
#
hpc_stdin: #!/bin/bash
#PBS -S /bin/sh
#PBS -A IscrC_3DTAR
#PBS -l walltime=<walltime>
#PBS -l nodes=<ncnode>:ppn=<ncsize>:mem=20GB
#PBS -o <jobname>-<time>.out
#PBS -e <jobname>-<time>.err
#PBS -pe ompi 2
#PBS -q <queue>
# 
cd $PBS_O_WORKDIR
#
module load intel
module load gnu
module load openmpi
#
export SYSTELCFG=/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/configs/systel.cfg
#
export PATH=/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/scripts/python27:${PATH}
<py_runcode> 
#
hpc_runcode: qsub < <hpc_stdin>
#
cmd_obj:    mpif90 -fopenmp -c -O3 -fconvert=big-endian -frecord-marker=4 -DHAVE_MPI <mods> <incs> <f95name>
cmd_lib:    ar cru <libname> <objs>
cmd_exe:    mpif90 -fopenmp -fconvert=big-endian -frecord-marker=4 -lpthread -v -lm -o <exename> <objs> <libs>
exit
#
mods_all:   -I <config>
#
libs_partel:     /Telemac/svn.opentelemac.org/svn/opentelemac/metis-5.1.0/build/Linux-x86_64/libmetis/libmetis.a
libs_all       :    -lstdc++

You should load either gnu or intel; I don't think you need both.
Load the modules and run compileTELEMAC.py
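For example, from the command line this could look like the following (a minimal sketch based on the configuration above; adapt the module names and the root path to your actual install on Galileo):

module load gnu
module load openmpi
export SYSTELCFG=/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/configs/systel.cfg
export PATH=/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/scripts/python27:${PATH}
compileTELEMAC.py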


Then to launch a code you will need the following command:

runcode.py <module> <cas file> --walltime=<walltime> --queue=<queue> --ncsize=<number of processors> --ncnode=<number of nodes> --jobname=<name of the job>
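For example, a short TOMAWAC run on 2 processors of a single node could be submitted with something like this (the queue name, walltime and job name are only examples; adapt them to your project and to the queues available on Galileo):

runcode.py tomawac cas_tom --walltime=2:00:00 --queue=parallel --ncsize=2 --ncnode=1 --jobname=job_T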


Hope it helps.
There are 10 types of people in the world: those who understand binary, and those who don't.
The following user(s) said Thank You: Gaeta

installation and run on cluster 9 years 2 months ago #18287

  • Gaeta
Thanks for your advice.
Trying to compile the code with the attached (modified) configuration file, I got the following error:


Loading Options and Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... parsing configuration file: /galileo/home/userexternal/mgaeta00//Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/configs/systel.cfg
Traceback (most recent call last):
File "./compileTELEMAC.py", line 506, in <module>
cfgs = parseConfigFile(options.configFile,options.configName)
File "/galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/scripts/python27/config.py", line 238, in parseConfigFile
generalDict,configDict = getConfigs(file,name,bypass)
File "/galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/scripts/python27/config.py", line 191, in getConfigs
parser.error("Could not access required parameters in config file")
NameError: global name 'parser' is not defined


Can you help?

File Attachment:

File Name: systel.cis-hpc_2015-09-16.cfg
File Size: 2 KB



installation and run on cluster 9 years 2 months ago #18289

  • sebourban
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello,

Just make sure there are spaces in front of the bash script lines so that the parsing of the configuration file is done properly.

hpc_stdin: #!/bin/bash
   #PBS -S /bin/sh
   #PBS -o <sortiefile>
   #PBS -e <exename>.err
   #PBS -N <jobname>
   #PBS -l nodes=<nctile>:ppn=<ncnode>
   #PBS -q highp
   source /etc/profile.d/modules.sh
   module load gcc/4.7.2 openmpi/1.6.5/gcc/4.7.2
   <mpi_cmdexec>
   exit

(ignore the content of the bash script itself, it is just an example of where the spaces have to be)

Hope this helps,
Sebastien.
The following user(s) said Thank You: Gaeta

installation and run on cluster 9 years 2 months ago #18291

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
There is a $ in your systel.cfg that I think should be removed:

export PATH=$/galileo/home/userexternal/mgaeta00//Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/scripts/python27:${PATH}
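In other words, with the $ removed the line should read:

export PATH=/galileo/home/userexternal/mgaeta00//Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/scripts/python27:${PATH}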

And you need to run that same line before launching compileTELEMAC.py

I strongly suggest creating a file like the one below:
### TELEMAC settings -----------------------------------------------------------
export HOMETEL=/galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2
export PATH=$HOMETEL/scripts/python27:.:$PATH
### ALIASES -----------------------------------------------------------
export SOURCEFILE=$HOMETEL/configs/name_of_the_file
export SYSTELCFG=$HOMETEL/configs/systel.cis-hpc_2015-09-16.cfg
export USETELCFG=cinecagalileoopenmpi
export RELTEL=v6p3
### COMPILERS -----------------------------------------------------------
module purge
module load gnu
module load openmpi

And running "source path_to_file" be fore running compileTELEMAC.py which after you used that file can be run from anywhere.
That file will set your environement.
And in the systel.cfg you can replace the two export by the "source" of that file as well.
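For example, the two export lines at the end of the hpc_stdin block could then simply be replaced by this (the path /galileo/home/userexternal/mgaeta00/telemac_env.sh is only a placeholder for wherever you save the file above):

   source /galileo/home/userexternal/mgaeta00/telemac_env.sh
   #
   <py_runcode>

Since that file already loads the modules, the module load lines in hpc_stdin are then probably redundant as well.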

Hope it helps
There are 10 types of people in the world: those who understand binary, and those who don't.
The following user(s) said Thank You: Gaeta

installation and run on cluster 9 years 2 months ago #18292

  • Gaeta
Thanks a lot for your helpful advice. The compilation completed successfully with the attached configuration file.
Now the run (last step...).
If I launch the code with the command
runcode.py tomawac -s cas_tom --walltime=2:00:00 --queue=parallel --ncsize=2 --ncnode=1 --jobname=job_T

the simulation runs.
The problem is that I need to use a batch script (run_telemac) to launch the simulation with the qsub command.
The error is the following

runCode: Fail to run
|mpiexec -machinefile MPI_HOSTFILE -n 2 out_WaveWind_VarS-T_G3
|~~~~~~~~~~~~~~~~~~
|**********************************************************
|
|Open MPI does not support recursive calls of mpiexec
|
|**********************************************************
|~~~~~~~~~~~~~~~~~~
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.

mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[24539,1],0]
Exit code: 1

Can you help?
G
Attachments:

installation and run on cluster 9 years 2 months ago #18294

  • Gaeta
Sorry, here is the batch file.
Attachments:

installation and run on cluster 9 years 2 months ago #18295

  • sebourban
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello,

It looks like this is not a configuration / TELEMAC installation / run issue anymore; you have successfully completed that part. It is an MPI or TELEMAC problem (the Fortran part, as opposed to the Python/bash part).

We would need to test your TOMAWAC test case to be able to debug why it fails.

Sebastien.