
TOPIC: installation and run on cluster

installation and run on cluster 9 years 2 months ago #18306

  • Gaeta
Sorry, it doesn't work with your modified files either.
Here are the files:
- run_telemac.txt is the batch file; I launched it from the command line as "qsub run_telemac" in the folder of my test case (containing the cas, bc, etc.);
- the resulting job.err and job.out files.

Thanks. I hope you can help me find the problem.
Regards

Gabriella
Attachments:
The administrator has disabled public write access.

installation and run on cluster 9 years 2 months ago #18307

  • Gaeta
I ran a test.
After the code stopped, I went into the temporary folder of the cas and launched mpirun out_tom (out_tom being the executable).
I got this error:

MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 16
P_INIT: FILE PARAL IS INCONSISTENT WITH MPI PARAMETERS
MPI NCSIZE = 16
PARAL NCSIZE = 1
EXECUTABLE FILE: /galileo/home/userexternal/mgaeta00/TAR3D_sim/Golfo/ww00_dt10/Golfo_1-10ottobre_dt10/code/cas_GolfoTaranto_tom_2015-09-17-00h43min34s/A.EXE
BARRIER PASSED

and when I checked the PARAL file, I found
1
132

whereas MPI NCSIZE = 16 (why not 8, as in my script file?).
Sorry, there are probably too many gaps in my IT knowledge.
Regards

G
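
The mismatch reported by P_INIT can be checked by hand before launching mpirun. The following is only a minimal sketch: it assumes that the first line of the PARAL file holds the NCSIZE the case was prepared for (which is what the "PARAL NCSIZE = 1" message and the file content above suggest), and the function names are hypothetical, not part of TELEMAC.

```python
def paral_ncsize(paral_text):
    """Return the integer on the first line of a PARAL file.

    Assumption: the first value in the file is NCSIZE, as the
    "PARAL NCSIZE = 1" message in the post suggests.
    """
    return int(paral_text.split()[0])


def is_consistent(paral_text, mpi_ncsize):
    """True when the PARAL value equals the number of MPI processes."""
    return paral_ncsize(paral_text) == mpi_ncsize


if __name__ == "__main__":
    # Content of the PARAL file quoted in the post.
    paral_text = "1\n132\n"
    for mpi_ncsize in (1, 8, 16):
        state = "consistent" if is_consistent(paral_text, mpi_ncsize) else "INCONSISTENT"
        print(f"PARAL NCSIZE = {paral_ncsize(paral_text)}, MPI NCSIZE = {mpi_ncsize}: {state}")
```

With the file shown above, only a single-process run would pass this check, which is consistent with the case having been prepared for one processor while mpirun started 16.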

installation and run on cluster 9 years 2 months ago #18310

  • yugi
  • openTELEMAC Guru
Ok,

There seems to be an error when your batch script is generated.
It does not look like it should. Did you recompile with my modified files?
To begin with, your batch script should contain the line
source .../pysource.cinecagalileompi_hpc.sh
It is missing, which means that you are not using the configuration I gave you.

Hope it helps.
There are 10 types of people in the world: those who understand binary, and those who don't.

installation and run on cluster 9 years 2 months ago #18315

  • Gaeta
Good morning,

Even after adding the line above (source .../pysource.cinecagalileompi_hpc.sh), the same error occurs.

installation and run on cluster 9 years 2 months ago #18316

  • yugi
  • openTELEMAC Guru
That's not what I asked.

The error is that in your batch file the python command should be something like:
runcode.py --mpi ...
and you have:
mpirun runcode.py ...

Your whole batch file is incorrect, and the only reason I can see for that is that your installation is not using the systel.cfg I gave you.

Could you post the output of the command config.py?

Hope it helps.
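
The difference between the two command lines can be spotted automatically in a batch file. This is a rough sketch only: the helper name and the set of launcher names are assumptions made for illustration, and the check does nothing more than look for lines where an MPI launcher is placed directly in front of runcode.py.

```python
# Launcher names to look for; this set is an assumption for illustration.
MPI_LAUNCHERS = {"mpirun", "mpiexec"}


def lines_wrapping_runcode(batch_text):
    """Return the batch lines where an MPI launcher starts runcode.py directly."""
    offending = []
    for raw_line in batch_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (including #PBS directives).
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        launched_by_mpi = tokens[0] in MPI_LAUNCHERS
        runs_runcode = any(token.endswith("runcode.py") for token in tokens[1:])
        if launched_by_mpi and runs_runcode:
            offending.append(line)
    return offending


if __name__ == "__main__":
    # The two forms quoted in the post; "..." stands for the elided arguments.
    wrong = "#!/bin/bash\nmpirun runcode.py ...\n"
    expected = "#!/bin/bash\nruncode.py --mpi ...\n"
    print("wrong form flagged:", lines_wrapping_runcode(wrong))
    print("expected form flagged:", lines_wrapping_runcode(expected))
```

The point of the check is that runcode.py itself is meant to be started as a single process and told to run in MPI mode, rather than being started once per MPI rank.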
There are 10 types of people in the world: those who understand binary, and those who don't.

installation and run on cluster 9 years 2 months ago #18317

  • Gaeta
Do you mean this?

Loading Options and Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... parsing configuration file: /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/configs/systel.cfg


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cinecagalileoopenmpi_hpc:

+> root: /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2
+> version: v6p3
+> module: tomawac / damocles / partel / parallel / postel3d / artemis / telemac3d / diffsel / gretel / special / stbtel / bief / dredgesim / sisyphe / telemac2d


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


My work is done

installation and run on cluster 9 years 2 months ago #18318

  • Gaeta
I am sending you again the config file, the pysource.cinecagalileompi_hpc.sh and my new batch file. For the compilation, I first ran the command source .../pysource.cinecagalileompi_hpc.sh, then recompiled the code in ..script/python72 with python compileTELEMAC.py --clean.
Compilation successful.
Then, in my cas folder, I launched the command qsub run_telemac.

One thing: I actually deleted the command cd $PBS_O_WORKDIR...
Now the errors are different.
In j.err:
/galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/configs/pysource_cinecagalileompi_hpc.sh: line 10: module: command not found
/galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/configs/pysource_cinecagalileompi_hpc.sh: line 11: module: command not found
/galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/configs/pysource_cinecagalileompi_hpc.sh: line 12: module: command not found
/bin/sh: mpif90: command not found

in j.out
Loading Options and Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... parsing configuration file: /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/configs/systel.cfg


Running your CAS file for:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+> configuration: cinecagalileoopenmpi_hpc
+> root: /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2
+> version v6p3


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


... reading the main module dictionary

... processing the main CAS file(s)
+> running in English

... checking parallelisation

... handling temporary directories

... checking coupling between codes

... first pass at copying all input files
+> cas_GolfoTaranto_tom
copying: tomawac.dico /galileo/home/userexternal/mgaeta00/TAR3D_sim/Golfo/ww00_dt10/Golfo_1-10ottobre_dt10/code/cas_GolfoTaranto_tom_2015-09-17-12h49min30s/WACDICO
re-writing: /galileo/home/userexternal/mgaeta00/TAR3D_sim/Golfo/ww00_dt10/Golfo_1-10ottobre_dt10/code/cas_GolfoTaranto_tom_2015-09-17-12h49min30s/WACCAS
copying: newSelafin_1500_200_50_nearshore1_NEW_SHYF_B_2.slf /galileo/home/userexternal/mgaeta00/TAR3D_sim/Golfo/ww00_dt10/Golfo_1-10ottobre_dt10/code/cas_GolfoTaranto_tom_2015-09-17-12h49min30s/WACGEO
copying: WaveWind_VarS-T_G3.f /galileo/home/userexternal/mgaeta00/TAR3D_sim/Golfo/ww00_dt10/Golfo_1-10ottobre_dt10/code/cas_GolfoTaranto_tom_2015-09-17-12h49min30s/wacfort.f
copying: WAVE_timeseries_CROTONE_TOM_F_201410010000_to_2014100110000.txt /galileo/home/userexternal/mgaeta00/TAR3D_sim/Golfo/ww00_dt10/Golfo_1-10ottobre_dt10/code/cas_GolfoTaranto_tom_2015-09-17-12h49min30s/WACFO1
copying: BC_GolfoTaranto_tom.cli /galileo/home/userexternal/mgaeta00/TAR3D_sim/Golfo/ww00_dt10/Golfo_1-10ottobre_dt10/code/cas_GolfoTaranto_tom_2015-09-17-12h49min30s/WACCLI
copying: input_for_TOM2_xy_20141001_to_20141010.txt /galileo/home/userexternal/mgaeta00/TAR3D_sim/Golfo/ww00_dt10/Golfo_1-10ottobre_dt10/code/cas_GolfoTaranto_tom_2015-09-17-12h49min30s/WACVEF

... checking the executable
mpif90 -fopenmp -c -O3 -fconvert=big-endian -frecord-marker=4 -DHAVE_MPI -I /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/builds/cinecagalileoopenmpi_hpc/lib/utils/special -I /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/builds/cinecagalileoopenmpi_hpc/lib/utils/parallel -I /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/builds/cinecagalileoopenmpi_hpc/lib/utils/damocles -I /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/builds/cinecagalileoopenmpi_hpc/lib/utils/bief -I /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/builds/cinecagalileoopenmpi_hpc/lib/tomawac wacfort.f
... The following command failed for the reason above
mpif90 -fopenmp -c -O3 -fconvert=big-endian -frecord-marker=4 -DHAVE_MPI -I /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/builds/cinecagalileoopenmpi_hpc/lib/utils/special -I /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/builds/cinecagalileoopenmpi_hpc/lib/utils/parallel -I /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/builds/cinecagalileoopenmpi_hpc/lib/utils/damocles -I /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/builds/cinecagalileoopenmpi_hpc/lib/utils/bief -I /galileo/home/userexternal/mgaeta00/Telemac/svn.opentelemac.org/svn/opentelemac/tags/v6p3r2/builds/cinecagalileoopenmpi_hpc/lib/tomawac wacfort.f

Thanks
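
The j.err lines above all follow the shell's "command not found" pattern, so the missing commands can be listed with a few lines of Python. This is just a sketch, and the function name is hypothetical. As a general remark, not something confirmed in this thread: "module: command not found" inside a batch job usually indicates that the environment modules system was not initialised in the shell that runs the job, in which case the module load lines in the pysource file do nothing and mpif90 is never put on the PATH.

```python
def missing_commands(stderr_text):
    """List the commands reported as 'command not found', in order of first appearance."""
    marker = ": command not found"
    missing = []
    for raw_line in stderr_text.splitlines():
        line = raw_line.strip()
        if not line.endswith(marker):
            continue
        # "<source>: [line N: ]<command>: command not found"
        # The command is the field just before the final message.
        command = line.rsplit(": ", 2)[-2]
        if command not in missing:
            missing.append(command)
    return missing


if __name__ == "__main__":
    # Lines taken from the j.err in the post (path shortened with "...").
    sample = "\n".join([
        ".../pysource_cinecagalileompi_hpc.sh: line 10: module: command not found",
        ".../pysource_cinecagalileompi_hpc.sh: line 11: module: command not found",
        ".../pysource_cinecagalileompi_hpc.sh: line 12: module: command not found",
        "/bin/sh: mpif90: command not found",
    ])
    print(missing_commands(sample))
```

Read this way, the mpif90 failure in j.out looks like a consequence of the three module failures rather than a separate problem.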
Attachments:

installation and run on cluster 9 years 2 months ago #18319

  • Gaeta
OK, something is starting to work!
I made some modifications to the batch and conf files
and submitted with the command qsub run_telemac.
The run goes, BUT not in parallel mode!!
grrr :S
In the cas folder I don't have partitioned files (geo, etc.), even after changing the number of processors (both in the steering file and in the batch file, e.g. --ncsize=8).

Why? Any help? I'm going crazy...
G
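
Whether the partitioning step actually ran can be checked from the file names in the temporary folder. The sketch below rests on an assumption that is not stated anywhere in this thread: that partitioned files are named after the original file followed by two five-digit numbers separated by a dash (for example WACGEO00007-00000 for the first of 8 sub-domains). The function names are hypothetical.

```python
import re

# Assumed naming of partitioned files: <base><5 digits>-<5 digits>.
PARTITION_NAME = re.compile(r"^(?P<base>.+?)(?P<last>\d{5})-(?P<rank>\d{5})$")


def partitioned_files(filenames, base="WACGEO"):
    """Return the names that look like partitions of `base`, sorted."""
    found = []
    for name in filenames:
        match = PARTITION_NAME.match(name)
        if match and match.group("base") == base:
            found.append(name)
    return sorted(found)


def looks_partitioned(filenames, ncsize, base="WACGEO"):
    """True when exactly `ncsize` partition-like files exist for `base`."""
    return len(partitioned_files(filenames, base)) == ncsize


if __name__ == "__main__":
    # Folder content as described in the post: no partitioned files at all.
    serial_folder = ["WACDICO", "WACCAS", "WACGEO", "WACCLI"]
    print("serial folder partitioned for 8:", looks_partitioned(serial_folder, 8))

    # What an 8-way partition would look like under the naming assumption.
    parallel_folder = serial_folder + [f"WACGEO00007-{rank:05d}" for rank in range(8)]
    print("parallel folder partitioned for 8:", looks_partitioned(parallel_folder, 8))
```

In practice the list of names would come from os.listdir on the temporary cas folder.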
Attachments:

installation and run on cluster 9 years 2 months ago #18321

  • c.coulet
  • Moderator
Hi

It seems to me that it is the telemac script that launches the command qsub < hpc_stdin, because before the computation runs there is the partitioning step, which should be executed by only 1 processor!

hope this helps
Christophe

installation and run on cluster 9 years 2 months ago #18322

  • Gaeta
telemac script = Batch script?