TOPIC: Error launching Telemac at Cluster

Error launching Telemac at Cluster 6 years 2 weeks ago #31926

  • konsonaut
Hi all!

I installed Telemac v7p3r1 on an HPC cluster, with gfortran and SGI MPT 2.18.

On the front-end node, both the scalar and the parallel versions work.

The cluster uses the PBS scheduler, so I set up my config file as follows:
____________________________________________________________________
[mpi]
#
# mpi_cmdexec: /opt/hpe/hpc/mpt/mpt-2.18/bin/mpiexec -wdir <wdir> -n <ncsize> <exename>
mpi_cmdexec: /opt/hpe/hpc/mpt/mpt-2.18/bin/mpiexec -np <ncsize> <exename>
#
options: parallel mpi
par_cmdexec: <config>/partel < PARTEL.PAR >> <partel.log>
#
cmd_lib: ar cru <libname> <objs>
#
incs_parallel: -I /opt/hpe/hpc/mpt/mpt-2.18/include
incs_special: -I /opt/hpe/hpc/mpt/mpt-2.18/include
libs_partel: /beegfs/home/flussbuero/software/openTelemac/metis_5.1.0/lib/libmetis.a
# libs_all: /opt/hpe/hpc/mpt/mpt-2.18/lib/libmpi.so /opt/hpe/hpc/mpt/mpt-2.18/lib/libxmpi.so
libs_all: /opt/hpe/hpc/mpt/mpt-2.18/lib/libmpi.so
incs_api: -I /opt/hpe/hpc/mpt/mpt-2.18/include
#
hpc_stdin: #!/bin/bash
#PBS -S /bin/sh
#PBS -o <sortiefile>
#PBS -e <exename>.err
#PBS -N test
#PBS -l walltime=24:00:00
#PBS -l nodes=<nctile>:ppn=<ncnode>
#PBS -q flussbuero
#PBS -j oe
source /etc/profile.d/modules.sh
module load mpt/2.18
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/beegfs/home/flussbuero/software/lib64/:/beegfs/home/flussbuero/software/gcc-4.9.4/lib64/
export LD_LIBRARY_PATH
<mpi_cmdexec>
#
hpc_cmdexec: chmod 755 <hpc_stdin>; qsub <hpc_stdin>
# hpc_cmdexec: qsub <hpc_stdin>
#
cmd_obj: gfortran --prefix=/beegfs/home/flussbuero/software/gcc-4.9.4 -c -cpp -O3 -DHAVE_MPI -fconvert=big-endian -frecord-marker=4 <mods> <incs> <f95name>
cmd_exe: /opt/hpe/hpc/mpt/mpt-2.18/bin/mpif90 -fconvert=big-endian -frecord-marker=4 -v -lm -o <exename> <objs> <libs>
___________________________________________________________________

When trying to launch a Telemac-2D simulation, I get the following error:
___________________________________________________________________
MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 32
BARRIER PASSED
READ_CONFIG: FILE CONFIG NOT FOUND: CONFIG
DEFAULTS VALUES OF LU AND LNG: 6 AND 2
LISTING OF TELEMAC2D

[ASCII-art TELEMAC banner]

2D VERSION V7P3 FORTRAN 2003

[ASCII-art fish banner]

At line 262 of file /beegfs/home/flussbuero/software/openTelemac/v7p3r1/sources/telemac2d/lecdon_telemac2d.f (unit = 1, file = '(·O¬ª*')
Fortran runtime error: File 'T2DDICO' does not exist
At line 262 of file /beegfs/home/flussbuero/software/openTelemac/v7p3r1/sources/telemac2d/lecdon_telemac2d.f (unit = 1, file = '8¸O¬ª*')
Fortran runtime error: File 'T2DDICO' does not exist
...
_________________________________________________________________________

I already tried some different options but without success.

Does anybody have a clue what is going on?

I would be very glad for an answer!


With best regards,
Clemens

Error launching Telemac at Cluster 6 years 2 weeks ago #31927

  • yugi
Could you rerun it with &ETA added at the end of your steering file?
That would tell us what TELEMAC has read from the steering file.
There are 10 types of people in the world: those who understand binary, and those who don't.

Error launching Telemac at Cluster 6 years 2 weeks ago #31928

  • konsonaut
Hi,

It gives the same error message.

The temporary directory contains the HPC_STDIN file.
Strangely, when I run its last line (the mpiexec command) manually, the Telemac simulation starts.
___________________________________________________
#!/bin/bash
#PBS -N test
#PBS -o hpc-job.sortie
#PBS -l walltime=24:00:00
#PBS -l nodes=2:ppn=16
#PBS -q flussbuero
#PBS -j oe
source /etc/profile.d/modules.sh
module load mpt/2.18
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/beegfs/home/flussbuero/software/lib64/:/beegfs/home/flussbuero/software/gcc-4.9.4/lib64/
export LD_LIBRARY_PATH
/opt/hpe/hpc/mpt/mpt-2.18/bin/mpiexec -np 32 /beegfs/home/flussbuero/0_projects/openTelemac/MQ/T2D_hotstart_for_MQ.cas_2018-11-08-16h11min55s/out_telemac2d
___________________________________________________

But then again, in that case the simulation seems to run on the front-end node?

Clemens
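
Since the same command works when typed by hand in the temporary folder but fails through the queue, one quick check is to print, from inside the job, which host it runs on and whether T2DDICO is visible from the job's working directory. The following is only a minimal sketch, assuming a bash job script: `check_run_dir` is a hypothetical helper name, and `PBS_O_WORKDIR` is the variable PBS sets to the directory from which `qsub` was called.

```shell
#!/bin/bash
# check_run_dir is a hypothetical helper, not part of TELEMAC or PBS.
# It reports the host, the current directory, and whether the TELEMAC
# temporary files are present in the directory given as $1.
check_run_dir() {
    local dir="${1:-$PWD}"
    local missing=0
    local f
    echo "host: $(uname -n)"
    echo "cwd:  $PWD"
    for f in T2DDICO T2DCAS; do
        if [ -f "$dir/$f" ]; then
            echo "found:   $dir/$f"
        else
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Inside a PBS job, PBS_O_WORKDIR is the directory qsub was submitted from.
# Outside PBS it is not set.
echo "PBS_O_WORKDIR: ${PBS_O_WORKDIR:-<not set>}"
check_run_dir "$PWD" \
    || echo "TELEMAC input files are not in the job's working directory"
```

If these lines are placed just before the mpiexec call and the job reports a working directory other than the temporary folder, then the working directory of the job would be worth looking at first, for example by testing the commented-out mpi_cmdexec variant with -wdir <wdir> from the config file.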

Error launching Telemac at Cluster 6 years 1 week ago #31931

  • yugi
Could you post the listing with &ETA?
&ETA makes DAMOCLES dump all the variables it has read from the steering file.
That is what I was interested in; I forgot to say that it would still crash the same way.
The point is to see what is read from the steering file (especially the path of the dictionary).


You can have a look at what T2DCAS looks like in your temporary folder.

Regarding your remark that it runs on the front-end node:
do you reload the TELEMAC environment in your PBS script?
That could be the issue.
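
To illustrate what reloading the environment inside the PBS script could look like, here is a rough sketch. `load_telemac_env` is a hypothetical helper; HOMETEL, SYSTELCFG and USETELCFG are, as far as I know, the variables the TELEMAC Python scripts look at, and the scripts/python27 location is an assumption for a v7p3 installation. The paths are the ones that appear in the listings in this thread.

```shell
#!/bin/bash
# load_telemac_env is a hypothetical helper, not shipped with TELEMAC.
# The scripts/python27 location is an assumption for a v7p3 install.
load_telemac_env() {
    export HOMETEL="$1"      # root of the TELEMAC installation
    export SYSTELCFG="$2"    # path to the systel configuration file
    export USETELCFG="$3"    # name of the configuration section to use
    export PATH="$HOMETEL/scripts/python27:$PATH"
}

# Values taken from the listings in this thread.
load_telemac_env \
    /beegfs/home/flussbuero/software/openTelemac/v7p3r1 \
    /beegfs/home/flussbuero/software/openTelemac/v7p3r1/configs/systel.cis-zamg_v7p3r1.cfg \
    mpi
echo "HOMETEL=$HOMETEL USETELCFG=$USETELCFG"
```

In a job script these lines would go before the mpiexec call, next to the existing module load and LD_LIBRARY_PATH lines.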

Error launching Telemac at Cluster 6 years 1 week ago #31932

  • konsonaut
Hi,

I don't have a Telemac environment. I compiled Telemac in my home folder.

I included &ETA at the end of the steering file.
Sorry, this post has a lot of content. Basically, the procedure consists of three steps.

1.
I run my shell script. It creates the temporary folder and this output, nothing unusual:

Loading Options and Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[ASCII-art banner]


... parsing configuration file: /beegfs/home/flussbuero/software/openTelemac/v7p3r1/configs/systel.cis-zamg_v7p3r1.cfg


Running your CAS file for:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+> configuration: mpi
+> root: /beegfs/home/flussbuero/software/openTelemac/v7p3r1


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


... reading the main module dictionary

... processing the main CAS file(s)
+> running in English

... handling temporary directories

... checking coupling between codes

... checking parallelisation

... first pass at updating all input files
copying: Geo_Liesing_Wehrsohle-mod.slf /beegfs/home/flussbuero/0_projects/openTelemac/MQ/T2D_hotstart_for_MQ.cas_2018-11-09-07h49min05s/T2DGEO
copying: BOTTOM_BC.cli /beegfs/home/flussbuero/0_projects/openTelemac/MQ/T2D_hotstart_for_MQ.cas_2018-11-09-07h49min05s/T2DCLI
copying: Hydrograph_to_MQ.txt /beegfs/home/flussbuero/0_projects/openTelemac/MQ/T2D_hotstart_for_MQ.cas_2018-11-09-07h49min05s/T2DIMP
copying: Geo_Liesing_Wehrsohle-mod_iniWD.slf /beegfs/home/flussbuero/0_projects/openTelemac/MQ/T2D_hotstart_for_MQ.cas_2018-11-09-07h49min05s/T2DPRE
copying: T2D_hotstart_for_MQ.cas /beegfs/home/flussbuero/0_projects/openTelemac/MQ/T2D_hotstart_for_MQ.cas_2018-11-09-07h49min05s/T2DCAS
copying: telemac2d.dico /beegfs/home/flussbuero/0_projects/openTelemac/MQ/T2D_hotstart_for_MQ.cas_2018-11-09-07h49min05s/T2DDICO

... checking the executable
copying: telemac2d /beegfs/home/flussbuero/0_projects/openTelemac/MQ/T2D_hotstart_for_MQ.cas_2018-11-09-07h49min05s/out_telemac2d

... modifying run command to MPI instruction

... modifying run command to PARTEL instruction

... partitioning base files (geo, conlim, sections, zones and weirs)
+> /beegfs/home/flussbuero/software/openTelemac/v7p3r1/builds/mpi/bin/partel < PARTEL.PAR >> partel_T2DGEO.log
STOP 0

... splitting / copying other input files
duplicating: T2DIMP
partitioning: T2DPRE
+> /beegfs/home/flussbuero/software/openTelemac/v7p3r1/builds/mpi/bin/partel < PARTEL.PAR >> partel_T2DPRE.log
STOP 0

... handling sortie file(s)


Running your simulation(s) :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


... modifying run command to HPC instruction
[ASCII-art banner]


453.zaasfe1
... Your simulation (T2D_hotstart_for_MQ.cas) has been launched through the queue.

+> You need to wait for completion before re-collecting files using the option --merge



My work is done
_________________________________________

2. In the temporary folder:
I have the HPC_STDIN and the sortie file.

HPC_STDIN:
________________________________________
#!/bin/bash
#PBS -N test
#PBS -o hpc-job.sortie
#PBS -l walltime=24:00:00
#PBS -l nodes=2:ppn=16
#PBS -q flussbuero
#PBS -j oe
source /etc/profile.d/modules.sh
module load mpt/2.18
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/beegfs/home/flussbuero/software/lib64/:/beegfs/home/flussbuero/software/gcc-4.9.4/lib64/
export LD_LIBRARY_PATH
/opt/hpe/hpc/mpt/mpt-2.18/bin/mpiexec -np 32 /beegfs/home/flussbuero/0_projects/openTelemac/MQ/T2D_hotstart_for_MQ.cas_2018-11-09-07h49min05s/out_telemac2d
____________________________________________

sortie file:
_________________________________________________________________
MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 32
BARRIER PASSED
At line 262 of file /beegfs/home/flussbuero/software/openTelemac/v7p3r1/sources/telemac2d/lecdon_telemac2d.f (unit = 1, file = '(·O¬ª*')
Fortran runtime error: File 'T2DDICO' does not exist
At line 262 of file /beegfs/home/flussbuero/software/openTelemac/v7p3r1/sources/telemac2d/lecdon_telemac2d.f (unit = 1, file = '(·O¬ª*')
Fortran runtime error: File 'T2DDICO' does not exist
At line 262 of file /beegfs/home/flussbuero/software/openTelemac/v7p3r1/sources/telemac2d/lecdon_telemac2d.f (unit = 1, file = '8¸O¬ª*')
Fortran runtime error: File 'T2DDICO' does not exist
READ_CONFIG: FILE CONFIG NOT FOUND: CONFIG
DEFAULTS VALUES OF LU AND LNG: 6 AND 2
LISTING OF TELEMAC2D

[ASCII-art TELEMAC banner]

2D VERSION V7P3 FORTRAN 2003

[ASCII-art fish banner]

MPT ERROR: MPI_COMM_WORLD rank 7 has terminated without calling MPI_Finalize()
aborting job
______________________________________________

3. Manually launching the simulation within the temp folder works.

Error launching Telemac at Cluster 6 years 1 week ago #31933

  • yugi
OK, so &ETA is indeed useless here because it crashes before that point.
Could you post the content of the T2DCAS file that is in your temporary folder?
It is rewritten by the Python scripts, and sometimes, if the path to the dictionary is too long, it can generate errors.

Error launching Telemac at Cluster 6 years 1 week ago #31934

  • konsonaut
It is exactly the same as the specified cas file:

/
/ TELEMAC-2D All-in-one steering file
/

PARALLEL PROCESSORS =32

/ DEBUGGER =1


MAXIMUM NUMBER OF BOUNDARIES = 2000


/
/ FRICTION AND TURBULENCE
/

LAW OF BOTTOM FRICTION =3
/ 3 = Strickler

FRICTION COEFFICIENT =35

/ TURBULENCE MODEL FOR SOLID BOUNDARIES =2

LAW OF FRICTION ON LATERAL BOUNDARIES =0

ROUGHNESS COEFFICIENT OF BOUNDARIES =60

TURBULENCE MODEL =5
/ 1 = default = const. eddy viscosity
/ 2 = Elder
/ 3 = k-epsilon model

VELOCITY DIFFUSIVITY =1.E-6
/ default = 1.E-6
/ in case of Elder or k-e model use 1.E-6


/
/ EQUATIONS, BOUNDARY CONDITIONS
/

VELOCITY PROFILES =1;4

PRESCRIBED FLOWRATES =0.0;5.0

PRESCRIBED ELEVATIONS =636.2;0.0

/ PRESCRIBED VELOCITIES =0.0;0.0

OPTION FOR LIQUID BOUNDARIES =1;1


/
/ EQUATIONS, INITIAL CONDITIONS
/

/ INITIAL CONDITIONS ='CONSTANT ELEVATION'

/ INITIAL ELEVATION =255.37

/ INITIAL DEPTH =0.0

/ NUMBER OF WEIRS = 6


/
/ INPUT-OUTPUT, FILES
/

GEOMETRY FILE ='Geo_Liesing_Wehrsohle-mod.slf'

BOUNDARY CONDITIONS FILE ='BOTTOM_BC.cli'

RESULTS FILE ='res_Liesing_ML_MQ.slf'

LIQUID BOUNDARIES FILE ='Hydrograph_to_MQ.txt'

/ FORTRAN FILE ='weir_hotstart.f'

/ STAGE-DISCHARGE CURVES =1;0

/ STAGE-DISCHARGE CURVES FILE ='Pegelschluessel_Sulm.txt'

/
/ WEIRS
/

/ NUMBER OF WEIRS = 1

/ WEIRS DATA FILE ='weir_KWEbner_var.txt'

/
/ RESTART SIMULATION
/

PREVIOUS COMPUTATION FILE
='Geo_Liesing_Wehrsohle-mod_iniWD.slf'

COMPUTATION CONTINUED =YES

INITIAL TIME SET TO ZERO =YES

/
/ INPUT-OUTPUT, GRAPHICS AND LISTING
/

ORIGINAL DATE OF TIME =0;0;0

LISTING PRINTOUT PERIOD =100

VARIABLES FOR GRAPHIC PRINTOUTS ='U,V,B,H,S,US,F'

MASS-BALANCE =YES

GRAPHIC PRINTOUT PERIOD =600


/
/ NUMERICAL PARAMETERS
/

TIDAL FLATS =YES

TREATMENT OF NEGATIVE DEPTHS =2

SUPG OPTION =0;0;2;2

TYPE OF ADVECTION =14;5;1;1

CONTINUITY CORRECTION =YES

TIME STEP =0.5

NUMBER OF TIME STEPS =21600

FREE SURFACE GRADIENT COMPATIBILITY =0.9

TREATMENT OF THE LINEAR SYSTEM =2

SOLVER =1

MASS-LUMPING ON H =1.0

IMPLICITATION FOR DEPTH =1.0

IMPLICITATION FOR VELOCITY =1.0

/ IMPLICITATION FOR DIFFUSION OF VELOCITY =1.0

/ SOLVER ACCURACY =1.E-4



/
/ FINITE VOLUME OPTIONS
/

/ Attention to Listing printout und Graphics printout periods!

/ EQUATIONS ='SAINT-VENANT VF'

/ FINITE VOLUME SCHEME =5 / 5=HLLC

/ VARIABLE TIME-STEP =true

/ DURATION =7200 / seconds

/ DESIRED COURANT NUMBER =0.95 / or smaller

/ NEWMARK TIME INTEGRATION COEFFICIENT =1.0
/ 1.0: explicit Euler, 0.5: second order in time!

/
/ PHYSICAL CONSTANTS
/

WATER DENSITY =1000.0

&ETA

Error launching Telemac at Cluster 6 years 1 week ago #31935

  • yugi
Too bad, I was hoping the issue was there.
Is it the same compiler on the compute nodes and on the front end?
You add things to LD_LIBRARY_PATH in your PBS script.
Are they also added on the front end?

This could be an issue of conflicting libraries.
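
One way to compare the two environments is to print the compiler version, LD_LIBRARY_PATH and any shared libraries the executable cannot resolve, once on the front end and once from inside a job. This is only a sketch: `describe_node` and `unresolved_libs` are hypothetical helper names, and it assumes `ldd` and `awk` are available on both kinds of node.

```shell
#!/bin/bash
# unresolved_libs and describe_node are hypothetical helpers.

# Read ldd output on stdin and print the names of libraries that
# ldd reports as "not found".
unresolved_libs() {
    grep 'not found' | awk '{print $1}'
}

# Print the host, compiler version, LD_LIBRARY_PATH and unresolved
# libraries for one executable.
# $1: path to the executable (e.g. out_telemac2d in the temporary folder)
describe_node() {
    echo "host: $(uname -n)"
    if command -v gfortran >/dev/null 2>&1; then
        gfortran --version | head -n 1
    else
        echo "gfortran: not on PATH"
    fi
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<not set>}"
    if [ -x "$1" ]; then
        ldd "$1" | unresolved_libs
    fi
}
```

Calling `describe_node ./out_telemac2d` on the front end and again from a PBS job gives two short reports that can be compared line by line.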