TOPIC: Problem : output file seems to be incomplete

Problem : output file seems to be incomplete 12 years 10 months ago #3370

  • Jcharpentier
Hello,

I'm trying to run Telemac2D on a linux cluster. The job manager is Loadleveler.
The simulation starts well, but the output file seems to be incomplete, and the error file is empty.

My case is named 'cas_tours'.
When the simulation starts, temporary files are created :
- delete_cas.cas21490
- signal_cas.cas21490
- fortran_hpclr_MP_v6p1.exe
- cas.cas21490_tmp/ folder, which contains 13 temporary files.
The standard outputs are cas_tours.err, which is empty, and cas_tours.out.
Nothing is written after the beginning of the simulation, and at the end all the temporary files are still there.

I attached a tar.gz file with all files I have.

My submission script is job.cmd.
I searched for a solution but I don't have enough knowledge to understand what is going wrong.

Can anybody help me understand why my simulation is incomplete?

Thanks in advance
Johanne Charpentier

Re: Problem : output file seems to be incomplete 12 years 10 months ago #3371

  • Jcharpentier
The attachment failed because the file was too large.

Here is the result of 'ls -lrt' after the simulation :
-rw-r--r-- 1 charpentierj partenaires 10030133 Dec 16 09:23 strickler.txt
-rw-r--r-- 1 charpentierj partenaires 00003753 Dec 16 09:23 Sigloy.liq
-rw-r--r-- 1 charpentierj partenaires 00000307 Dec 16 09:23 mpi_telemac.conf
-rw-r--r-- 1 charpentierj partenaires 28846348 Dec 16 09:23 geo.geo
-rw-r--r-- 1 charpentierj partenaires 00012019 Dec 16 09:23 fortran.f
-rw-r--r-- 1 charpentierj partenaires 02951204 Dec 16 09:23 cli.cli
-rw-r--r-- 1 charpentierj partenaires 10233327 Dec 16 09:23 bathy.txt
-rw-r--r-- 1 charpentierj partenaires 00003320 Dec 16 11:20 cas.cas
-rw-r--r-- 1 charpentierj partenaires 00000393 Dec 28 14:17 job.cmd
-rw-r--r-- 1 charpentierj partenaires 00000000 Dec 28 14:18 cas_tours.err
-rwxr--r-- 1 charpentierj partenaires 00001591 Dec 28 14:18 delete_cas.cas21490
-rwxr--r-- 1 charpentierj partenaires 00000445 Dec 28 14:18 signal_cas.cas21490
-rw-r--r-- 1 charpentierj partenaires 07999453 Dec 28 14:18 fortran_hpclr_MP_v6p1.exe
drwxr-xr-x 2 charpentierj partenaires 00016384 Dec 28 14:18 cas.cas21490_tmp
-rw-r--r-- 1 charpentierj partenaires 00003655 Dec 28 14:18 cas_tours.out


I will try to attach some files separately:
- case file 'cas.cas'
- log file which is in cas.cas21490_tmp/ 'partel.log'.
Hope this will help.

If another file is needed, please tell me.

My submission script is as follows :

#!/bin/sh

#@job_type = mpich
#@node = 1
#@total_tasks=12
#@wall_clock_limit = 00:06:00,00:05:50
#@restart = no
#@environment = COPY_ALL
#@notify_user=charpentier@cines.fr
#@queue

exec 2>cas_tours.err 1>cas_tours.out

source /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/env.sh

export PATH=$PATH:/home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/bin

time telemac2d cas.cas


And the output file :

=========================================================
Telemac System 5.6 to 6.1 - Perl scripts version 6.1
=========================================================
starting...

HOSTTYPE : hpclr
PROJECT : /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel
BASE DIRECTORY : /home/charpentierj/TestTelemacHPC4T2D/tours
LAUNCH DIRECTORY : /home/charpentierj/TestTelemacHPC4T2D/tours
WORK DIRECTORY : /home/charpentierj/TestTelemacHPC4T2D/tours/cas.cas21490_tmp
PARAMETER FILE : cas.cas


*** Using default configuration file :
/home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/config/systel.ini ***



*** Using CUSTOM MPI configuration file :
/home/charpentierj/TestTelemacHPC4T2D/tours/mpi_telemac.conf ***


*** TELEMAC2D ON STATION ***


*** Interactive mode ***


*** RELEASE v6p1 ***

________________________________________________________
Steering file : cas.cas
________________________________________________________

________________________________________________________
Starting execution: telemac2d.bat
________________________________________________________
- FORTRAN FILE : fortran.f
______________________________________________________________________________
*** COMPILATION ***

mpif90 -c -O2 -convert big_endian -I /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/telemac2d/tel2d_v6p1/hpclr -I /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/sisyphe/sisyphe_v6p1/hpclr -I /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/tomawac/toma_v6p1/hpclr -I /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/bief/bief_v6p1/hpclr -I /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/special/special_v6p1/hpclr -I /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/damocles/damo_v6p1/hpclr -I /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/paravoid/paravoid_v6p1/hpclr -I /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/mpi/hpclr/include t2dfort.f
______________________________________________________________________________
*** LIBRARIES ***

- /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/telemac2d/tel2d_v6p1/hpclr/telemac2dv6p1.a
- /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/sisyphe/sisyphe_v6p1/hpclr/sisyphev6p1.a
- /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/tomawac/toma_v6p1/hpclr/tomawacv6p1.a
- /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/bief/bief_v6p1/hpclr/biefv6p1.a
- /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/special/special_v6p1/hpclr/specialv6p1.a
- /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/damocles/damo_v6p1/hpclr/damov6p1.a
- /home/charpentierj/TestTelemacHPC4T2D/telemac-v6p1/intel/parallel/parallel_v6p1/hpclr/parallelv6p1.a
- -lstdc++
- -lz
- -L/opt/cluster/mumps/MUMPS_4.10.0/lib
- -ldmumps
- -lmumps_common
- -lpord
- -L/opt/cluster/compilers/intel/mkl/10.2.5.035/lib/em64t
- -lmkl_scalapack_ilp64
- -lmkl_blacs_openmpi_ilp64
- -lmkl_intel_lp64
- -lmkl_sequential
- -lmkl_core
- -lguide
- -static-intel
- -L/opt/cluster/metis/metis-4.0.3
- -lmetis
- -L/opt/cluster/scotch/scotch_5.1.11_esmumps/lib
- -lptesmumps
- -lptscotch
- -lptscotcherr
- -lm

*** LINKING ***

______________________________________________________________________________
*** ALLOCATION OF USER FILES ***

- STEERING FILE : cas.cas
- DICTIONARY : telemac2dv6p1.dico
- GEOMETRY FILE : geo.geo




Thanks in advance,
Best regards,
Johanne Charpentier

Re: Problem : output file seems to be incomplete 12 years 10 months ago #3373

  • jmhervouet
Hello Johanne,

I suspect a problem with parallelism or with the variable time step. Try a simpler case with the following modifications to your steering file :

/ remove this, it is not useful and drives you mad if you change
/ the name
/ FICHIER DES PARAMETRES ='cas.cas'

/ let's try to see all results first
PERIODE POUR LES SORTIES GRAPHIQUES =1
/ I added a point to show it is a real number
COEFFICIENT DE FROTTEMENT =15.
/ Idem
DEBITS IMPOSES =1.;0.;0.
/ Idem
COTES IMPOSEES =0.;44.;42.
/ unchanged but with this solver can be conjugate gradient : 1
TRAITEMENT DU SYSTEME LINEAIRE =2
/ to start with...
PAS DE TEMPS VARIABLE =NON
/ . added
PAS DE TEMPS =10.
/ scalar run first
PROCESSEURS PARALLELES =0
/
/DUREE DU CALCUL =300.
NUMBER OF TIME STEPS : 30
/NOMBRE DE COURANT SOUHAITE =1
/ see remark above
SOLVEUR =1
IMPLICITATION POUR LA HAUTEUR =0.55


You should see something in your results now, if not tell us what is written on your listing.

With best regards,

Jean-Michel Hervouet

Re: Problem : output file seems to be incomplete 12 years 10 months ago #3374

  • riadh
Hi Johanne

First of all, check whether your mesh is OK.
Am I wrong, or do you have more than 770 islands?

If the mesh is OK, check that you have the latest version of partel. I suspect your problem is close to the one discussed in this post:
www.opentelemac.org/index.php?option=com...78&Itemid=62&lang=fr
If you have nodes/elements with high indices (> 1 million), the old version of partel is not able to handle them and you should consider the suggestions of this post.

My kind regards
Riadh

Re: Problem : output file seems to be incomplete 12 years 10 months ago #3375

  • Jcharpentier
@riadh :
A team provided me with a test case, and I have to run it on my cluster. I really don't know much about the scientific context of this case.
I hope the mesh is OK, but I really don't know.

In the file .out, I can see :
NOMBRE D'ELEMENTS: 1418656
NOMBRE REEL DE POINTS: 738888

I looked at the discussion you pointed to, and it seems that I have what is needed: version 6.1, and when I checked the variable FMT4 in partel.f, it is set to I7 as recommended.
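As a rough cross-check of the point above, the sketch below compares the number of digits in the two counts quoted from the .out file with the field width of an I7 edit descriptor. It assumes that FMT4 is the format partel uses to write node and element numbers, which this thread does not confirm. In Fortran, an integer that does not fit in an Iw field is printed as asterisks, so a count with more digits than the width would be a warning sign.

```shell
# Counts copied from the .out excerpt quoted in this post.
nelem=1418656
npoin=738888
width=7   # field width of the I7 edit descriptor (assumed to apply to these numbers)

# fits VALUE WIDTH: succeeds when VALUE has at most WIDTH digits.
fits() { [ "${#1}" -le "$2" ]; }

report=$(
  for n in "$nelem" "$npoin"; do
    if fits "$n" "$width"; then
      echo "$n fits in I$width"
    else
      echo "$n overflows I$width"
    fi
  done
)
echo "$report"
```

Both counts fit in seven digits, which is consistent with the I7 setting not being the cause here.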

thanks a lot

@jmhervouet :
I tried to make all the modifications you suggested, but there is a problem when I modify the parameter:
PROCESSEURS PARALLELES = 0

the error file for PROCESSEURS PARALLELES = 0 is:

forrtl: No such file or directory
forrtl: severe (29): file not found, unit 312, file /home/charpentierj/TestTelemacHPC4T2D/tours_28dec/cas.cas26909_tmp/..\strickler.txt
Image PC Routine Line Source
out26909_hpclr.ex 00000000009232ED Unknown Unknown Unknown
out26909_hpclr.ex 0000000000921DF5 Unknown Unknown Unknown
out26909_hpclr.ex 00000000008CC7C0 Unknown Unknown Unknown
out26909_hpclr.ex 00000000008810FA Unknown Unknown Unknown
out26909_hpclr.ex 00000000008808F0 Unknown Unknown Unknown
out26909_hpclr.ex 000000000089363D Unknown Unknown Unknown
out26909_hpclr.ex 000000000042AF29 Unknown Unknown Unknown
out26909_hpclr.ex 00000000005E5DFB Unknown Unknown Unknown
out26909_hpclr.ex 0000000000442DA3 Unknown Unknown Unknown
out26909_hpclr.ex 000000000042BD81 Unknown Unknown Unknown
out26909_hpclr.ex 000000000042A01C Unknown Unknown Unknown
libc.so.6 00000036C261D994 Unknown Unknown Unknown
out26909_hpclr.ex 0000000000429F29 Unknown Unknown Unknown
Command exited with non-zero status 29
1.43user 0.42system 0:01.89elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+332664minor)pagefaults 0swaps
## Erreur : Fin anormale : time ./out26909_hpclr.exe :7424

real 0m13.000s
user 0m2.079s
sys 0m0.632s


Then I tried with the value PROCESSEURS PARALLELES = 1
The error file is a little bit different, but there is still a problem.
In addition, a file named mpirun.txt is created, containing the host name 12 times.

the error file for PROCESSEURS PARALLELES = 1 is :

forrtl: No such file or directory
forrtl: severe (29): file not found, unit 312, file /home/charpentierj/TestTelemacHPC4T2D/tours_28dec/cas.cas27118_tmp/..\strickler.txt
Image PC Routine Line Source
out27118_hpclr.ex 0000000000962FED Unknown Unknown Unknown
out27118_hpclr.ex 0000000000961AF5 Unknown Unknown Unknown
out27118_hpclr.ex 000000000090C4C0 Unknown Unknown Unknown
out27118_hpclr.ex 00000000008C0DFA Unknown Unknown Unknown
out27118_hpclr.ex 00000000008C05F0 Unknown Unknown Unknown
out27118_hpclr.ex 00000000008D333D Unknown Unknown Unknown
out27118_hpclr.ex 0000000000431029 Unknown Unknown Unknown
out27118_hpclr.ex 00000000005EBEFB Unknown Unknown Unknown
out27118_hpclr.ex 0000000000448EA3 Unknown Unknown Unknown
out27118_hpclr.ex 0000000000431E81 Unknown Unknown Unknown
out27118_hpclr.ex 000000000043011C Unknown Unknown Unknown
libc.so.6 00000036C261D994 Unknown Unknown Unknown
out27118_hpclr.ex 0000000000430029 Unknown Unknown Unknown

mpirun has exited due to process rank 0 with PID 27145 on
node node052 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
## Erreur : Fin anormale : cd /home/charpentierj/TestTelemacHPC4T2D/tours_28dec/cas.cas27118_tmp; mpirun out27118_hpclr.exe :7424

real 0m14.356s
user 0m2.334s
sys 0m0.688s



Then I tried to adapt the values in my submission script (#@TOTAL_TASKS=1) and in mpi_telemac.conf for both cases (PROCESSEURS PARALLELES = 0 and PROCESSEURS PARALLELES = 1), but it doesn't change anything...


Do you have any idea what I can do to solve this?

Best regards,
Johanne Charpentier

Re: Problem : output file seems to be incomplete 12 years 10 months ago #3376

  • jmhervouet
Hello,

This is strange. The file strickler.txt is missing from the temporary folder because it is not declared in the steering file as a known data file (declared files are copied there by the Perl scripts). Either a line is missing from the steering file or, more probably, the file fortran.f contains a hard-coded OPEN command on strickler.txt. That cannot work unless the name is written ../strickler.txt, with the two dots stepping back up to the original directory where the file is.
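A quick way to look for such hard-coded OPEN statements is a case-insensitive grep over the user Fortran file. The sketch below runs on a small stand-in file so that it is self-contained; the two Fortran lines in it are illustrative only, and on the real case the pattern would be applied to fortran.f.

```shell
# Hypothetical stand-in for the user Fortran file (fortran.f in this thread).
demo=$(mktemp)
cat > "$demo" <<'EOF'
      OPEN(312,FILE='..\strickler.txt',FORM='FORMATTED',STATUS='OLD')
      READ(312,*) NVAL
EOF

# List OPEN statements that carry a literal file name.
# -i: Fortran is case-insensitive; -n: print line numbers.
hits=$(grep -in "open *(.*file *= *'" "$demo")
echo "$hits"

rm -f "$demo"
```

Any file name reported this way is resolved relative to the temporary working folder at run time, not relative to the launch directory.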

With best regards,

Jean-Michel Hervouet

Re: Problem : output file seems to be incomplete 12 years 10 months ago #3377

  • Jcharpentier
Hello,

Strange.
In fortran.f, line 244 reads:
open(312,file='..\strickler.txt',form='formatted',status='old')
I don't understand what is happening.
I will try to find a solution...

Thanks a lot for your help
Johanne Charpentier

Re: Problem : output file seems to be incomplete 12 years 10 months ago #3378

  • jmhervouet
Hello,

Could it be that 312 is out of range for your compiler? In the standard the maximum is perhaps 99.

Another solution would be to declare the file as:

FICHIER DE DONNEES FORMATE : strickler.txt

in the steering file, and to use logical unit 26 everywhere instead of 312. The file would then be copied into the temporary folder by the scripts. This is valid when there is no coupling; otherwise (coupling with Sisyphe) the logical unit is:

T2D_FILES(T2DFO1)%LU and may be different from 26.

Hope this helps,

Jean-Michel Hervouet

Re: Problem : output file seems to be incomplete 12 years 10 months ago #3386

  • jmhervouet
Hello,

After thinking (a lot at night...), the solution to the problem is clear:

The path leading to the file strickler.txt shows that you are on Linux, where / separates folders, while the Fortran is written for Windows, where \ separates folders. So you only need to change ..\ into ../ in the Fortran.

This is why hard-coding file names is not very portable.
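The change from ..\ to ../ can also be made with sed rather than by hand. The sketch below applies the substitution to a one-line stand-in file containing the OPEN statement quoted earlier in the thread. It only rewrites the leading ..\ prefix, so paths with deeper Windows-style separators would need more care, and a backup of fortran.f is advisable before editing the real file.

```shell
# Hypothetical stand-in file holding the OPEN statement quoted in this thread.
demo=$(mktemp)
cat > "$demo" <<'EOF'
      open(312,file='..\strickler.txt',form='formatted',status='old')
EOF

# Replace the Windows-style "..\" prefix with the Unix-style "../".
# '#' is used as the sed delimiter so that '/' needs no escaping.
fixed=$(sed 's#\.\.\\#../#g' "$demo")
echo "$fixed"

rm -f "$demo"
```

Here the result is printed rather than written back, so the output can be checked before the original file is touched.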

With best regards, and happy new year,

Jean-Michel Hervouet

Re: Problem : output file seems to be incomplete 12 years 10 months ago #3389

  • Jcharpentier
Hello,
Thank you for your help.
I changed the paths in fortran.f to use the right '/'.
Unfortunately, I still have some problems with the output.
I checked the whole installation procedure (paths and systel.ini file).
My output file still stops after writing:

*** ACQUISITION DES FICHIERS ***

- FICHIER DES PARAMETRES : cas.cas
- DICTIONNAIRE : telemac2dv6p1.dico
- FICHIER DE GEOMETRIE : geo.geo


In the file 'runtel.pl', there is a section named PROGRAMME PRINCIPAL.
I found a variable named 'sortie':
$sortie ="nul";
I tried to change it to 'listing', but it doesn't seem to work.

I'm sorry, I really don't know much about Telemac.
My job is to make it run for the scientific team that gave me the test case.
If you have any other idea...

Thanks a lot
Johanne Charpentier