
TOPIC: parallel mode problem

parallel mode problem 11 years 10 months ago #6919

  • Flo64
Hi everybody,

I chose to use the parallel mode for my runs.
When I change the number of processors, Telemac does not run. It runs with one processor, but if I use more than one processor, I get this error:

jeliazovski@nemo:~/Documents/bathy_congo/test_flo/blabla3.MAT$ python /PROJETS/oceano/softs/Outils_telemac/v6p2/pytel/telemac2d.py cas.txt


Loading Options and Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... parsing configuration file: /PROJETS/oceano/softs/Outils_telemac/v6p2/config/systel.cfg


Running your CAS file for:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+> configuration: ubugfopenmpi
+> root: /PROJETS/oceano/softs/Outils_telemac/v6p2/
+> version v6p2


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



Running cas.txt with telemac2d under /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... reading module dictionary
... simulation en Francais avec cas.txt
re-writing: /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-15h49min41s/T2DCAS
copying: geom /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-15h49min41s/T2DGEO
copying: conlim /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-15h49min41s/T2DCLI
copying: telemac2dv6p2.dico /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-15h49min41s/T2DDICO


Running your simulation :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


/usr/bin/mpiexec -wdir /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-15h49min41s -n 2 --host nemo /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-15h49min41s/out_telemac2dv6p2
MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 2
P_INIT: FILE PARAL IS INCONSISTENT WITH MPI PARAMETERS
MPI NCSIZE = 2
PARAL NCSIZE = 1
EXECUTABLE FILE: /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-15h49min41s/A.EXE
BARRIER PASSED

LISTING DE TELEMAC-2D

TTTTT EEEEE L EEEEE M M AAAAA CCCCC
T E L E MM MM A A C
T EEE L EEE M M M AAAAA C
T E L E M M A A C
T EEEEE LLLLL EEEEE M M A A CCCCC

2D VERSION 6.2 FORTRAN 90
WITH SEVERAL TRACERS
COUPLED WITH SISYPHE AND TOMAWAC

NOMBRE DE PROCESSEURS PARALLELES DIFFERENT :
DEJA DECLARE (CAS DE COUPLAGE ?) : 2
TELEMAC-2D : 1
LA VALEUR 2 EST GARDEE

********************************************
* LECDON: *
* APRES APPEL DE DAMOCLES *
* VERIFICATION DES DONNEES LUES *
* SUR LE FICHIER DES PARAMETRES *
********************************************

SORTIE DE LECDON. TITRE DE L'ETUDE :
TELEMAC-2D - Initiation - Calcul numero 1

OUVERTURE DES FICHIERS POUR TELEMAC2D

*****************************
* ALLOCATION DE LA MEMOIRE *
*****************************

LIT : FIN DE FICHIER ANORMALE
ON VOULAIT LIRE UN
ENREGISTREMENT DE 72 VALEURS
DE TYPE : CH
SUR LE CANAL : 1

PLANTE : ARRET DU PROGRAMME APRES ERREUR
SORTIE DE PVM : APPEL DE P_EXIT

SORTIE DE MPI

_____________
runcode::main:
/NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT:
|runCAS: fail to run
| /usr/bin/mpiexec -wdir /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-15h49min41s -n 2 --host nemo /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-15h49min41s/out_telemac2dv6p2
jeliazovski@nemo:~/Documents/bathy_congo/test_flo/blabla3.MAT$

parallel mode problem 11 years 10 months ago #6921

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
Hi,

Can you try with the option --ncsize=2?
And can you show me what partel_T2DGEO.log contains?
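For reference, this suggestion amounts to adding the option to the command from the first post. A minimal Python sketch that builds and runs that command line, assuming telemac2d.py accepts --ncsize exactly as written above; build_command and run_parallel are hypothetical helper names:

```python
import subprocess

# Script path copied from the first post; adjust to your installation.
TELEMAC2D_SCRIPT = "/PROJETS/oceano/softs/Outils_telemac/v6p2/pytel/telemac2d.py"

def build_command(cas_file, ncsize):
    """Command line for telemac2d.py with an explicit number of processors."""
    # "python" matches the interpreter name used in the first post.
    return ["python", TELEMAC2D_SCRIPT, "--ncsize=%d" % ncsize, cas_file]

def run_parallel(cas_file, ncsize=2):
    """Run the case and return the launcher's exit code (0 on success)."""
    cmd = build_command(cas_file, ncsize)
    print("running: " + " ".join(cmd))
    return subprocess.run(cmd).returncode

# Usage, from the case directory:
# run_parallel("cas.txt", ncsize=2)
```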

Thanks
There are 10 types of people in the world: those who understand binary, and those who don't.

parallel mode problem 11 years 10 months ago #6922

  • Flo64
I tried the option -ncsize=2, but it still doesn't run.

I added the line "Parallel processors" to my parameter file, set to 2 processors. Now I get a new error:

Loading Options and Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... parsing configuration file: /PROJETS/oceano/softs/Outils_telemac/v6p2/config/systel.cfg


Running your CAS file for:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+> configuration: ubugfopenmpi
+> root: /PROJETS/oceano/softs/Outils_telemac/v6p2/
+> version v6p2


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



Running cas.txt with telemac2d under /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... reading module dictionary
... simulation en Francais avec cas.txt
re-writing: /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-16h47min53s/T2DCAS
copying: geom /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-16h47min53s/T2DGEO
copying: conlim /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-16h47min53s/T2DCLI
copying: telemac2dv6p2.dico /NOVELTIS/jeliazovski/Documents/bathy_congo/test_flo/blabla3.MAT/cas.txt_2013-01-14-16h47min53s/T2DDICO
partitioning: T2DGEO
+> /PROJETS/oceano/softs/Outils_telemac/v6p2/parallel/parallel_v6p2/ubugfopenmpi/partel < PARTEL.PAR >> partel_T2DGEO.log
Segmentation fault
... The following command failed for the reason above
/PROJETS/oceano/softs/Outils_telemac/v6p2/parallel/parallel_v6p2/ubugfopenmpi/partel < PARTEL.PAR >> partel_T2DGEO.log

parallel mode problem 11 years 10 months ago #6923

  • jmhervouet
Hello,

The number of processors seems to be different in Telemac-2D and in Sisyphe. Could you make them the same to see if that is the problem? Normally it should not matter, since Telemac-2D takes precedence, but it might raise a problem with the Python scripts that did not occur with the Perl ones.
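To compare the two values without reading both steering files by hand, here is a rough Python sketch. It assumes a simplified view of the steering-file syntax (one keyword per line, "=" or ":" as separator, "/" starting a comment) and looks for the English keyword PARALLEL PROCESSORS and what is assumed to be its French form, PROCESSEURS PARALLELES. It does not handle keywords split across lines, so treat it as a quick check only; the function names are hypothetical.

```python
import re

# English keyword and its (assumed) French equivalent.
KEYWORDS = ("PARALLEL PROCESSORS", "PROCESSEURS PARALLELES")
# keyword, then '=' or ':', then an integer value
LINE = re.compile(r"\s*([A-Z' ]+?)\s*[=:]\s*(\d+)")

def parallel_processors(text):
    """Return the processor count declared in a steering file's text, or None."""
    for line in text.splitlines():
        # Simplification: '/' starts a comment; this also cuts paths, which is
        # harmless here because only the processor keyword is inspected.
        line = line.split("/", 1)[0].upper()
        m = LINE.match(line)
        if m and m.group(1) in KEYWORDS:
            return int(m.group(2))
    return None

def check_consistency(t2d_text, sis_text):
    """Compare the Telemac-2D and Sisyphe values: (t2d, sisyphe, equal?)."""
    a, b = parallel_processors(t2d_text), parallel_processors(sis_text)
    return a, b, a == b
```

Pass it the contents of the Telemac-2D steering file and of the Sisyphe steering file; a result such as (2, 1, False) would match the "2 / 1" mismatch reported in the first log.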

Otherwise, the error is a segmentation fault in the partitioning program partel, hence the need to see the file partel_T2DGEO.log that my colleague yugi asked for. It could be a known problem if your mesh has hundreds of islands.

With best regards,

Jean-Michel Hervouet

parallel mode problem 11 years 10 months ago #6925

  • Flo64
Mr Hervouet,

I cannot give you the file partel_T2DGEO.log because I don't know where to find it.
I tried different numbers of processors and the problem is the same.

parallel mode problem 11 years 10 months ago #6936

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
You can find partel_T2DGEO.log in the temporary folder generated when you launch Telemac; it should be named nameofthecase_timeofthelaunch.
You get this kind of error when the partitioning crashes.
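A small Python sketch for locating that folder and the log, assuming the folder name follows the pattern visible in the logs above (for example cas.txt_2013-01-14-15h49min41s); latest_run_dir and partel_log are hypothetical helpers:

```python
import os
import re
from datetime import datetime

def latest_run_dir(case_dir, cas_name):
    """Return the newest '<cas>_YYYY-MM-DD-HHhMMminSSs' folder in case_dir, or None."""
    pattern = re.compile(re.escape(cas_name) + r"_(\d{4}-\d{2}-\d{2}-\d{2}h\d{2}min\d{2}s)$")
    runs = []
    for name in os.listdir(case_dir):
        m = pattern.match(name)
        if m and os.path.isdir(os.path.join(case_dir, name)):
            # Parse the launch time so folders are ordered chronologically.
            stamp = datetime.strptime(m.group(1), "%Y-%m-%d-%Hh%Mmin%Ss")
            runs.append((stamp, name))
    if not runs:
        return None
    return os.path.join(case_dir, max(runs)[1])

def partel_log(case_dir, cas_name, geo="T2DGEO"):
    """Path to partel_<geo>.log in the newest run folder (the file may not exist)."""
    run = latest_run_dir(case_dir, cas_name)
    return None if run is None else os.path.join(run, "partel_%s.log" % geo)
```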

parallel mode problem 11 years 10 months ago #6941

  • Flo64
Yugi,

Thanks for your help.
I found partel_T2DGEO.log, but the file is empty: there is no information in it.

parallel mode problem 11 years 10 months ago #6944

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
Could you try to rerun the command below in the temporary folder:
/PROJETS/oceano/softs/Outils_telemac/v6p2/parallel/parallel_v6p2/ubugfopenmpi/partel < PARTEL.PAR

And tell me what you get.
The Python scripts tend to miss some information.
Thanks
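Rerunning by hand helps because partel's own messages, and the fact that it was killed by a signal, are easy to lose when the launcher redirects its output to the log. A minimal Python sketch, assuming a Linux/POSIX system and that partel only needs PARTEL.PAR on its standard input, as in the command above; rerun_partel and describe are hypothetical helper names:

```python
import os
import signal
import subprocess

# partel location copied from the failing command above; adjust to your build.
PARTEL = "/PROJETS/oceano/softs/Outils_telemac/v6p2/parallel/parallel_v6p2/ubugfopenmpi/partel"

def rerun_partel(run_dir, partel=PARTEL):
    """Run 'partel < PARTEL.PAR' inside run_dir and capture everything it prints."""
    with open(os.path.join(run_dir, "PARTEL.PAR")) as par:
        return subprocess.run([partel], stdin=par, cwd=run_dir,
                              capture_output=True, text=True)

def describe(result):
    """Turn a CompletedProcess exit status into a readable message."""
    if result.returncode < 0:
        # On POSIX, a negative return code means the process was killed by that signal.
        return "killed by " + signal.Signals(-result.returncode).name
    return "exit code %d" % result.returncode

# Usage, from the temporary run folder:
# res = rerun_partel(".")
# print(res.stdout, res.stderr, describe(res))
```

A return code of -11 (SIGSEGV) corresponds to the "Segmentation fault" message the shell printed in the earlier post.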

parallel mode problem 11 years 9 months ago #7208

  • petebowyer
Hello all, I also have problems with partel.
In a directory called chann5k/, with the geometry [slf] file chann5kxy.slf and the boundary condition file chann5k.cli (both from Blue Kenue) specified in the cas file, I have compiled T2D for ifort parallel runs [ip].
Running
> runcode.py -c ip -s chann5knoipar.cas
gives this problem:

forrtl: severe (39): error during read, unit 10, file /home/pbowyer/telemac/chann5k/chann5knoipar.cas_2013-01-29-11h16min23s/T2DGEO
Image PC Routine Line Source
partel 00000000004F807D Unknown Unknown Unknown
partel 00000000004F6B85 Unknown Unknown Unknown
.
.
.
I think I can get round it by copying the chann5k.cli file to
chann5knoipar.cas_2013-01-29-11h16min23s/T2DCLI
and the chann5kxy.slf file to
chann5knoipar.cas_2013-01-29-11h16min23s/T2DGEO
and running the executable file (out...) in chann5knoipar.cas_2013-01-29-11h16min23s/.
Don't know if this helps...
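The workaround described above can be scripted. A minimal Python sketch, assuming the run folder already exists (runcode.py creates it) and that you substitute the real name of the out... executable, which is not given in full here; stage_inputs and run_in are hypothetical helper names:

```python
import os
import shutil
import subprocess

def stage_inputs(run_dir, cli_file, geo_file):
    """Copy the boundary and geometry files under the names the run folder expects."""
    shutil.copyfile(cli_file, os.path.join(run_dir, "T2DCLI"))
    shutil.copyfile(geo_file, os.path.join(run_dir, "T2DGEO"))

def run_in(run_dir, executable):
    """Launch the compiled executable from inside the run folder; return its exit code."""
    # Absolute path, because subprocess resolves a relative program path against cwd.
    exe = os.path.abspath(os.path.join(run_dir, executable))
    return subprocess.run([exe], cwd=run_dir).returncode

# Example with the names from the post (fill in the executable name yourself):
# run = "chann5knoipar.cas_2013-01-29-11h16min23s"
# stage_inputs(run, "chann5k.cli", "chann5kxy.slf")
# run_in(run, "<out... executable>")
```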