TOPIC: Problem with parallel version v7p0
Problem with parallel version v7p0 9 years 9 months ago #15706
Dear all,
I have a problem running Telemac v7p0 in parallel. I have installed the Intel compiler 11.1 and the Intel MPI library 5.0 update 2. Compilation succeeds for both configurations (sequential and parallel), and simulations run correctly in sequential mode.
However, when I launch a simulation (the "malpasset" case) with the keyword "PARALLEL PROCESSORS=4" in the steering file, using the command "telemac2d.py -c wintelmpi t2d_malpasset-small.cas", the last message I obtain is:
"C:\opentelemac\MPICH2\bin\mpiexec.exe" -n 4 C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-03-09h31min05s\out_t2d_malpasset-small.exe
and nothing else happens. I have to stop the run manually (Ctrl+C). The complete console output of this run is given here:
C:\opentelemac\v7p0\examples\telemac2d\malpasset>telemac2d.py -c wintelmpi t2d_malpasset-small.cas
Loading Options and Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
... parsing configuration file: C:\opentelemac\v7p0\configs\systel.cfg
Running your CAS file for:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+> configuration: wintelmpi
+> root: C:\opentelemac\v7p0
+> version v7p0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
... reading the main module dictionary
... processing the main CAS file(s)
+> simulation en Francais
... checking parallelisation
... handling temporary directories
... checking coupling between codes
... first pass at copying all input files
copying: geo_malpasset-small.slf C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-03-09h31min05s\T2DGEO
copying: t2d_malpasset-small.f C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-03-09h31min05s\t2dfort.f
copying: geo_malpasset-small.cli C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-03-09h31min05s\T2DCLI
copying: f2d_malpasset-small.slf C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-03-09h31min05s\T2DREF
re-copying: C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-03-09h31min05s\T2DCAS
copying: telemac2d.dico C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-03-09h31min05s\T2DDICO
... checking the executable
xilink: executing 'link'
created: t2d_malpasset-small.exe
re-copying: t2d_malpasset-small.exe C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-03-09h31min05s\out_t2d_malpasset-small.exe
... modifying run command to MPI instruction
... modifying run command to PARTEL instruction
partitioning: T2DGEO
+> C:\opentelemac\v7p0\builds\wintelmpi\bin\partel.exe < PARTEL.PAR >> partel_T2DGEO.log
0
partitioning: T2DREF
+> C:\opentelemac\v7p0\builds\wintelmpi\bin\partel.exe < PARTEL.PAR >> partel_T2DREF.log
0
... handling sortie file(s)
Running your simulation(s) :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"C:\opentelemac\MPICH2\bin\mpiexec.exe" -n 4 C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-03-09h31min05s\out_t2d_malpasset-small.exe
and here nothing else happens.
The folder "t2d_malpasset-small.cas_2015-02-03-09h53min01s" that was created contains the files listed in the attached document "Doc_forum.doc". Does this indicate that my problem comes only from my "mpiexec.exe" configuration?
I then tried launching the command "mpiexec.exe -n 4 out_t2d_malpasset-small.exe" directly in the folder "t2d_malpasset-small.cas_2015-02-03-09h53min01s". I was first asked for my user name and password, and then the run was performed, but with only one processor (not 4, as specified both in my steering file and on my command line with "mpiexec.exe -n 4"). In the resulting listing file I can identify the following messages:
"NOMBRE DE PROCESSEURS PARALLELES DIFFERENT :
DEJA DECLARE (CAS DE COUPLAGE ?) : 1
TELEMAC-2D : 4
LA VALEUR 1 EST GARDEE"
(in English: the number of parallel processors differs; already declared (coupling case?): 1; TELEMAC-2D: 4; the value 1 is kept)
and
"OUVERTURE DES FICHIERS POUR TELEMAC2D
MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 1
P_INIT: FILE PARAL IS INCONSISTENT WITH MPI PARAMETERS
MPI NCSIZE = 1
PARAL NCSIZE = 4"
and
"LE MOT CLE : PROCESSEURS PARALLELES
EST CITE AU MOINS 2 FOIS, SEULE LA DERNIERE VALEUR EST CONSERVEE..."
(in English: the keyword PARALLEL PROCESSORS is cited at least twice; only the last value is kept)
Part of the listing file is given here:
---------------------------------------------------------------------------
STRCHE (BIEF) : PAS DE MODIFICATION DU FROTTEMENT
IL Y A 1 FRONTIERE(S) SOLIDE(S) :
FRONTIERE 1 :
DEBUT AU POINT DE BORD 1 , DE NUMERO GLOBAL 546
ET DE COORDONNEES : 619.8345 5099.195
FIN AU POINT DE BORD 1 , DE NUMERO GLOBAL 546
ET DE COORDONNEES : 619.8345 5099.195
CORFON (TELEMAC2D) : PAS DE MODIFICATION DU FOND
================================================================================
ITERATION 0 TEMPS : 0.0000 S
USING STREAMLINE VERSION 7.0 FOR CHARACTERISTICS
MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 1
P_INIT: FILE PARAL IS INCONSISTENT WITH MPI PARAMETERS
MPI NCSIZE = 1
PARAL NCSIZE = 4
EXECUTABLE FILE:
C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-03-09h31min05s\A.EXE
BARRIER PASSED
LISTING DE TELEMAC-2D ---------------------------------------------------------
---------------------
TTTTT EEEEE L EEEEE M M AAAAA CCCCC
T E L E MM MM A A C
T EEE L EEE M M M AAAAA C
T E L E M M A A C
T EEEEE LLLLL EEEEE M M A A CCCCC
2D VERSION 7.0 FORTRAN 90
WITH SEVERAL TRACERS
COUPLED WITH SISYPHE AND TOMAWAC
LE MOT CLE : PRECONDITIONNEMENT
EST CITE AU MOINS 2 FOIS, SEULE LA DERNIERE VALEUR EST CONSERVEE...
LE MOT CLE : PROCESSEURS PARALLELES
EST CITE AU MOINS 2 FOIS, SEULE LA DERNIERE VALEUR EST CONSERVEE...
FIN DU FICHIER POUR DAMOCLES
NOMBRE DE PROCESSEURS PARALLELES DIFFERENT :
DEJA DECLARE (CAS DE COUPLAGE ?) : 1
TELEMAC-2D : 4
LA VALEUR 1 EST GARDEE
********************************************
* LECDON: *
* APRES APPEL DE DAMOCLES *
* VERIFICATION DES DONNEES LUES *
* SUR LE FICHIER DES PARAMETRES *
********************************************
SORTIE DE LECDON. TITRE DE L'ETUDE :
Le barrage de MALPASSET
OUVERTURE DES FICHIERS POUR TELEMAC2D
MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 1
P_INIT: FILE PARAL IS INCONSISTENT WITH MPI PARAMETERS
MPI NCSIZE = 1
PARAL NCSIZE = 4
EXECUTABLE FILE:
C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-03-09h31min05s\A.EXE
BARRIER PASSED
--------------------------------------------------------------------------
Do you have any idea what I can do to run the simulation with several processors?
Thank you very much,
Olivier
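For readers who hit the same symptom, the mismatch reported by P_INIT can be spotted automatically in a listing. The following is only a minimal sketch in Python: the function name `find_ncsize_mismatches` is hypothetical (it is not part of the TELEMAC scripts), and it assumes the listing contains the two lines "MPI NCSIZE = n" and "PARAL NCSIZE = n" exactly as quoted in the post above.

```python
import re

# Matches the two values printed by P_INIT, as quoted in the listing above.
NCSIZE_RE = re.compile(r"^\s*(MPI|PARAL)\s+NCSIZE\s*=\s*(\d+)\s*$")


def find_ncsize_mismatches(listing_text):
    """Return the (mpi_ncsize, paral_ncsize) pairs that disagree.

    Hypothetical helper: it pairs each 'MPI NCSIZE' line with the next
    'PARAL NCSIZE' line and keeps the pairs whose values differ.
    """
    mismatches = []
    mpi_value = None
    for line in listing_text.splitlines():
        match = NCSIZE_RE.match(line)
        if not match:
            continue
        kind, value = match.group(1), int(match.group(2))
        if kind == "MPI":
            mpi_value = value
        elif mpi_value is not None:
            # A PARAL line closes the pair opened by the preceding MPI line.
            if value != mpi_value:
                mismatches.append((mpi_value, value))
            mpi_value = None
    return mismatches


sample = """\
MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 1
P_INIT: FILE PARAL IS INCONSISTENT WITH MPI PARAMETERS
MPI NCSIZE = 1
PARAL NCSIZE = 4
"""
print(find_ncsize_mismatches(sample))  # prints [(1, 4)]
```

A non-empty result means that the MPI runtime reported a different number of processes than the one TELEMAC read from its parallel settings, which is the situation described in this post.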
Problem with parallel version v7p0 9 years 9 months ago #15707
Hello,
You may have to register the MPI service; it usually asks for credentials the first time you use it. Please check this post: #15663.
Hope this helps,
Sébastien.
Problem with parallel version v7p0 9 years 9 months ago #15709
Hi
Not sure, but instead of giving the number of processors inside the steering file, you could try specifying it at launch with the option --ncsize=4. To be sure, check the T2DCAS file in the temporary directory; I think you will find a line with PARALLEL PROCESSORS = 1.
Hope this helps.
PS: Remember to update your profile!
Christophe
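The check suggested above (looking in T2DCAS for every declaration of the number of parallel processors) can be sketched in Python. This is only an illustration under stated assumptions: the helper name `parallel_processor_values` is hypothetical, the sample text is invented for the example, and it is assumed that "/" starts a comment in a steering file and that the keyword is followed by "=" or ":" and an integer, as in the lines quoted in this thread.

```python
import re

# Both spellings appear in this thread; '=' and ':' are both seen as separators.
KEYWORD_RE = re.compile(
    r"(PARALLEL PROCESSORS|PROCESSEURS PARALLELES)\s*[:=]\s*(\d+)",
    re.IGNORECASE,
)


def parallel_processor_values(steering_text):
    """Return every declared value of the keyword, in file order.

    Hypothetical helper for inspecting a T2DCAS file by hand.
    """
    values = []
    for line in steering_text.splitlines():
        # Assumption: everything after '/' on a line is a comment.
        code = line.split("/", 1)[0]
        for match in KEYWORD_RE.finditer(code):
            values.append(int(match.group(2)))
    return values


# Invented sample, not taken from the real malpasset steering file.
sample = """\
PARALLEL PROCESSORS = 4
/ a comment line
PROCESSEURS PARALLELES : 2
&FIN
"""
values = parallel_processor_values(sample)
print(values)      # prints [4, 2]
print(values[-1])  # prints 2
```

The listing message "SEULE LA DERNIERE VALEUR EST CONSERVEE" indicates that only the last declaration is kept, so the last element of the returned list is the value to compare with the `-n` argument passed to mpiexec.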
Problem with parallel version v7p0 9 years 9 months ago #15734
Hello Sebastien and Christophe,
Thank you very much for your answers. I had already registered with the MPI service. I launched the simulation with the option --ncsize=xx; this added the line "PROCESSEURS PARALLELES : xx" to the T2DCAS file (just before the line &FIN), but the simulation still did not run in parallel.
I then tried the commands "smpd -install", "mpiexec -remove", "mpiexec -register", "mpiexec -validate" and "smpd -status", as shown below:
-----------------------------------------------------------------------------
C:\Windows\system32>cd C:\opentelemac\MPICH2\examples
C:\opentelemac\MPICH2\examples>smpd -uninstall
Stopping Intel(R) MPI Library Process Manager.
Intel(R) MPI Library Process Manager stopped.
Intel(R) MPI Library Process Manager removed.
C:\opentelemac\MPICH2\examples>smpd -install
Intel(R) MPI Library Process Manager installed.
C:\opentelemac\MPICH2\examples>mpiexec -remove
Account and password removed from the Registry.
C:\opentelemac\MPICH2\examples>mpiexec -register
account (domain\user) [INTRANET\Utilisateur]: INTRANET\Utilisateur
password:
confirm password:
Password encrypted into the Registry.
C:\opentelemac\MPICH2\examples>mpiexec -validate
SUCCESS
C:\opentelemac\MPICH2\examples>smpd -status
smpd running on intranet
-----------------------------------------------------------------------------
I then tried to run cpi.exe in C:\opentelemac\MPICH2\examples and obtained the results below:
-----------------------------------------------------------------------------
C:\opentelemac\MPICH2\examples>mpiexec -n 2 -localonly cpi.exe
Enter the number of intervals: (0 quits) Enter the number of intervals: (0 quits) 4
pi is approximately 3.1468005183939427, Error is 0.0052078648041496
wall clock time = 0.000078
Enter the number of intervals: (0 quits)
-----------------------------------------------------------------------------
After that I launched the malpasset test simulation:
-----------------------------------------------------------------------------
C:\Windows\system32>cd C:\opentelemac\v7p0\examples\telemac2d\malpasset
C:\opentelemac\v7p0\examples\telemac2d\malpasset>telemac2d.py --ncsize=2 -c wintelmpi t2d_malpasset-small.cas
Loading Options and Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
... parsing configuration file: C:\opentelemac\v7p0\configs\systel.cfg
Running your CAS file for:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+> configuration: wintelmpi
+> root: C:\opentelemac\v7p0
+> version v7p0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
... reading the main module dictionary
... processing the main CAS file(s)
+> simulation en Francais
... checking parallelisation
... handling temporary directories
... checking coupling between codes
... first pass at copying all input files
copying: geo_malpasset-small.slf C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\T2DGEO
copying: t2d_malpasset-small.f C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\t2dfort.f
copying: geo_malpasset-small.cli C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\T2DCLI
copying: f2d_malpasset-small.slf C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\T2DREF
re-copying: C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\T2DCAS
copying: telemac2d.dico C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\T2DDICO
... checking the executable
xilink: executing 'link'
created: t2d_malpasset-small.exe
re-copying: t2d_malpasset-small.exe C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\out_t2d_malpasset-small.exe
... modifying run command to MPI instruction
... modifying run command to PARTEL instruction
partitioning: T2DGEO
+> C:\opentelemac\v7p0\builds\wintelmpi\bin\partel.exe < PARTEL.PAR >> partel_T2DGEO.log
0
partitioning: T2DREF
+> C:\opentelemac\v7p0\builds\wintelmpi\bin\partel.exe < PARTEL.PAR >> partel_T2DREF.log
0
... handling sortie file(s)
Running your simulation(s) :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"C:\opentelemac\MPICH2\bin\mpiexec.exe" -n 2 C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\out_t2d_malpasset-small.exe
Credentials for intranet rejected connecting to intranet
Aborting: Unable to connect to intranet
_____________
runcode::main:
:
|runCode: Fail to run
|"C:\opentelemac\MPICH2\bin\mpiexec.exe" -n 2 C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\out_t2d_malpasset-small.exe
|~~~~~~~~~~~~~~~~~~
|
|~~~~~~~~~~~~~~~~~~
C:\opentelemac\v7p0\examples\telemac2d\malpasset>
-----------------------------------------------------------------------------
I don't understand why I obtain the message "Aborting: Unable to connect to intranet", whereas it worked with cpi.exe. Could it be due to my configuration file systel.cfg (attached), in which I have the option "mpi_hosts: -mapall"? Should I use "mpi_hosts: -localonly" instead?
Another surprising point: when I launch the command "mpiexec.exe -n 2 out_t2d_malpasset-small.exe" in the temporary folder "C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s", the run seems to be performed twice, as if mpiexec had started 2 processes that both have rank 0. The listing obtained is shown here:
================================================================================
C:\Windows\system32>cd C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s
C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s>mpiexec.exe -n 2 out_t2d_malpasset-small.exe
MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 1
P_INIT: FILE PARAL IS INCONSISTENT WITH MPI PARAMETERS
MPI NCSIZE = 1
PARAL NCSIZE = 2
EXECUTABLE FILE:
C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\A.EXE
BARRIER PASSED
LISTING DE TELEMAC-2D ---------------------------------------------------------
---------------------
TTTTT EEEEE L EEEEE M M AAAAA CCCCC
T E L E MM MM A A C
T EEE L EEE M M M AAAAA C
T E L E M M A A C
T EEEEE LLLLL EEEEE M M A A CCCCC
2D VERSION 7.0 FORTRAN 90
WITH SEVERAL TRACERS
COUPLED WITH SISYPHE AND TOMAWAC
LE MOT CLE : PRECONDITIONNEMENT
EST CITE AU MOINS 2 FOIS, SEULE LA DERNIERE VALEUR EST CONSERVEE...
FIN DU FICHIER POUR DAMOCLES
NOMBRE DE PROCESSEURS PARALLELES DIFFERENT :
DEJA DECLARE (CAS DE COUPLAGE ?) : 1
TELEMAC-2D : 2
LA VALEUR 1 EST GARDEE
********************************************
* LECDON: *
* APRES APPEL DE DAMOCLES *
* VERIFICATION DES DONNEES LUES *
* SUR LE FICHIER DES PARAMETRES *
********************************************
SORTIE DE LECDON. TITRE DE L'ETUDE :
Le barrage de MALPASSET
OUVERTURE DES FICHIERS POUR TELEMAC2D
*****************************
* ALLOCATION DE LA MEMOIRE *
*****************************
READGEO1 : TITRE= TELEMAC 2D : RUPTURE DE BARRAGE SUR FOND SEC$
NOMBRE D'ELEMENTS: 26000
NOMBRE REEL DE POINTS: 13541
FORMAT NON PRECISE DANS LE TITRE
MXPTEL (BIEF) : NOMBRE MAXIMUM D'ELEMENTS VOISINS D'UN POINT : 9
NOMBRE MAXIMUM DE POINTS VOISINS D'UNPOINT : 9
CORRXY (BIEF) : PAS DE MODIFICATION DES COORDONNEES
MAILLAGE : MESH ALLOUE
****************************************
* FIN DE L'ALLOCATION DE LA MEMOIRE : *
****************************************
INBIEF (BIEF) : MACHINE NON VECTORIELLE (SELON VOS DONNEES)
STRCHE (BIEF) : PAS DE MODIFICATION DU FROTTEMENT
IL Y A 1 FRONTIERE(S) SOLIDE(S) :
FRONTIERE 1 :
DEBUT AU POINT DE BORD 1 , DE NUMERO GLOBAL 546
ET DE COORDONNEES : 619.8345 5099.195
FIN AU POINT DE BORD 1 , DE NUMERO GLOBAL 546
ET DE COORDONNEES : 619.8345 5099.195
CORFON (TELEMAC2D) : PAS DE MODIFICATION DU FOND
================================================================================
ITERATION 0 TEMPS : 0.0000 S
USING STREAMLINE VERSION 7.0 FOR CHARACTERISTICS
MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 1
P_INIT: FILE PARAL IS INCONSISTENT WITH MPI PARAMETERS
MPI NCSIZE = 1
PARAL NCSIZE = 2
EXECUTABLE FILE:
C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\A.EXE
BARRIER PASSED
LISTING DE TELEMAC-2D ---------------------------------------------------------
---------------------
TTTTT EEEEE L EEEEE M M AAAAA CCCCC
T E L E MM MM A A C
T EEE L EEE M M M AAAAA C
T E L E M M A A C
T EEEEE LLLLL EEEEE M M A A CCCCC
2D VERSION 7.0 FORTRAN 90
WITH SEVERAL TRACERS
COUPLED WITH SISYPHE AND TOMAWAC
LE MOT CLE : PRECONDITIONNEMENT
EST CITE AU MOINS 2 FOIS, SEULE LA DERNIERE VALEUR EST CONSERVEE...
FIN DU FICHIER POUR DAMOCLES
NOMBRE DE PROCESSEURS PARALLELES DIFFERENT :
DEJA DECLARE (CAS DE COUPLAGE ?) : 1
TELEMAC-2D : 2
LA VALEUR 1 EST GARDEE
********************************************
* LECDON: *
* APRES APPEL DE DAMOCLES *
* VERIFICATION DES DONNEES LUES *
* SUR LE FICHIER DES PARAMETRES *
********************************************
SORTIE DE LECDON. TITRE DE L'ETUDE :
Le barrage de MALPASSET
OUVERTURE DES FICHIERS POUR TELEMAC2D
forrtl: Le processus ne peut pas accéder au fichier car ce fichier est utilisé par un autre processus.
forrtl: severe (30): open failure, unit 8, file C:\opentelemac\v7p0\examples\telemac2d\malpasset\t2d_malpasset-small.cas_2015-02-04-13h31min37s\T2DRES
Image PC Routine Line Source
out_t2d_malpasset 000000014007DEC8 Unknown Unknown Unknown
out_t2d_malpasset 0000000140079169 Unknown Unknown Unknown
out_t2d_malpasset 000000014002E3BD Unknown Unknown Unknown
out_t2d_malpasset 0000000140017D17 Unknown Unknown Unknown
out_t2d_malpasset 00000001400175E1 Unknown Unknown Unknown
out_t2d_malpasset 0000000140020E7F Unknown Unknown Unknown
out_t2d_malpasset 000000014028791D Unknown Unknown Unknown
out_t2d_malpasset 000000014008CC0C Unknown Unknown Unknown
out_t2d_malpasset 000000014008652C Unknown Unknown Unknown
out_t2d_malpasset 0000000140065F3F Unknown Unknown Unknown
kernel32.dll 000000007765652D Unknown Unknown Unknown
ntdll.dll 000000007788C541 Unknown Unknown Unknown
================================================================================
Regards,
Olivier
PS: my profile is now updated.
Problem with parallel version v7p0 9 years 9 months ago #15735
Hi Olivier
It seems your MPI service is correctly installed and runs well (as the cpi.exe example shows). For the Telemac run, you could try changing the configuration to -localonly: since it works for cpi.exe, we can expect it to work for Telemac. I'm not sure about the -mapall option, but it may cause problems, as it has to look on the network...
For the other problem, the process was probably launched twice, which is why you have a file access problem. What is surprising is the message MPI NCSIZE = 1!
I hope -localonly will solve your problem, because I have no other ideas!
Regards,
Christophe
PS: Please use tags in your posts to make them more readable.
Christophe
Problem with parallel version v7p0 9 years 9 months ago #15892
Hello Christophe,
Thank you very much for your answer. I tried changing the configuration to -localonly, but without success. I think I have a problem with mpiexec under Intel. When I launch the cpi.exe example with -n 4, the prompt is printed 4 times: "Enter the number of intervals: (0 quits) Enter the number of intervals: (0 quits) Enter the number of intervals: (0 quits) Enter the number of intervals: (0 quits)".
I don't have this problem with gfortran, with which I can launch my simulation in parallel without any problem. I will try again to use mpiexec with Intel and will let you know.
Regards,
Olivier
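The repeated prompt can be turned into a quick check. The sketch below is in Python and rests on an assumption: that, as in the classic cpi example, only the process of rank 0 prints the prompt, so a healthy `mpiexec -n 4` run shows it once. The function name `looks_like_independent_copies` is hypothetical, and the captured output is retyped from the message above.

```python
def looks_like_independent_copies(output_text, marker, nprocs):
    """Return True if a rank-0-only message appears once per launched process.

    Hypothetical helper: 'marker' is a message that, by assumption, only
    rank 0 prints; seeing it 'nprocs' times suggests that every process
    believes it is rank 0 of a group of 1.
    """
    return nprocs > 1 and output_text.count(marker) >= nprocs


prompt = "Enter the number of intervals: (0 quits)"

# Output as described in the post above for 'mpiexec -n 4 cpi.exe'.
captured = " ".join([prompt] * 4)

print(looks_like_independent_copies(captured, prompt, 4))  # prints True
print(looks_like_independent_copies(prompt, prompt, 4))    # prints False
```

A result of True is consistent with the "MASTER PROCESSOR NUMBER 0 OF THE GROUP OF 1" lines in the TELEMAC listings earlier in the thread: each launched copy appears to run as its own single-process job instead of joining one group of 4.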