
TOPIC: WIN EXE Version 6.0 Parallel Multiple PCs

WIN EXE Version 6.0 Parallel Multiple PCs 13 years 2 months ago #2296

  • joysanyal21
Hi,

I am trying to run TELEMAC2D on multiple PCs which are connected through a network and belong to the same workgroup. I am using the EXE version for Windows (6.0). I placed all the necessary input files on a network drive which is mapped on all PCs using the same letter.

I modified the systel file in the TELEMAC config folder by uncommenting the line

RUN_MPI="mpiexec -file mpirun.txt" and kept all the other options commented out.

When I try to launch TELEMAC2D from DOS on one of the participating PCs, it gives me the following error message:
----------------------------------------------------------------------------
R:\TELEMAC\Steady>telemac2d

=========================================================
 Telemac System Freeware 6.0 - Perl scripts version V6.0
=========================================================
starting...

HOSTTYPE         : win
PROJECT          : C:\TELEMAC\V6P0
BASE DIRECTORY   : R:\TELEMAC\Steady
LAUNCH DIRECTORY : R:\TELEMAC\Steady
WORK DIRECTORY   : R:\TELEMAC\Steady\cas3128_tmp
PARAMETER FILE   : cas


*** Using default configuration file :
    C:\TELEMAC\V6P0\config\systel.ini ***



*** Using CUSTOM MPI configuration file :
    R:\TELEMAC\Steady\mpi_telemac.conf ***


*** TELEMAC2D ON STATION ***


*** Interactive mode ***


*** RELEASE v6p0 ***

________________________________________________________
Steering file   :      cas
________________________________________________________

________________________________________________________
Starting execution: telemac2d.bat
________________________________________________________
______________________________________________________________________________
*** DEFAULT PARALLEL EXECUTABLE ***

   C:\TELEMAC\V6P0\telemac2d\tel2d_v6p0\win\telemac2dv6p0_MP.exe
______________________________________________________________________________
*** ALLOCATION OF USER FILES ***

 - STEERING FILE                          : cas
 - DICTIONARY                             : telemac2dv6p0.dico

 - GEOMETRY FILE                          : Geo.slf
    (split for 10 processors)
 - BOUNDARY CONDITIONS FILE               : cas.conlim

    (split for 10 processors)
 - PREVIOUS COMPUTATION FILE              : condInit.ser

    (split for 10 processors)
______________________________________________________________________________
*** MPI MACHINE ***
 MPI machine ok (with 10 processors).
______________________________________________________________________________
*** RUNNING ***

 MPI launcher  : mpiexec -file mpirun.txt
 MASTER PROCESSOR NUMBER            0  OF THE GROUP OF           16
 P_INIT: FILE PARAL NOT FOUND

job aborted:
rank: node: exit code[: error message]
0: GEOG125x32.geog.ad.dur.ac.uk: 0: process 0 exited without calling finalize
1: GEOG125x32.geog.ad.dur.ac.uk: 123
2: GEOG125x32.geog.ad.dur.ac.uk: 123
3: GEOG125x32.geog.ad.dur.ac.uk: 123
4: GEOG125x32.geog.ad.dur.ac.uk: 123
5: GEOG125x32.geog.ad.dur.ac.uk: 123
6: GEOG125x32.geog.ad.dur.ac.uk: 123
7: GEOG125x32.geog.ad.dur.ac.uk: 123
8: DOGPC160XP.geog.ad.dur.ac.uk: 123
9: DOGPC160XP.geog.ad.dur.ac.uk: 123
10: DOGPC160XP.geog.ad.dur.ac.uk: 123
11: DOGPC160XP.geog.ad.dur.ac.uk: 123
12: DOGPC160XP.geog.ad.dur.ac.uk: 123
13: DOGPC160XP.geog.ad.dur.ac.uk: 123
14: DOGPC160XP.geog.ad.dur.ac.uk: 123
15: DOGPC160XP.geog.ad.dur.ac.uk: 123
 Duration of job : 3 seconds ( 0:0:3 ) (system=0 sec)
______________________________________________________________________________
*** FILES DELIVERY ***

 - RESULTS FILE                           : i7XPTest1
ERROR : RESTITUTION FILE i7XPTest1
________________________________________________________
Execution finished: telemac2d.bat
________________________________________________________
No compilation/linking/file errors detected.
No execution errors detected.
Returning exit status 0

=========================================================
 Telemac System Freeware 6.0 - Perl scripts version V6.0
=========================================================
...stopping.

---------------------------------------------------------------------------
Can anybody please help me to solve this problem?

P.S. TELEMAC2D works fine locally with MPICH2 using multiple cores of a 32-bit WinXP PC.

Thanks and regards,

Joy
The administrator has disabled public write access.
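The log above ends with "No execution errors detected" and "Returning exit status 0" even though the job was aborted, so the exit status alone does not seem to be a reliable indicator here. A minimal sketch, assuming the console output has been captured as text, that scans for the failure markers visible in this thread (the marker list and the function name `find_failures` are illustrative, not part of TELEMAC):

```python
# Failure markers copied from the console output quoted in this thread.
# The list is an assumption for illustration; other runs may print other messages.
FAILURE_MARKERS = (
    "job aborted",
    "FILE PARAL NOT FOUND",
    "exited without calling finalize",
    "ERROR :",
    "Fatal error in",
)


def find_failures(log_text):
    """Return (line_number, line) pairs for lines containing a failure marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            hits.append((number, line.strip()))
    return hits


# Shortened excerpt of the log posted above.
SAMPLE_LOG = """\
 MPI launcher  : mpiexec -file mpirun.txt
 P_INIT: FILE PARAL NOT FOUND

job aborted:
0: GEOG125x32.geog.ad.dur.ac.uk: 0: process 0 exited without calling finalize
ERROR : RESTITUTION FILE i7XPTest1
No execution errors detected.
Returning exit status 0
"""

if __name__ == "__main__":
    for number, line in find_failures(SAMPLE_LOG):
        print(f"line {number}: {line}")
```

The match is case-sensitive on purpose, so the summary line "No execution errors detected." is not itself reported as a failure.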

Re: WIN EXE Version 6.0 Parallel Multiple PCs 13 years 2 months ago #2299

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi
When you run a parallel simulation locally, could you check that you have a file named PARAL in the temporary directory?
Could you also check that this file exists when you try to run the same simulation over the network?

It looks like a configuration problem.
Hope this helps
Christophe
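The check suggested above can be sketched as a small script. It assumes the working directories follow the `*_tmp` naming seen in the log (for example `cas3128_tmp`) and that the file is named exactly `PARAL`; the function name `find_paral_files` is hypothetical. It only reports what the PC running the script can see, so it says nothing about whether the other nodes can reach the same path.

```python
from pathlib import Path


def find_paral_files(launch_dir):
    """Map each '*_tmp' directory under launch_dir to True if it contains a PARAL file."""
    results = {}
    for tmp_dir in sorted(Path(launch_dir).glob("*_tmp")):
        if tmp_dir.is_dir():
            results[tmp_dir.name] = (tmp_dir / "PARAL").is_file()
    return results


if __name__ == "__main__":
    # Run from the launch directory (R:\TELEMAC\Steady in the log above).
    for name, has_paral in find_paral_files(".").items():
        status = "PARAL present" if has_paral else "PARAL missing"
        print(f"{name}: {status}")
```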

Re: WIN EXE Version 6.0 Parallel Multiple PCs 13 years 2 months ago #2302

  • joysanyal21
Hi,

Thank you for your help. I checked my local parallel run (multiple core) and there is a PARAL file in the temp directory.

However, for the multiple-PC run I couldn't check this, because the temporary folder exists for only a second or so: the moment TELEMAC crashes, it deletes the tmp folder, so there is no time to inspect its contents.

Thanks and regards,

Joy

Re: WIN EXE Version 6.0 Parallel Multiple PCs 13 years 2 months ago #2303

  • c.coulet
Hi
You can prevent the temp folder from being erased by adding the -t option when you run the simulation.
Christophe

Re: WIN EXE Version 6.0 Parallel Multiple PCs 13 years 2 months ago #2304

  • ails
  • Senior Boarder
  • Posts: 140
  • Thank you received: 17
Hi,

I'm not sure if it's of any help to you, but a similar topic exists at: www.opentelemac.org/index.php?option=com...temid=62&lang=fr#586

Regards,

Fabien Decung

Re: WIN EXE Version 6.0 Parallel Multiple PCs 13 years 2 months ago #2307

  • joysanyal21
Thanks, Fabien, for pointing to that thread. As a matter of fact, I gathered all the information from that post before starting this parallel run. I did what was recommended in it, but it still doesn't work.

Re: WIN EXE Version 6.0 Parallel Multiple PCs 13 years 2 months ago #2305

  • joysanyal21
Hi,

Thanks for your help. I managed to keep the tmp folder in the multiple-PC run, although it failed and gave the following message, which is somewhat different from the previous one:
---------------------------------------------------------------
______________________________________________________________________________
*** MPI MACHINE ***
 MPI machine ok (with 9 processors).
______________________________________________________________________________
*** RUNNING ***

 MPI launcher  : mpiexec -file mpirun.txt
User credentials needed to launch processes:
account (domain\user) [GEOG\tpcv24]:
password:
 MASTER PROCESSOR NUMBER            0  OF THE GROUP OF            9
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425)....................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(292)...............:
MPIR_Barrier_or_coll_fn(121).........:
MPIR_Barrier_intra(83)...............:
MPIC_Sendrecv(192)...................:
MPIC_Wait(540).......................:
MPIDI_CH3I_Progress(353).............:
MPID_nem_mpich2_blocking_recv(905)...:
MPID_nem_newtcp_module_poll(37)......:
MPID_nem_newtcp_module_connpoll(2655):
gen_read_fail_handler(1145)..........: read from socket failed - The specified network name is no longer available.

 P_INIT: FILE PARAL NOT FOUND

job aborted:
rank: node: exit code[: error message]
0: GEOG125x32.geog.ad.dur.ac.uk: 0: process 0 exited without calling finalize
1: GEOG125x32.geog.ad.dur.ac.uk: 123
2: GEOG125x32.geog.ad.dur.ac.uk: 123
3: GEOG125x32.geog.ad.dur.ac.uk: 123
4: GEOG125x32.geog.ad.dur.ac.uk: 123
5: GEOG125x32.geog.ad.dur.ac.uk: 123
6: GEOG125x32.geog.ad.dur.ac.uk: 123
7: GEOG125x32.geog.ad.dur.ac.uk: 123
8: DOGPC160XP.geog.ad.dur.ac.uk: 1: process 8 exited without calling finalize
 Duration of job : 12 seconds ( 0:0:12 ) (system=0 sec)
______________________________________________________________________________
*** FILES DELIVERY ***

 - RESULTS FILE                           : i7XPTest1
ERROR : RESTITUTION FILE i7XPTest1
________________________________________________________
Execution finished: telemac2d.bat
________________________________________________________
No compilation/linking/file errors detected.
No execution errors detected.

Working directory: T:\JS\TELEMAC\Steady\cas6140_tmp
can be manually deleted with: T:\JS\TELEMAC\Steady\delete_cas6140.bat

Returning exit status 0
---------------------------------------------------------------

The PARAL file does exist in the temp folder of the multiple-PC run, and its content is as follows:

9
33
T:\JS\TELEMAC\Steady\cas6140_tmp\

Thanks and regards,
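To see at a glance how mpiexec distributed the processes, the "rank: node: exit code" table in the logs can be tallied per host. The sketch below assumes rows shaped like the ones quoted in this thread; `parse_rank_table` and `ranks_per_host` are illustrative names. In the first log the files were split for 10 processors while mpiexec reported a group of 16 (8 ranks on each host); in the log above both numbers are 9. Whether that earlier mismatch contributed to the failure is not established in this thread.

```python
import re
from collections import Counter

# Matches rows of the "rank: node: exit code[: error message]" table.
RANK_ROW = re.compile(r"^\s*(\d+):\s*([^:\s]+):\s*(-?\d+)")


def parse_rank_table(log_text):
    """Return a list of (rank, host, exit_code) tuples found in the log text."""
    rows = []
    for line in log_text.splitlines():
        match = RANK_ROW.match(line)
        if match:
            rank, host, code = match.groups()
            rows.append((int(rank), host, int(code)))
    return rows


def ranks_per_host(rows):
    """Count how many ranks were started on each host."""
    return Counter(host for _, host, _ in rows)


# Rank table copied from the log above.
SAMPLE_TABLE = """\
job aborted:
rank: node: exit code[: error message]
0: GEOG125x32.geog.ad.dur.ac.uk: 0: process 0 exited without calling finalize
1: GEOG125x32.geog.ad.dur.ac.uk: 123
2: GEOG125x32.geog.ad.dur.ac.uk: 123
3: GEOG125x32.geog.ad.dur.ac.uk: 123
4: GEOG125x32.geog.ad.dur.ac.uk: 123
5: GEOG125x32.geog.ad.dur.ac.uk: 123
6: GEOG125x32.geog.ad.dur.ac.uk: 123
7: GEOG125x32.geog.ad.dur.ac.uk: 123
8: DOGPC160XP.geog.ad.dur.ac.uk: 1: process 8 exited without calling finalize
"""

if __name__ == "__main__":
    rows = parse_rank_table(SAMPLE_TABLE)
    print(f"total ranks: {len(rows)}")
    for host, count in ranks_per_host(rows).items():
        print(f"{host}: {count} rank(s)")
```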

Re: WIN EXE Version 6.0 Parallel Multiple PCs 13 years 2 months ago #2311

  • c.coulet
Hi
Could you tell us whether PARAL is similar to the one generated by a local run?

To me, the new message looks like a problem with your MPI installation.

Hope this helps
Christophe

Re: WIN EXE Version 6.0 Parallel Multiple PCs 13 years 2 months ago #2323

  • joysanyal21
Hi,
The content of PARAL for a successful local run with MPI (multiple core) looks like this:

2
24
C:\F\Steady\cas2852_tmp\

Do you think it is an MPI installation issue? I installed MPI with admin rights and it works fine for multiple cores; what might I have missed?

One more thing: could it be a Windows Firewall issue? I understand that MPICH2 adds itself to the firewall exception list, but what about the temporary executable? Can it get blocked on the PCs when it is created on a network drive (my working directory)? Please let me know what your experience is with this kind of installation.

Thanks again for your help.

Joy
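The two PARAL files quoted in this thread can be compared with a short script. In both samples the second line equals the number of characters in the path on the third line (33 and 24), and the first line of the network-run file equals the processor count reported in its log (9). Reading the lines as "processor count, path length, working path" is an inference from these two samples, not a documented format, and `parse_paral` and `path_length_matches` are hypothetical helper names.

```python
def parse_paral(text):
    """Parse the three-line PARAL layout seen in this thread (inferred, not documented)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 3:
        raise ValueError(f"expected 3 non-empty lines, got {len(lines)}")
    return {
        "nprocs": int(lines[0]),       # assumed: number of processors
        "path_length": int(lines[1]),  # assumed: character count of the path
        "path": lines[2],              # working directory, with trailing backslash
    }


def path_length_matches(paral):
    """Check that the stored length equals the actual length of the stored path."""
    return paral["path_length"] == len(paral["path"])


# Contents copied from the two posts in this thread.
NETWORK_RUN = "9\n33\nT:\\JS\\TELEMAC\\Steady\\cas6140_tmp\\\n"
LOCAL_RUN = "2\n24\nC:\\F\\Steady\\cas2852_tmp\\\n"

if __name__ == "__main__":
    for label, text in (("network", NETWORK_RUN), ("local", LOCAL_RUN)):
        paral = parse_paral(text)
        print(label, paral["nprocs"], paral["path"], path_length_matches(paral))
```

By this check both files are internally consistent, and the visible difference between them is that the network run points to a mapped drive letter (T:) while the local run points to C:.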

Re: WIN EXE Version 6.0 Parallel Multiple PCs 13 years 2 months ago #2349

  • joysanyal21
Hi,

Just an update: I tried running with Windows Firewall disabled, and it still gives me the same error as in the previous post.

For information, I installed MPICH2 with admin rights by just double-clicking the installer package and selecting all the default options.

Thanks for your help,

Joy
Moderators: borisb