
TOPIC: Parallel installation

Parallel installation 14 years 2 months ago #545

  • Eric.chateauminois
Hello

I am trying to install the parallel version of Telemac2D.
Locally it works, and I can make use of all 4 cores of my processor.
For use across several machines (as a cluster), however, I am running into a problem:
- as indicated in the manual, MPICH2 has been installed on each computer;
- systel.ini has been modified by activating the following two lines:
#- réseau de PCs
RUN_MPI="mpiexec -file mpirun.txt"
RUN_MPI="mpiexec -logon : -machinefile mpirun.txt -n <N> <EXE>"
- the mpi_telemac.conf file is configured as follows:
4
3213-chateaumin 2
2879-pc-simul 2
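For reference, the mpi_telemac.conf above appears to follow a simple layout: the first line gives the total number of processes, and each following line gives a host name and the number of cores to use on that host. The Python sketch below parses that layout and checks that the header agrees with the per-host total. The layout is inferred from this one file, not from TELEMAC documentation, and the function name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MpiConf:
    total: int                     # process count declared on the first line
    hosts: list[tuple[str, int]]   # (host name, cores to use on that host)


def parse_mpi_conf(text: str) -> MpiConf:
    """Parse the assumed mpi_telemac.conf layout: total first, then 'host cores' lines."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty mpi_telemac.conf")
    total = int(rows[0][0])
    hosts = []
    for fields in rows[1:]:
        if len(fields) != 2:
            raise ValueError(f"expected 'host cores', got {fields!r}")
        hosts.append((fields[0], int(fields[1])))
    # The header must match the sum of the per-host core counts.
    declared = sum(cores for _, cores in hosts)
    if declared != total:
        raise ValueError(f"header says {total} processes but hosts provide {declared}")
    return MpiConf(total, hosts)


conf_text = """4
3213-chateaumin 2
2879-pc-simul 2
"""
conf = parse_mpi_conf(conf_text)
print(conf.total, conf.hosts)
```

A mismatch such as a header of 6 with only 2 + 2 cores listed raises a `ValueError` instead of silently launching the wrong number of processes.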


When launching the run, I get the following error:

C:\TELEMAC\TestMultiProc>telemac2d para_c95_10s_h_v_parall-4proc.txt

===================================================
Telemac System 5.9 - Perl scripts version V5P9-0
===================================================
starting...

HOSTTYPE : win
PROJECT : C:\TELEMAC\V5P9
BASE DIRECTORY : C:\TELEMAC\TestMultiProc
LAUNCH DIRECTORY : C:\TELEMAC\TestMultiProc
WORK DIRECTORY : C:\TELEMAC\TestMultiProc\para_c95_10s_h_v_parall-4proc.txt2744_tmp
PARAMETER FILE : para_c95_10s_h_v_parall-4proc.txt


*** Using default configuration file :
C:\TELEMAC\V5P9\config\systel.ini ***



*** Using specific version v5p9 ***


*** Using CUSTOM MPI configuration file :
C:\TELEMAC\TestMultiProc\mpi_telemac.conf ***


*** TELEMAC2D ON STATION ***


*** Interactive mode ***


*** RELEASE V5P9 ***

________________________________________________________
Steering file : para_c95_10s_h_v_parall-4proc.txt
________________________________________________________

________________________________________________________
Starting execution: telemac2d.bat
________________________________________________________
- FORTRAN FILE : para_c95_10s_h_v_parall-test.f

______________________________________________________________________________
*** LOCAL EXECUTABLE ***

para_c95_10s_h_v_parall-test_win_MP_v5p9.exe

______________________________________________________________________________
*** ALLOCATION OF USER FILES ***

- STEERING FILE : para_c95_10s_h_v_parall-4proc.txt

- DICTIONARY : telemac2dv5p9.dico

- GEOMETRY FILE : para_c95_1s.geo

(split for 4 processors)
- BOUNDARY CONDITIONS FILE : para_c95_1s_h_v.conlim

(split for 4 processors)
______________________________________________________________________________
*** MPI MACHINE ***
MPI machine ok (with 4 processors).
______________________________________________________________________________
*** RUNNING ***

MPI launcher : mpiexec -logon : -machinefile mpirun.txt -n 4 out2744_win.exe
Error: no executable specified
Unable to parse the mpiexec command arguments.
Duration of job : 0 seconds ( 0:0:0 ) (system=0 sec)
______________________________________________________________________________
*** FILES DELIVERY ***

- RESULTS FILE : para_c95_10s_h_v-4proc.res

ERROR : RESTITUTION FILE para_c95_10s_h_v-4proc.res
________________________________________________________
Execution finished: telemac2d.bat
________________________________________________________
No compilation/linking/file errors detected.
No execution errors detected.
Returning exit status 0

===================================================
Telemac System 5.9 - Perl scripts version V5P9-0
===================================================
...stopping.


Note that the wmpiconfig application works on each computer: the MPICH2 installation is detected on the remote computer (in both directions), but when I "scan" the hosts I get an error message:
3213-CHATEAUMIN
error = 3213-CHATEAUMIN: MPICH2 not installed or unable to query the host
2879-PC-SIMUL
error = 2879-PC-SIMUL: MPICH2 not installed or unable to query the host


Do you have any idea where the problem might come from? Thanks in advance.

Best regards

Re: Parallel installation 14 years 2 months ago #550

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi
I think there are two different problems:
    Only one RUN_MPI line should be active.
    There is a problem with MPI itself (the problem with the scan).
Try to solve the MPI problem before trying to run Telemac.
The message
Error: no executable specified
is probably linked to the RUN_MPI command line. You should test with only the first line.
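Eric's first log shows the launcher built from the second RUN_MPI line ("mpiexec -logon : -machinefile mpirun.txt -n 4 out2744_win.exe"), which is consistent with the later line overriding the earlier one. The Python sketch below reproduces that expansion under the assumption, inferred from the log and not from the Perl scripts' source, that the last active RUN_MPI line wins and that `<N>` and `<EXE>` are plain text substitutions. It also flags a bare `:` token: MPICH2's mpiexec uses `:` to separate argument groups for different executables, so one plausible reading of "no executable specified" is that `-logon` ends up in a group with no program. The function names are hypothetical.

```python
def run_mpi_templates(ini_text: str) -> list[str]:
    """Collect every active RUN_MPI="..." template, in file order."""
    templates = []
    for line in ini_text.splitlines():
        stripped = line.strip()
        # Commented-out lines start with '#', so they never match this prefix.
        if stripped.startswith("RUN_MPI="):
            templates.append(stripped[len("RUN_MPI="):].strip().strip('"'))
    return templates


def expand(template: str, nproc: int, exe: str) -> str:
    """Substitute the <N> and <EXE> placeholders used in systel.ini."""
    return template.replace("<N>", str(nproc)).replace("<EXE>", exe)


def has_bare_colon(command: str) -> bool:
    """True if ':' appears as a separate token (mpiexec's argument-group separator)."""
    return ":" in command.split()


ini = '''
RUN_MPI="mpiexec -file mpirun.txt"
RUN_MPI="mpiexec -logon : -machinefile mpirun.txt -n <N> <EXE>"
'''

templates = run_mpi_templates(ini)
if len(templates) > 1:
    print(f"warning: {len(templates)} active RUN_MPI lines; using the last one")
# Assumption: the last definition wins, as the log in the first post suggests.
launcher = expand(templates[-1], 4, "out2744_win.exe")
print(launcher)
if has_bare_colon(launcher):
    print("warning: bare ':' token; mpiexec may treat it as the start of a new argument group")
```

With only the first line active, the expanded command is simply "mpiexec -file mpirun.txt", which matches the launcher shown in Eric's second log.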

Good luck
Christophe

Re: Parallel installation 14 years 2 months ago #562

  • Eric.chateauminois
Thanks for your reply

I tried to run Telemac with only the first line and the error changed to the following:

*** MPI MACHINE ***
MPI machine ok (with 4 processors).
______________________________________________________________________________
*** RUNNING ***

MPI launcher : mpiexec -file mpirun.txt
Credentials for eric rejected connecting to 3213-chateaumin
Aborting: Unable to connect to 3213-chateaumin
Duration of job : 4 seconds ( 0:0:4 ) (system=0 sec)
______________________________________________________________________________
*** FILES DELIVERY ***

- RESULTS FILE : para_c95_10s_h_v-4proc.res

ERROR : RESTITUTION FILE para_c95_10s_h_v-4proc.res
________________________________________________________
Execution finished: telemac2d.bat
________________________________________________________
No compilation/linking/file errors detected.
No execution errors detected.
Returning exit status 0

===================================================
Telemac System 5.9 - Perl scripts version V5P9-0
===================================================
...stopping.


C:\TELEMAC\TestMultiProc>


The problem with the wmpiconfig "scan" still remains. Are there any minimum network requirements for running in multi-PC mode?

Thank you in advance

Re: Parallel installation 14 years 2 months ago #584

  • sebourban
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello Eric,

Are you running Windows XP?

I believe you might have a few more problems with your setup. I will just mention two for now and advise on the solution depending on your answer ...

Problem 1: I noticed that your working directory is on C:\ ...
Problem 2: Windows XP has a default registry setting that limits the number of accesses to the same file ...

But first, an introductory note:

Your cluster should be capable of running TELEMAC under the following conditions (and we have other Windows users who do operate under these conditions):
- a run in parallel on one computer Xo, launched from a DOS command on that same computer Xo (what you are doing at the moment)
- a run in parallel on one computer Y, launched from a DOS command on another computer Xo
- a run in parallel on two or more computers, Xo, X1, X2, ..., launched from a DOS command on one of those computers, for example Xo
- a run in parallel on two or more computers, Yo, Y1, Y2, ..., launched from a DOS command on another computer Xo

The version and compilation of TELEMAC used in all the configurations above is the one you access through the DOS command, i.e. the one on Xo. The critical point here is that you will get a much better speed-up if you split your computation over all your computers, even if you use only 1 core/processor on each computer. You can actually set up a default mpi_telemac.conf file that includes:
Xo 1
Y1 1
Y2 1
...
and then use more cores on each computer (I would advise no more than 2 on an Intel quad-core).
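The advice above (spread the run over every computer first, then raise the per-computer core count, capping it at 2 on a quad-core) can be captured in a small generator. This Python sketch assumes the same file layout as Eric's mpi_telemac.conf earlier in the thread, i.e. a first line with the total number of processes followed by one "host cores" line per computer. The host names Xo, Y1 and Y2 come from the post; the function name is hypothetical, and `max_cores_per_host=2` encodes Sébastien's recommendation rather than any TELEMAC limit.

```python
def make_mpi_conf(hosts: list[str], cores_per_host: int = 1,
                  max_cores_per_host: int = 2) -> str:
    """Build mpi_telemac.conf text: total process count, then one 'host cores' line per host."""
    if not 1 <= cores_per_host <= max_cores_per_host:
        raise ValueError(f"cores_per_host must be between 1 and {max_cores_per_host}")
    lines = [str(len(hosts) * cores_per_host)]
    lines += [f"{host} {cores_per_host}" for host in hosts]
    return "\n".join(lines) + "\n"


# Start with one core on every computer, as advised, then try two.
print(make_mpi_conf(["Xo", "Y1", "Y2"]), end="")
print(make_mpi_conf(["Xo", "Y1", "Y2"], cores_per_host=2), end="")
```

Keeping the same number of cores on every host keeps the header arithmetic trivial; a mixed cluster would need a per-host count instead.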

Last but not least: where do you store your input/output files (this relates to Problem 1)? This is where you need drive mapping.

Say your files are on Zo, for example a Windows Server 2003 computer. You have to map the directory (or one of the root directories containing all your TELEMAC project/simulation files) to a drive letter on the computer you run the DOS command from, say T:. But that is not enough: all the other computers in the cluster also need to understand T:\...\..\..\files as the same absolute location of your files. This means you need to map the same directory, in exactly the same way and with the same letter T:, on all computers in the cluster (or at least on all those you will use in your simulation).

In your case, while C:\TELEMAC\TestMultiProc is a location known by computer Xo, it does not reference the same absolute location on computer Y1 - C:\TELEMAC\TestMultiProc might not even exist on Y1.
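The rule above (the launch directory must sit on a drive letter that points to the same share on every computer) can be checked before a run. This Python sketch uses the standard `ntpath` module so it behaves the same on any OS. The `mappings` inventory is hypothetical and filled in by hand; the share name `\\Zo\telemac` is an illustrative assumption, while Zo, Xo, Y1 and Y2 are the host names from the post.

```python
import ntpath


def same_location_everywhere(launch_dir: str, mappings: dict[str, dict[str, str]]) -> bool:
    """Return True if launch_dir's drive letter points to the same share on every host.

    mappings is a hand-written, hypothetical inventory: host -> {drive letter: UNC share},
    with drive letters written in upper case. Local drives such as C: are left out,
    because C: on one PC is not C: on another.
    """
    drive, _ = ntpath.splitdrive(launch_dir)
    drive = drive.upper()
    # Normalise share names (Windows paths are case-insensitive); '' marks a missing mapping.
    targets = {host_map.get(drive, "").lower() for host_map in mappings.values()}
    return len(targets) == 1 and "" not in targets


mappings = {
    "Xo": {"T:": r"\\Zo\telemac"},
    "Y1": {"T:": r"\\Zo\telemac"},
    "Y2": {"T:": r"\\Zo\telemac"},
}

print(same_location_everywhere(r"C:\TELEMAC\TestMultiProc", mappings))  # False: local drive
print(same_location_everywhere(r"T:\projects\run1", mappings))          # True: mapped drive
```

The check only compares the inventory you provide; it does not query the network, so the mappings still have to be set up identically on each machine.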

However, having set up this configuration for many users, I have noticed that storing your files on a remote server can slow down your simulation dramatically. It is much faster if the files are stored on one of the computers of your cluster. The procedure above remains exactly the same, in that you need to map the same location with the same letter on all computers, including Xo, where the files might be stored. If you do not have enough disk space to store all project files on one computer, you can store them on different computers and use a different letter for each location, for example one per user or per project.

It does not matter whether or not your simulation files are stored in the same place as the TELEMAC installation.

Regarding Problem 2, Windows XP has a built-in maximum number of simultaneous accesses to the same file. To allow Windows XP to handle up to 4 connections (running TELEMAC on 4 cores, whether on the same computer or not), you need to change a couple of registry keys (the most recent patch addresses this). To get many more connections, you should ideally store your files on Windows Server 2003 (or later), which allows more than 4 connections.

You can run the tests using your organisation's file server, even if it is much slower. Ideally, one of your cluster PCs should act as the file server. Note that I have seen configurations where the file server is a Linux Samba installation on a PC dual-booted with a standard Windows XP. Samba will allow the multiple connections. But this gets into expert IT territory ...

Hope this helps

Sébastien.

Re: Parallel installation 14 years 2 months ago #586

  • sebourban
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
... I also forgot to mention ...

Having mapped the same drive letter (T:) on all your computers, you need to run your simulation from T:\...\...\files under DOS.

Please let us know if this solves your problems.

Re: Parallel installation 13 years 7 months ago #1281

  • OBA
  • Fresh Boarder
  • Posts: 19
  • Thank you received: 4
Hi,

I have installed Telemac 6.0 and compiled it with g95.
Simulations on a single processor work well; I have tried several tests.
Now I would like to install the parallel version.
I have already installed METIS and MPICH2.
First of all: is it possible to run the parallel version with g95?
The lines in my systel.ini are as follows:
LIBS_MPI="<TELEMAC_HOME>\MPICH2\lib\fmpich2.lib"
RUN_MPI="mpiexec -localonly <N> <EXE>"
but I don't know what to put for "FC_MPI=" and "K_MPI=".
Also, which directory should I copy the libmetis.a file to?
Thanks for your answers.

Olivier

Re: Parallel installation 13 years 7 months ago #1288

  • ails
  • Senior Boarder
  • Posts: 140
  • Thank you received: 17
OBA : "First of all : is it possible to run parallelism version with g95?"

As far as I know, we only managed to configure MPICH2 on Windows with the Intel Fortran Compiler.

BUT, you can find a bit of information in the MPICH2 Windows Development Guide (released recently, see for instance §9.11.3):
www.mcs.anl.gov/research/projects/mpich2....3.2-windevguide.pdf

Please let us know if you succeed, and how you did it. In any case, it is an issue we need to investigate.

Best regards,

Fabien
Moderators: borisb
