
TOPIC: Problem Version 6.1 parallel Multiple PC

Problem Version 6.1 parallel Multiple PC 13 years 2 months ago #2574

  • joysanyal21
Hi,

With lots of help from people on this forum, I managed to compile v6.1 on Windows 7 32-bit with the Intel Fortran compiler (using the Perl scripts).

It runs fine in serial mode. It also runs perfectly in parallel localonly mode. The problem occurs when I try to run it on multiple PCs. I ran into similar trouble with version 6.0 as well.

The main part of the error output looks like this:

** MPI MACHINE ***
MPI machine ok (with 6 processors).
______________________________________________________________________________
*** RUNNING ***

MPI launcher : mpiexec -file mpirun.txt
User credentials needed to launch processes:
account (domain\user) [GEOG\tpcv24]:
password:
launch failed: CreateProcess(\\server1\pgtemp$\JS\Steady\T:\JS\Steady\cas528_tmp\out528_wintelmpi.exe) on 'Host1' failed, error 2 - The system cannot find the file specified.

launch failed: CreateProcess(\\server1\pgtemp$\JS\Steady\T:\JS\Steady\cas528_tmp\out528_wintelmpi.exe) on 'Host2' failed, error 2 - The system cannot find the file specified.

launch failed: CreateProcess(\\server1\pgtemp$\JS\Steady\T:\JS\Steady\cas528_tmp\out528_wintelmpi.exe) on 'Host1' failed, error 2 - The system cannot find the file specified.

launch failed: CreateProcess(\\server1\pgtemp$\JS\Steady\T:\JS\Steady\cas528_tmp\out528_wintelmpi.exe) on 'Host2' failed, error 2 - The system cannot find the file specified.

launch failed: CreateProcess(\\server1\pgtemp$\JS\Steady\T:\JS\Steady\cas528_tmp\out528_wintelmpi.exe) on 'Host1' failed, error 2 - The system cannot find the file specified.

Error posting readv, An established connection was aborted by the software in your host machine.(10053)
unable to post a read for the command string,
sock error: Error = 10053

Error posting readv, An established connection was aborted by the software in your host machine.(10053)
Duration of job : 7 seconds ( 0:0:7 ) (system=0 sec)
______________________________________________________________________________
*** FILES DELIVERY ***

- RESULTS FILE : i7XPTest1
ERROR : RESTITUTION FILE i7XPTest1
________________________________________________________
Execution finished: telemac2d.bat
________________________________________________________
No compilation/linking/file errors detected.
No execution errors detected.
'.' is not recognized as an internal or external command,
operable program or batch file.
## Error : System command failed for ./delete_cas528.bat :256

Can anybody suggest a possible solution? If anybody has successfully run it on multiple PCs under Windows, could you share the exact configuration for the MPI part of the .ini file?

I am using MPI_RUN="mpiexec -file mpirun.txt" and have commented out all other options on the participating PCs. Is there anything wrong with this?

Thanks and regards,

Joy
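In the error log above, every failing path has the same shape: what looks like a UNC working directory (\\server1\pgtemp$\JS\Steady\) with the absolute T:\JS\Steady\... path appended after it, which would be consistent with Windows error 2 (file not found). The following is a minimal Python sketch for spotting that pattern in a saved log, assuming the "launch failed: CreateProcess(...) on 'HOST'" wording quoted here. It only flags launch paths in which a drive letter appears after a backslash; it does not tell you which TELEMAC or MPICH2 setting produces them.

```python
import re

# Matches the wording of the MPICH2 launch errors quoted in this thread:
#   launch failed: CreateProcess(<path>) on '<host>' failed, ...
LAUNCH_RE = re.compile(r"CreateProcess\((?P<path>[^)]*)\) on '(?P<host>[^']+)'")

# A drive letter such as "T:\" that directly follows a backslash, i.e. an
# absolute path that has been glued onto the end of another path.
EMBEDDED_DRIVE_RE = re.compile(r"(?<=\\)[A-Za-z]:\\")


def suspicious_launches(log_text):
    """Return (host, full_path, embedded_absolute_path) for each launch path
    that contains a drive-letter path appended after another path."""
    findings = []
    for match in LAUNCH_RE.finditer(log_text):
        path = match.group("path")
        hit = EMBEDDED_DRIVE_RE.search(path)
        if hit:
            findings.append((match.group("host"), path, path[hit.start():]))
    return findings
```

Run on a saved copy of the output above (for example suspicious_launches(open("run.log").read()), where run.log is a hypothetical file name), it would report Host1 and Host2 with the T:\JS\Steady\... tail for each failed launch.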

Re: Problem Version 6.1 parallel Multiple PC 13 years 1 month ago #2720

  • sebourban
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello Joy,

It has been a while since I used TELEMAC with Perl across a network of computers. The "mpiexec -file mpirun.txt" should work. Here is what I can advise ...

1.- You need to make sure MPI is installed on all computers, and that the same username/password can be used to log on to each computer.

2.- You need to create a common directory structure on all your computers, i.e. map the same drive letter (say T:) to the same location on a shared disk (served by Windows Server, for a Windows cluster), so that all computers, including your master, see T: as the same path to the same area of your common disk storage.

3.- You may have to use the uncpath option. But let me know if you have trouble at this stage ...

Hope this helps,

Sébastien
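Point 2 depends on T: pointing to the same share on every machine. Judging only from the paths in Joy's log, T: appears to correspond to \\server1\pgtemp$, but that is an inference; running "net use" on each PC shows the real mapping. The minimal Python sketch below rewrites a drive-letter path into its UNC form from such a mapping, which is the kind of translation a UNC-based setup relies on. It is not TELEMAC's uncpath option (whose behaviour is not described in this thread), and DRIVE_MAP is a hypothetical example.

```python
import ntpath

# Hypothetical mapping, inferred from the log only: T:\JS\Steady seems to
# correspond to \\server1\pgtemp$\JS\Steady. Check with "net use" on each PC.
DRIVE_MAP = {"T:": r"\\server1\pgtemp$"}


def to_unc(path, drive_map=DRIVE_MAP):
    """Rewrite a drive-letter path as a UNC path using drive_map.

    Paths on unmapped drives, and paths that are already UNC, are returned
    unchanged. ntpath is used so this behaves the same on any OS.
    """
    drive, rest = ntpath.splitdrive(path)
    root = drive_map.get(drive.upper())
    if root is None:
        return path
    return root.rstrip("\\") + "\\" + rest.lstrip("\\")
```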

Re: Problem Version 6.1 parallel Multiple PC 13 years 1 month ago #2724

  • joysanyal21
Hi Sébastien,

Thank you very much for your reply. I have been doing exactly what you suggested here. I am running it from the T: drive, which is a network drive common to all participating PCs. Each participating PC has MPICH2 installed, and each can run TELEMAC-2D in localonly parallel mode without any trouble.

I don't know anything about the uncpath option and I don't have any background in Fortran, so I won't be able to modify the source code. I can give the Python installation a try, but it would be a great help if you could suggest anything for the Perl environment.

Regards,

Joy

Re: Problem Version 6.1 parallel Multiple PC 13 years 1 month ago #2725

  • sebourban
Alright, Perl it is.

I need to know a little bit more about your cluster of computers:
- what operating system do you have on your computers (nodes, master and file share server)
- are these all 32bit or/and 64bit computers
- what version of MPI do you have installed (I assume it is the same)

Sébastien.

Re: Problem Version 6.1 parallel Multiple PC 13 years 1 month ago #2726

  • joysanyal21
Hi Sébastien,

I have never been able to run TELEMAC in localonly parallel mode on 64-bit Windows, although I installed 32-bit MPICH2 on those machines. All the quad-core i7 PCs in our department run 64-bit Windows 7, so I can't use them for parallel runs.

So far I have tried with two Windows 7 32-bit PCs. The common network drive is hosted on some kind of Windows server. I am not sure about its configuration; only the IT staff in our department (Durham University, Geography, UK) would be able to answer that properly.
If it is very important, I will ask them.

The MPICH2 version is mpich2-1.3.2p1-win-ia32, and this same version is installed on both PCs of my 'cluster'.

For your information, I haven't installed MPICH2 on the department server where the common T: drive is hosted. I believe it is not required.
Thanks
Joy

Re: Problem Version 6.1 parallel Multiple PC 13 years 1 month ago #2727

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi Joy,

I don't know exactly what you did to compile v6.1 on your computer, but I do know it's possible to build a full 64-bit version of TELEMAC 6.1 that runs under Windows 7.
I've done it recently using:
strawberry-perl-5.12.1.0-64bit.msi
mpich2-1.3.2p1-win-x86-64.msi
Intel Parallel studio XE2011 64 bits mode
On my computer (Windows 7 64-bit) I could run my TELEMAC model in localonly mode on the 4 cores without any problem.
There is just one remaining error message at the start (about access being refused to the HKEY_LOCAL_MACHINE\SOFTWARE\MPICH\SMPD\process\XXX registry key), but the computation runs perfectly.

Nevertheless, one possible cause could be the way you installed MPICH2. As far as I remember, I installed it manually from a command window with administrator rights, because with a double-click the installation didn't launch the SMPD process.

Hope this helps
Christophe

Re: Problem Version 6.1 parallel Multiple PC 13 years 1 month ago #2732

  • joysanyal21
Hi,

What did you do about compiling PARTEL in 64-bit? Since the METIS 4.0 library is 32-bit, I don't know what to do for this step. Are you using a precompiled 64-bit PARTEL?

For the successful build of the localonly parallel version on Windows 7 32-bit, I used the global command makepar90, and it worked with Intel Parallel Studio XE2011 32-bit.
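Mixing 32-bit and 64-bit pieces (MPICH2, the compiled TELEMAC executables, PARTEL, METIS) comes up repeatedly in this thread. For executables and DLLs, the target architecture can be read from the Machine field of the PE file header. Below is a minimal Python sketch that only knows the two machine codes relevant here (0x014C for x86 and 0x8664 for x64, as defined in Microsoft's PE/COFF specification). Static .lib libraries are COFF archives rather than PE images, so this check does not apply to them directly.

```python
import struct

# IMAGE_FILE_HEADER.Machine values from the PE/COFF specification.
MACHINE_TYPES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(data):
    """Return a label for the Machine field of a PE image (.exe/.dll) given its bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    # e_lfanew (offset of the "PE\0\0" signature) is stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The Machine field is the first 16-bit value right after the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_TYPES.get(machine, hex(machine))


def exe_bitness(path):
    """Read an executable or DLL from disk and report its target architecture."""
    with open(path, "rb") as f:
        return pe_machine(f.read())
```

Calling exe_bitness on, for example, a compiled PARTEL executable and on the MPICH2 DLLs of each PC would show whether the pieces match.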

Re: Problem Version 6.1 parallel Multiple PC 13 years 1 month ago #2735

  • c.coulet
Hi
No, I have had the METIS 4.0 sources for a long time (but they can be downloaded from the web; see a specific post in the Linux section), so I compiled the METIS library in 64-bit.
Christophe

Re: Problem Version 6.1 parallel Multiple PC 13 years 1 month ago #2729

  • sebourban
Joy, Christophe,

I also found that I had to turn off the Windows firewall. Could that be it?

Sébastien.
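One way to check whether a firewall is blocking the launcher is to try a plain TCP connection from the master to each node. A minimal Python sketch follows; the port to test is an assumption: 8676 is commonly documented as the default port of the MPICH2 SMPD service, but check your own installation. A successful connection only shows that this one port is reachable, not every port the MPI processes open afterwards.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds.

    Refused connections, timeouts and name-resolution failures all raise
    subclasses of OSError, which are reported here as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, running port_reachable(host, 8676) for Host1 and Host2 (the node names from the log above) with the firewall on and then off would show whether it makes a difference.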

Re: Problem Version 6.1 parallel Multiple PC 13 years 1 month ago #2730

  • c.coulet
Sebastien
On my computer, the firewall is active.
Christophe