TOPIC: Telemac 3D in parallel hangs after "Running your simulation(s)"

Telemac 3D in parallel hangs after "Running your simulation(s)" 8 years 5 months ago #21840

  • bmater
  • Junior Boarder
  • Posts: 49
  • Thank you received: 3
Hi,

When attempting to run T3D v6r3r2 on Windows 7 in parallel, the program hangs after printing the message "Running your simulation(s)". The temporary directory and files have all been created, but they are never updated. Nothing happens. When I kill the program, I get the message "mpiexec aborting job..." and a traceback. The traceback shows that the program is stuck in Python's threading.py (line 339, inside the "wait" function). Does anyone have any idea what this function does or what might be going on? Strangely, I've been able to run in parallel before without this problem... I'm not sure what has changed in the interim. Any help would be appreciated. Thanks in advance!

-Ben

Telemac 3D in parallel hangs after "Running your simulation(s)" 8 years 5 months ago #21842

  • josekdiaz
  • Expert Boarder
  • Posts: 161
  • Thank you received: 48
Dear Ben,

I had a similar issue the first time I tried to run Telemac 2D in parallel. It was resolved by checking that:
  • There are no other installations of smpd/mpiexec that could redirect the mpirun or mpiexec command to a folder other than the opentelemac one.
  • The smpd service and mpiexec are running properly. In the parallel Telemac cmd, type:
    mpiexec -validate
    smpd -status
    If the output of either command is anything other than "success" or "smpd running on..." (or something like that), you need to register mpiexec and restart smpd by doing:
    smpd -stop
    mpiexec -remove 
    mpiexec -register
    smpd -start
    At some point this will ask for your session username and password. Then check validate and status again, and re-run your simulation in the same cmd you were working in.
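For anyone who wants to script this check, here is a minimal Python 3 sketch. It assumes mpiexec and smpd are on the PATH, and the strings it looks for ("success", "smpd running on") are an assumption based on the wording above, which may not match your MPICH2 version exactly. The names `mpi_output_looks_ok` and `check_mpi` are hypothetical.

```python
import subprocess


def mpi_output_looks_ok(validate_out, status_out):
    """Heuristic check of `mpiexec -validate` and `smpd -status` output.

    ASSUMPTION: a healthy setup prints something containing "success" for
    -validate and "smpd running on" for -status; exact wording may differ.
    """
    return ("success" in validate_out.lower()
            and "smpd running on" in status_out.lower())


def check_mpi(timeout_s=30):
    """Run both commands (assumed to be on PATH) and apply the heuristic."""
    outputs = []
    for cmd in (["mpiexec", "-validate"], ["smpd", "-status"]):
        # stdin=DEVNULL plus a timeout, so a hidden credential prompt cannot
        # block the check forever (subprocess.TimeoutExpired is raised instead).
        result = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                                capture_output=True, text=True,
                                timeout=timeout_s)
        outputs.append(result.stdout + result.stderr)
    return mpi_output_looks_ok(*outputs)
```

If `check_mpi()` returns False, the stop/remove/register/start sequence above is the next step.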

Also, a fresh install or an upgrade to the current release (v7p1r1) might solve it...

kind regards,

José D.

Telemac 3D in parallel hangs after "Running your simulation(s)" 8 years 5 months ago #21853

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi

Usually, this kind of problem happens because MPI is waiting for account/password information, but because Telemac is launched by the Python script, which captures its messages, you don't see this request.

To confirm, after killing the job you can go into the temporary directory and run the same command as in the telemac3d execution (something like mpiexec [parameters] out_telemac3d.exe).
Without the Python script in the way, you should then see the MPI message asking for your account and password.
Once the account and password have been entered in that command window, you can run the Python script again and it should work without any problem.

Then you can follow José's advice and register your account and password so that all future runs work.
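The effect described here can be reproduced in miniature. This is a minimal Python 3 sketch, not the actual Telemac launcher: a wrapper pipes the child's input and output, so a prompt printed by the child never reaches the screen, and waiting on the child blocks indefinitely. A child Python process that calls `input()` stands in for mpiexec asking for credentials; `run_or_report_hang` is a hypothetical name, and the timeout is an addition not mentioned in this thread.

```python
import subprocess
import sys


def run_or_report_hang(cmd, timeout_s):
    """Run `cmd` with stdin/stdout piped, as a wrapper script might.

    Returns (returncode, output) if the child exits within `timeout_s`
    seconds, or (None, output) if it had to be killed. stdin is kept open
    but never written to, mimicking a prompt nobody can answer.
    Caveat: wait() does not drain the stdout pipe, so a child that writes
    more than the pipe buffer holds will also block and be reported as hung.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        out, _ = proc.communicate()  # collect whatever the child printed
        return None, out
    out, _ = proc.communicate()
    return proc.returncode, out


# Stand-in for an MPI launcher waiting for a password (not real mpiexec).
rc, out = run_or_report_hang([sys.executable, "-c", "input('password: ')"], 2)
print("timed out" if rc is None else rc, repr(out))
```

Running the same command directly in a console, as Christophe suggests, removes the pipe, so the prompt becomes visible and can be answered.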

Regards
Christophe

Telemac 3D in parallel hangs after "Running your simulation(s)" 8 years 5 months ago #21862

  • bmater
  • Junior Boarder
  • Posts: 49
  • Thank you received: 3
Thanks Jose and Christophe,

Jose:
  • How do I check that there are no other installations of Smpd/Mpiexec?
  • mpiexec -validate and smpd -status both indicate that the smpd service and mpiexec are running properly. I went ahead and re-registered my username/password anyway. Unfortunately, this didn't resolve the issue. Perhaps I should update my version as you suggest...

Christophe: I was able to get the program to run after running mpiexec in the temporary directory of the killed simulation. I was not, however, prompted for a password...the simulation just immediately started. The problem still persists when I try to run a new simulation...any additional thoughts?

Thanks so much,
Ben

Telemac 3D in parallel hangs after "Running your simulation(s)" 8 years 5 months ago #21863

  • bmater
  • Junior Boarder
  • Posts: 49
  • Thank you received: 3
Also, how do I merge the individual results files after running without the python scripts? It would be nice to look at my results while I'm figuring out the mpi issue.

Thanks.

Telemac 3D in parallel hangs after "Running your simulation(s)" 8 years 5 months ago #21887

  • josekdiaz
  • Expert Boarder
  • Posts: 161
  • Thank you received: 48
Dear Ben,

The smpd service can usually be seen in the Services tab of the Windows Task Manager, but I would rather check the installed programs for other MPI installations (e.g. HPC MS-MPI, OpenMPI, Intel MPI...). The system PATH variable is a good place to look too.

One thing on Windows always seems to cause trouble (it doesn't look like the case here, but...): did you try to restart and register smpd/mpiexec from a cmd with admin rights (not the Telemac cmd)?

To merge the "incomplete" parallel simulation, execute in the parallel cmd:
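A quick way to scan the PATH for MPI-related entries is sketched below in Python 3. The regular expression is a crude heuristic of my own, not an official list of MPI folder names: it requires "mpi" (optionally preceded by "ms", "i" or "open") to start a word, so that entries such as "...\compiler", which happen to contain the letters "mpi", are not flagged. `mpi_path_entries` is a hypothetical name.

```python
import os
import re

# Heuristic: "mpi", "msmpi", "impi" or "openmpi" not preceded by a letter.
# This can still miss an MPI installed in an unrelated-looking folder.
_MPI_PATTERN = re.compile(r"(?<![a-z])(?:ms|i|open)?mpi", re.IGNORECASE)


def mpi_path_entries(path_value, sep=";"):
    """Return PATH entries that look MPI-related, in PATH order.

    The ';' default matches the Windows PATH separator.
    """
    return [entry for entry in path_value.split(sep)
            if _MPI_PATTERN.search(entry)]


if __name__ == "__main__":
    # Inspect the PATH of the current session (os.pathsep is ';' on Windows).
    for entry in mpi_path_entries(os.environ.get("PATH", ""), os.pathsep):
        print(entry)
```

Running it from the Telemac cmd shows the entries that session actually sees, which may differ from the system-wide PATH.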
cd the/parentfolder/of/simulation

runcode.py --merge -w YOURCASNAME-INCOMPLETEFOLDER.cas_DATE telemac2d YOURCASNAME.cas -c THENAMEOFYOURBUILD

Personal example:
runcode.py --merge -w t2d_puerto_sisy01.cas_2016-04-11-09h30min29s telemac2d t2d_puerto_sisy01.cas -c wing64mpi
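If several runs have left temporary folders behind, the newest one can be picked automatically. The Python 3 sketch below is a hypothetical helper, `build_merge_command`, that only assembles the command line shown above. It assumes the folder naming follows the example (`<casname>_<DATE>`) and that the zero-padded date stamp sorts chronologically as plain text; check the argument order against the runcode.py of your Telemac version.

```python
import os


def build_merge_command(parent_dir, cas_name, module, config):
    """Build a 'runcode.py --merge' command for the newest '<cas>_<DATE>' folder.

    HYPOTHETICAL helper: naming and argument order follow the example in this
    thread (e.g. 't2d_puerto_sisy01.cas_2016-04-11-09h30min29s').
    """
    prefix = cas_name + "_"
    candidates = sorted(
        name for name in os.listdir(parent_dir)
        if name.startswith(prefix)
        and os.path.isdir(os.path.join(parent_dir, name)))
    if not candidates:
        raise FileNotFoundError("no temporary folder starting with " + prefix)
    # Zero-padded 'YYYY-MM-DD-HHhMMminSSs' stamps sort chronologically as text.
    work_dir = candidates[-1]
    return ["runcode.py", "--merge", "-w", work_dir,
            module, cas_name, "-c", config]
```

For example, `print(" ".join(build_merge_command(".", "YOURCASNAME.cas", "telemac3d", "THENAMEOFYOURBUILD")))` run from the parent folder prints a line to paste into the parallel cmd.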

If you are in a hurry, you can open the individual *.res files in the temporary simulation folder directly in Blue Kenue (they will load just fine).

Regards,

José D.
The following user(s) said Thank You: bmater

Telemac 3D in parallel hangs after "Running your simulation(s)" 8 years 5 months ago #21933

  • bmater
  • Junior Boarder
  • Posts: 49
  • Thank you received: 3
Hi Jose,

Thanks for your help. I found a couple of other MPI-related entries in the PATH variable, namely:

%INTEL_DEV_REDIST%redist\intel64\mpirt;
%INTEL_DEV_REDIST%redist\intel64\compiler;
%INTEL_DEV_REDIST%redist\ia32\mpirt;
%INTEL_DEV_REDIST%redist\ia32\compiler;
C:\Program Files (x86)\Intel\MPI-RT\4.0.2.005\em64t\bin;
C:\Program Files (x86)\Intel\MPI-RT\4.0.2.005\ia32\bin;

I'm hesitant to remove these, because I'm not sure what unwanted side effects that might cause... Could these be interfering?

Yes, I did register/restart smpd/mpiexec with admin rights. I've done it in the same cmd session in which I run Telemac, and I've also done it in a separate cmd window... does this matter? Neither worked for me.

Thanks for the advice on merging. I will give that a try.

Thanks so much!
Ben

Telemac 3D in parallel hangs after "Running your simulation(s)" 8 years 5 months ago #21952

  • bmater
  • Junior Boarder
  • Posts: 49
  • Thank you received: 3
Hi Jose,

I'm still trying to get the v6 parallel configuration to compile, without much success. I'm currently attempting to upgrade to v7 (see post 12888).

In the meantime, I've been able to run the model with Christophe's workaround. I've also been able to merge the individual results files as you describe - thanks!

Ben

Telemac 3D in parallel hangs after "Running your simulation(s)" 8 years 5 months ago #21953

  • josekdiaz
  • Expert Boarder
  • Posts: 161
  • Thank you received: 48
Dear Ben,

I'm glad to hear that! This parallel issue is stuck in my head for some reason, so sorry if I keep pointing in the wrong direction, but...

I've always had some doubts about multiple MPI installations, especially Intel's. I'm not confident that removing those entries would cause no trouble.

In the meantime, something I forgot to mention in my last reply: if there are indeed multiple installations of a command in the PATH, then, if I'm not wrong, the one that actually runs from a cmd call (and, I believe, from Python pipe calls as well) is the first one listed by:
where mpiexec
where smpd

Example with "where python": I have two installations in my PATH, Anaconda and the one bundled with opentelemac. I manually added the Telemac entry before the Anaconda one, so it is the one called when invoking "python" (see attached).
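To see the same ordering from Python, here is a rough Python 3 imitation of `where`. It is a sketch under simplifying assumptions: only the PATH entries are searched, in order (not the current directory or any other Windows lookup rules), and a fixed tuple of extensions stands in for PATHEXT. `where_all` is a hypothetical name.

```python
import os


def where_all(command, path_value, exts=(".exe", ".bat", ".cmd")):
    """Return every file matching `command` along PATH, in PATH order.

    Only PATH directories are searched; `exts` stands in for PATHEXT.
    Under those assumptions, the first hit is the one a plain PATH lookup
    would pick.
    """
    if os.path.splitext(command)[1]:
        names = [command]  # extension given explicitly, e.g. "smpd.exe"
    else:
        names = [command + ext for ext in exts]
    hits = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ';'
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                hits.append(candidate)
    return hits
```

For example, `where_all("mpiexec", os.environ["PATH"])` lists every mpiexec the current session could reach, with the one that wins first.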

A few months back, I remember manually placing all the required Telemac entries at the very start of the system PATH, just to avoid future issues...

My current system PATH starts like this:
C:\opentelemac-mascaret\python27;C:\opentelemac-mascaret\mpich2\bin;C:\opentelemac-mascaret\mpich2\lib;C:\opentelemac-mascaret\mingw64\bin;[REST OF THE PATH]
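Reordering the PATH by hand is easy to get wrong, so here is a small Python 3 sketch that computes a new PATH string with chosen entries first. `put_first` is a hypothetical helper: it drops later duplicates (compared case-insensitively, ignoring a trailing backslash) and only builds the string. Applying it, for example through the Windows Environment Variables dialog, is left to you.

```python
def put_first(path_value, front_entries, sep=";"):
    """Return a PATH string with `front_entries` first, in the given order.

    Later duplicates are dropped (case-insensitive, trailing backslash
    ignored) so each directory appears once. Does not modify any setting.
    """
    seen = set()
    result = []
    for entry in list(front_entries) + path_value.split(sep):
        entry = entry.strip()
        key = entry.rstrip("\\").lower()
        if not key or key in seen:
            continue  # skip empty entries and directories already listed
        seen.add(key)
        result.append(entry)
    return sep.join(result)


telemac_first = [
    r"C:\opentelemac-mascaret\python27",
    r"C:\opentelemac-mascaret\mpich2\bin",
    r"C:\opentelemac-mascaret\mpich2\lib",
    r"C:\opentelemac-mascaret\mingw64\bin",
]
```

Calling `put_first(old_path, telemac_first)` yields a PATH that begins like the one above, with the rest of the original entries kept in their previous order.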

Hope you can resolve this issue soon!