TOPIC: Run Telemac on multiple nodes !

Run Telemac on multiple nodes ! 10 years 11 months ago #11277

  • phuongTelemac
Hi all,

I would like to know how to run Telemac on multiple nodes. I have installed openTELEMAC on 2 machines and I can run the example programs on each of them.

I want to run the program on both nodes, since MPI can run across multiple nodes. I tried setting the IP addresses of my machines in the "mpirun hosts" option in systel.cfg. However, this configuration seems to have no effect!
How can I fix this problem? Thank you very much!

Phuong

/* The output that I got:

Running t2d_bumpflu.cas with telemac2d under /home/owen/opentelemac/validation/telemac2d/tel2d_v6p2/011_bumpflu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... reading module dictionary
... simulation en Francais avec t2d_bumpflu.cas
copying: t2d_bumpflu.f /home/owen/opentelemac/validation/telemac2d/tel2d_v6p2/011_bumpflu/t2d_bumpflu.cas_2013-12-03-02h48min43s/t2dfort.f
re-writing: /home/owen/opentelemac/validation/telemac2d/tel2d_v6p2/011_bumpflu/t2d_bumpflu.cas_2013-12-03-02h48min43s/T2DCAS
copying: telemac2dv6p2.dico /home/owen/opentelemac/validation/telemac2d/tel2d_v6p2/011_bumpflu/t2d_bumpflu.cas_2013-12-03-02h48min43s/T2DDICO
copying: geo_bumpflu.slf /home/owen/opentelemac/validation/telemac2d/tel2d_v6p2/011_bumpflu/t2d_bumpflu.cas_2013-12-03-02h48min43s/T2DGEO
copying: geo_bumpflu.cli /home/owen/opentelemac/validation/telemac2d/tel2d_v6p2/011_bumpflu/t2d_bumpflu.cas_2013-12-03-02h48min43s/T2DCLI
copying: f2d_bumpflu.slf /home/owen/opentelemac/validation/telemac2d/tel2d_v6p2/011_bumpflu/t2d_bumpflu.cas_2013-12-03-02h48min43s/T2DREF
partitioning: T2DGEO
+> /home/owen/v6p2r1/parallel/parallel_v6p2/fedgfopenmpi/partel < PARTEL.PAR >> partel_T2DGEO.log
Current memory used: 0 bytes
Maximum memory used: 0 bytes
***Memory allocation failed for CreateGraphDual: nptr. Requested size: 49890340121960 bytes
sh: line 1: 14211 Segmentation fault (core dumped) /home/owen/v6p2r1/parallel/parallel_v6p2/fedgfopenmpi/partel < PARTEL.PAR >> partel_T2DGEO.log
... The following command failed for the reason above
/home/owen/v6p2r1/parallel/parallel_v6p2/fedgfopenmpi/partel < PARTEL.PAR >> partel_T2DGEO.log



*/
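As a side calculation on the failed allocation in the log above, the requested size can be put into more familiar units. The short Python sketch below converts 49890340121960 bytes to TiB and splits the number into its upper and lower 32-bit words; the function names are illustrative only. Both words turn out to be small, similar values, which is the kind of pattern a mismatch in integer width between partel and the metis library could produce, but that reading is an assumption the thread does not confirm.

```python
def to_tib(n_bytes):
    """Convert a byte count to tebibytes (1 TiB = 2**40 bytes)."""
    return n_bytes / 2**40


def split_words(n_bytes):
    """Return the (upper, lower) 32-bit words of a 64-bit value."""
    return divmod(n_bytes, 2**32)


# Size taken verbatim from the "Memory allocation failed" line of the log.
requested = 49890340121960

print(f"{to_tib(requested):.3f} TiB")  # prints "45.375 TiB"
high, low = split_words(requested)
print(high, low)                       # prints "11616 11624"
```

A request of roughly 45 TiB for a small validation case is not a plausible memory need, so the number itself is probably wrong rather than the machine being short of memory.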

Run Telemac on multiple nodes ! 10 years 11 months ago #11279

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
Hi,

The error seems to come from partel.
In the temporary folder there should be a file named partel_T2DGEO.log.
Could you post it here?
By the way, which version of metis are you using?

Hope it helps.
There are 10 types of people in the world: those who understand binary, and those who don't.

Run Telemac on multiple nodes ! 10 years 11 months ago #11289

  • phuongTelemac
Hi Yugi,

Thanks for your answer, but it may not be the right solution. I installed metis-5.1.0, and partel_T2DGEO.log is empty, so I got nothing from the log file. However, I saw that MPI_HOSTFILE contains 2 lines, both with the hostname of my local machine.

Do you know why it contains the same name on both lines? How can I add the second machine to the execution?

Best regards,
Phuong
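The observation about MPI_HOSTFILE can be checked mechanically. The Python sketch below counts how often each host appears in a hostfile, assuming one hostname per line as described in the post; the function name and the sample hostname `node-a` are hypothetical and stand in for the real file contents.

```python
from collections import Counter


def count_hosts(hostfile_text):
    """Count occurrences of each hostname, skipping blank and comment lines."""
    hosts = []
    for line in hostfile_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Keep only the first token in case extra fields follow the hostname.
        hosts.append(stripped.split()[0])
    return Counter(hosts)


# Hypothetical content mirroring the situation in the post:
# two lines, both naming the local machine.
sample = "node-a\nnode-a\n"

counts = count_hosts(sample)
print(dict(counts))
if len(counts) < 2:
    print("Only one distinct host listed; all processes would start on it.")
```

To try it on a real run, read the MPI_HOSTFILE from the temporary folder and pass its text to `count_hosts`.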

Run Telemac on multiple nodes ! 10 years 11 months ago #11305

  • yugi
To change what is in MPI_HOSTFILE, use the "--hosts" option of "telemac2d.py".

If your partel_T2DGEO.log is empty, could you try running the command directly from the temporary folder:
/home/owen/v6p2r1/parallel/parallel_v6p2/fedgfopenmpi/partel < PARTEL.PAR
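If running the command by hand is awkward, the same step can be sketched in Python so that the return code and any error text are captured together. The partel path and the temporary folder below are copied from the log earlier in the thread and must be adjusted to the actual run; the helper name `run_with_stdin` is illustrative and not part of the TELEMAC scripts.

```python
import os
import subprocess
import sys


def run_with_stdin(command, stdin_file, workdir):
    """Run `command` in `workdir`, feeding it `stdin_file` on standard input."""
    with open(os.path.join(workdir, stdin_file), "rb") as handle:
        result = subprocess.run(
            command,
            stdin=handle,
            cwd=workdir,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    return (
        result.returncode,
        result.stdout.decode(errors="replace"),
        result.stderr.decode(errors="replace"),
    )


# Paths as they appear in the log; change them to match your own run.
PARTEL = "/home/owen/v6p2r1/parallel/parallel_v6p2/fedgfopenmpi/partel"
WORKDIR = (
    "/home/owen/opentelemac/validation/telemac2d/tel2d_v6p2/011_bumpflu/"
    "t2d_bumpflu.cas_2013-12-03-02h48min43s"
)

if os.path.isfile(PARTEL) and os.path.isfile(os.path.join(WORKDIR, "PARTEL.PAR")):
    code, out, err = run_with_stdin([PARTEL], "PARTEL.PAR", WORKDIR)
    print("return code:", code)
    print(out)
    print(err, file=sys.stderr)
else:
    print("partel or PARTEL.PAR not found; edit PARTEL and WORKDIR first")
```

On POSIX systems a negative return code means the process was terminated by a signal, so a segmentation fault like the one in the log would typically show up as -11.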

Run Telemac on multiple nodes ! 10 years 11 months ago #11318

  • phuongTelemac
The command cannot be executed from the temporary folder.

Run Telemac on multiple nodes ! 10 years 11 months ago #11320

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi,
The problem with MPI_HOSTFILE is not important for the moment. This file is only used for the Telemac run itself, which takes place after the partitioning step, and that step fails on your installation.

If partel exists on your computer, you should be able to execute it in the temporary directory.
What message do you get?

PS: please update your profile.
Christophe