TOPIC: Parallel-failed after adding Batch scripts into configuration file

Parallel-failed after adding Batch scripts into configuration file 7 years 9 months ago #25094

  • huyquangtran
  • Expert Boarder
  • Posts: 271
  • Thank you received: 23
Hello,

I just found that my problem is probably related to "oversubscribing" in Open MPI.

Here is one of the threads where people have been discussing this:

stackoverflow.com/questions/35704637/mpi...ough-slots-available

I see that one trick to solve the problem is to add the "--oversubscribe" option, but when I added it to the configuration, it did not work.
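For readers following along, here is a minimal sketch of where the option normally sits on an Open MPI command line. It only builds and prints the command, it does not run anything. The function name `build_mpirun_command` and the executable name `./out_telemac2d` are hypothetical, chosen for illustration; `--oversubscribe`, `--hostfile` and `-np` are standard Open MPI `mpirun` options.

```python
import shlex


def build_mpirun_command(exe, nprocs, oversubscribe=False, hostfile=None):
    """Return an Open MPI mpirun command line as a list of arguments.

    Options are placed before the executable, because anything after the
    executable name is passed to the program itself, not to mpirun.
    """
    cmd = ["mpirun"]
    if oversubscribe:
        # Allow more processes than the number of available slots.
        cmd.append("--oversubscribe")
    if hostfile is not None:
        cmd += ["--hostfile", hostfile]
    cmd += ["-np", str(nprocs), exe]
    return cmd


if __name__ == "__main__":
    # "./out_telemac2d" is a placeholder executable name (assumption).
    print(shlex.join(build_mpirun_command("./out_telemac2d", 8, oversubscribe=True)))
```

If your TELEMAC configuration defines the launch command on a single line (for example an `mpi_cmdexec` entry, which is an assumption about your setup), the option would need to appear in that line before the executable placeholder. Adding it elsewhere in the file would probably have no effect.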

Does anyone have experience in solving "oversubscribe" or "hostfile" problems?

Thanks & Best Regards
Huy

Parallel-failed after adding Batch scripts into configuration file 7 years 9 months ago #25097

  • josekdiaz
  • Expert Boarder
  • Posts: 161
  • Thank you received: 48
Dear Huy,

I'm sorry if I deviate a little, but I couldn't gather enough information to understand the issue (perhaps it's related to the MPI cluster configuration rather than to oversubscribing the nodes):

-Are you using mpich2 or mpi?

-Are the nodes in the cluster properly connected using passwordless communication or a known-host (stored key) config, so that you can connect just using "ssh NODE_NAME_OR_IP" without any hassle?

-Are you using a config file, i.e. mpi_hostfile, or somehow adding instructions to Telemac's cfg directly?

-Did Yoann's simple hello-world run on at least 2 nodes?
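
To make the hostfile question concrete, here is a small sketch of the Open MPI hostfile format (one `hostname slots=N` entry per line) and of the slot count it implies. The node names `node1` and `node2`, their slot counts, and the helper names are made up for illustration. As far as I know, recent Open MPI versions report "not enough slots available" when the requested process count exceeds the total slots and oversubscription is not enabled.

```python
def render_hostfile(slots_by_host):
    """Render an Open MPI style hostfile: one 'hostname slots=N' per line."""
    lines = [f"{host} slots={slots}" for host, slots in slots_by_host.items()]
    return "\n".join(lines) + "\n"


def total_slots(slots_by_host):
    """Total number of processes that fit without oversubscribing."""
    return sum(slots_by_host.values())


if __name__ == "__main__":
    # Hypothetical two-node cluster with 4 slots each (assumption).
    cluster = {"node1": 4, "node2": 4}
    print(render_hostfile(cluster), end="")
    requested = 12  # example process count
    if requested > total_slots(cluster):
        print(f"{requested} processes exceed {total_slots(cluster)} slots")
```

Comparing the number of processes requested in the run with the total slots declared in the hostfile is a quick way to tell a genuine oversubscription from a hostfile that is not being read at all.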

Regards,

José Díaz.
The following user(s) said Thank You: huyquangtran