
TOPIC: Opentelemac and HPC Pack (MS-MPI)

Opentelemac and HPC Pack (MS-MPI) 9 years 11 months ago #15274

  • cyamin
  • cyamin's Avatar
  • OFFLINE
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Hello yugi,

Indeed, HPC Pack has dependency capabilities; however, I wanted to ask whether the runcode.py script has options to account for such a case.

In the runcode.py script I saw this comment
   runCAS now takes in an array of casFiles, and if possible,
      run these in parallel of one another and as one job on a queue
      where the mpi_exec command do the parallelisation
   Notes:
      - casdir is where the CAS files are.
      - The hpccmd is unique. The mpicmd is not (unfortunately).
and I was wondering whether it is relevant to what I am looking for, and whether I could integrate my HPC command script with it.

Best Regards,
Costas
The administrator has disabled public write access.

Opentelemac and HPC Pack (MS-MPI) 9 years 11 months ago #15275

  • sebourban
  • sebourban's Avatar
  • OFFLINE
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello,

This is true indeed. You can use it with multiple CAS files, all run in parallel. This is particularly useful to wave modellers wishing to run several instances of the same simulation for different input wave characteristics.

Sorry I did not answer sooner, but I have thought hard about your request -- running the jobs one after the other -- and it is an interesting idea. At this stage it is not possible, but if you can wait until after early January, I am sure I can come up with a solution.

Hope this helps,
Sébastien.
The following user(s) said Thank You: cyamin

Opentelemac and HPC Pack (MS-MPI) 9 years 11 months ago #15276

  • cyamin
  • cyamin's Avatar
  • OFFLINE
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Hello Sebastien,

Thank you for your clarification. This is something that was always on my mind, because every time I wanted to run sequential jobs, I had to disable the HPC queue command on every node and launch smpd in debug mode. This is of course a limitation of MS-MPI compared with MPICH2 (which uses smpd by default).

So, at the moment, I make do, but I would love to see that feature added. In fact, I think that is the only thing missing from my "Opentelemac and HPC Pack (MS-MPI)" setup.

Regarding multiple CAS files, what is the way to define the "array of cas files"? So far I have just been copying the command onto a new line in a cmd script for every CAS file.

Best Regards,
Costas

Opentelemac and HPC Pack (MS-MPI) 9 years 11 months ago #15277

  • sebourban
  • sebourban's Avatar
  • OFFLINE
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
You can have multiple CAS files (space-delimited) on the same command line, whether through HPC or on the main node / computer.

telemac3d.py -s cas1 cas2 cas3

Sébastien.

Opentelemac and HPC Pack (MS-MPI) 9 years 9 months ago #15715

  • sebourban
  • sebourban's Avatar
  • OFFLINE
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello,

I have completed the implementation of running the jobs in sequential mode (when more than one CAS file is provided), based on Yoann's previous idea.

You have to:

- add the key hpc_depend to your configuration, where <jobid> will be replaced at run time by the job ID of the previous CAS file (for PBS):
hpc_depend: -W depend=afterok:<jobid>
I have added the example in systel.cis-hydra.cfg.
Note that this option varies depending on the queuing system you use.

- you have to add the option --sequential to your run command line.
If you do not, the hpc_depend key is ignored even when present in your configuration, so you may as well keep it there all the time.
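For illustration, the same dependency key might look like this under different queuing systems (the PBS line is the one from systel.cis-hydra.cfg; the SLURM line is an assumed equivalent based on sbatch's dependency syntax, not a tested configuration):

```
# PBS / Torque (example shipped in systel.cis-hydra.cfg)
hpc_depend: -W depend=afterok:<jobid>
# SLURM (assumed equivalent)
hpc_depend: --dependency=afterok:<jobid>
```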

The changes are available in guppy for now -- they are being thoroughly validated before a merge with the trunk. Also, each CAS file is run under its own job ID, so one job ID cannot cover all the CAS files -- but at least they are put in the queue together at the start and then wait for each other.

Hopefully you can access it for testing on your side.

Enjoy.

Sébastien.
The following user(s) said Thank You: cyamin

Opentelemac and HPC Pack (MS-MPI) 9 years 9 months ago #15753

  • cyamin
  • cyamin's Avatar
  • OFFLINE
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Hello Sebastien,

Thank you for your work. Is it possible to have the necessary files so I can have a go at it? I don't have access to the trunk, so it may take some time before I can obtain them through the repository.

Regards,
Costas

Opentelemac and HPC Pack (MS-MPI) 9 years 9 months ago #15754

  • sebourban
  • sebourban's Avatar
  • OFFLINE
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello,

Here is the whole Python scripts directory, to be safe, as well as the example.

Enjoy !

Sébastien.

File Attachment:

File Name: python27.zip
File Size: 326 KB


File Attachment:

File Name: systel.cis-hydra.cfg
File Size: 5 KB
The following user(s) said Thank You: cyamin

Opentelemac and HPC Pack (MS-MPI) 9 years 9 months ago #15783

  • cyamin
  • cyamin's Avatar
  • OFFLINE
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Hello Sebastien,

I am trying to adapt your work to HPC Pack. First, I am trying to figure out the logic behind it. I have set up two successive cases and added the two CAS filenames after the '-s' switch. I have also added the '--sequential' switch and an hpc_depend entry in my config file (with an initial guess at the corresponding HPC command for dependency). However, I do not get any error regarding my hpc_depend entry, and the jobs are submitted as if '--sequential' were not there.

Also, you mentioned in post #15715:
..but at least there are put in the queue together at the start and then waiting for each others.
If the jobs are submitted at the same time, then the second job will fail because the 'Previous computation file' has not been created yet. If the jobs are to be truly sequential, then the runcode script should wait for the first job to finish (and recollect), and then submit the next one. Is this what you had in mind?

Regards,
Costas

Opentelemac and HPC Pack (MS-MPI) 9 years 9 months ago #15785

  • sebourban
  • sebourban's Avatar
  • OFFLINE
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello,

The command in hpc_depend (combined with --sequential) creates the dependency link at the queuing level, so that the second job will not start until the first one has completed, even if you launch them at the same time. You will see both runs submitted in your queue, but one will show R for running and the other H for held until the first job completes, at which point the queuing system moves the second one to R.

The Python itself does nothing but capture the job ID and use it in the hpc_depend of the second job. The dependency and the waiting are handled by the queuing system.
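As a minimal sketch of that chaining logic (the function and parameter names here are hypothetical, not the actual runcode.py code): each submission returns the queue-assigned job ID, which is substituted for <jobid> in the hpc_depend template of the next submission.

```python
def chain_jobs(cas_files, submit, hpc_depend="-W depend=afterok:<jobid>"):
    """Submit one queue job per CAS file, each depending on the previous.

    submit(cas, depend_opts) is a user-supplied callable (e.g. a qsub
    wrapper) that returns the queue-assigned job ID as a string.
    """
    previous_id = None
    job_ids = []
    for cas in cas_files:
        # The first job has no dependency; each later job waits on the
        # previous one via the scheduler's own dependency mechanism.
        depend = "" if previous_id is None else hpc_depend.replace("<jobid>", previous_id)
        previous_id = submit(cas, depend)
        job_ids.append(previous_id)
    return job_ids
```

Note that the queuing system, not Python, enforces the wait: the second job simply sits in the held state until afterok releases it.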

Also, note that you have to submit your job as Python (the case of hydry in the config file, i.e. the domain decomposition and merges are carried out within the queued job as one job). It will not work at this stage if you run only the mpiexec on the queue (i.e. the hydra configuration example) while the Python runs on the main node.

Hope this helps,
Sébastien.
The following user(s) said Thank You: cyamin

Opentelemac and HPC Pack (MS-MPI) 9 years 9 months ago #15786

  • cyamin
  • cyamin's Avatar
  • OFFLINE
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Hello,

I think the solution would be to submit the job using Python, as you said, which means I have to rethink the whole process. I have researched job submission to HPC Pack from Python in the past, but my efforts weren't fruitful; it is not as common as with other queuing systems. I will have another go at it when I have more time to spare. If you have any examples that could help me get started, please share them.
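As a possible starting point, one approach is to drive HPC Pack's 'job submit' command line from Python via subprocess. This is only a sketch: the flag names below are assumptions from the HPC Pack CLI as I recall it and should be verified with 'job submit /?' on the head node.

```python
import subprocess  # used in the commented example below

def build_hpc_submit(workdir, command, cores=8, scheduler="HEADNODE"):
    # Build the argv list for HPC Pack's 'job submit'. The /scheduler:,
    # /numcores: and /workdir: flags are assumptions -- check 'job submit /?'.
    return ["job", "submit",
            "/scheduler:%s" % scheduler,
            "/numcores:%d" % cores,
            "/workdir:%s" % workdir,
            command]

# Example (not run here): submit the whole Python run as one queued job,
# so that decomposition, mpiexec and recollection all happen inside the job.
# cmd = build_hpc_submit(r"\\share\case1", "python runcode.py telemac3d -s case1.cas")
# subprocess.run(cmd, check=True)
```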

Best regards,
Costas
Moderators: borisb
