
TOPIC: Opentelemac and HPC Pack (MS-MPI)

Opentelemac and HPC Pack (MS-MPI) 10 years 9 months ago #11972

cyamin (openTELEMAC Guru)
Hello all,

I have managed to compile opentelemac using mingw-64 and MS-MPI and have run parallel jobs using mpiexec on the local machine.

However, when I try to use the Job Scheduler to submit a job to my Windows HPC Pack cluster (using the following syntax),
job submit mpiexec.exe /wdir <wdir> /n <ncsize> /machinefile \\atlas\Company\DataDisk\Work\opentelemac\mpi_hosts.txt <exename>
I get the following error:
Aborting: failed to launch 'out_t2d_malpasset-large.exe' on thor
Error (2) The system cannot find the file specified.

The temporary working directory is empty at that point, so it seems that when the job submit command is used, mpiexec runs before the temporary working directory is populated with the run files. Any ideas on how to work around this problem?
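(For reference, the command that runs fine locally is essentially the following, with the mpiexec path from my MS-MPI installation:)
C:\opentelemac\msmpi-4.1.4174.0-windows-x64\bin\mpiexec.exe -wdir <wdir> -n <ncsize> <exename>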

Best Regards,
Costas

Opentelemac and HPC Pack (MS-MPI) 10 years 9 months ago #11974

sebourban (Administrator, Principal Scientist)
Hello,

Can you share your configuration file with us?
Also, you may want to try adding cd <wdir> on the line above mpiexec.exe in your hpc_stdin.

Have a look at systel.cis-hydra.cfg for examples of configuration with a job scheduler.
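For instance, a minimal sketch of such an hpc_stdin (assuming a bash-style script as on our Linux clusters; on Windows the equivalent would be a small batch script):
hpc_stdin: #!/bin/bash
   cd <wdir>
   <mpi_cmdexec>
   exit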

Hope this helps,
Sébastien.

Opentelemac and HPC Pack (MS-MPI) 10 years 9 months ago #11976

cyamin (openTELEMAC Guru)
Hello Sebastien,

This is my current configuration file. I have been playing around (unsuccessfully) with 'mpi_cmdexec', so don't pay particular attention to it.
# _____                                  ____________________________________
# ____/ Windows gfortran parallel MSMPI /___________________________________/
[wingformsmpi]
#
par_cmdexec:   <config>\partel.exe < PARTEL.PAR >> <partel.log>
#
#mpi_hosts:   
#mpi_cmdexec:   C:\opentelemac\msmpi-4.1.4174.0-windows-x64\bin\mpiexec.exe -wdir <wdir> -n <ncsize> <exename>
mpi_cmdexec:   job submit /workdir:<wdir> /stdout:stdout.txt /stderr:stderr.txt mpiexec.exe /wdir <wdir> <exename>
#
#
cmd_obj:    x86_64-w64-mingw32-gfortran -march=corei7 -Ofast -fopenmp -c -fno-range-check -ffixed-line-length-132 -fconvert=big-endian -frecord-marker=4 <mods> <incs> <f95name> -DHAVE_MPI
cmd_lib:    ar cru <libname> <objs>
cmd_exe:    x86_64-w64-mingw32-gfortran -march=corei7 -Ofast -fopenmp -fconvert=big-endian -frecord-marker=4 -v -o <exename> <objs> -Xlinker --start-group <libs>
#--end-group
#
mods_all:   -I <config>
#
incs_parallel: -I c:\opentelemac\msmpi-4.1.4174.0-mingw-w64-x64\include
libs_partel:	c:\opentelemac\libmetis\libmetis64b_5.0.2.a
libs_all:	c:\opentelemac\msmpi-4.1.4174.0-mingw-w64-x64\lib\libmsmpi.a
#
sfx_obj:    .o
#

May I remind you that I am trying to submit jobs to a Microsoft Windows cluster running HPC Pack 2012 (using MS-MPI and not MPICH2). The job scheduler is the one that comes with Microsoft's HPC Pack.

Do you know if anyone has managed to run opentelemac on a Windows HPC cluster, or am I alone in this attempt?

Best Regards,
Costas

Opentelemac and HPC Pack (MS-MPI) 10 years 9 months ago #11980

sebourban (Administrator, Principal Scientist)
Hello, I am not sure how you are using your job scheduler ...

If you use a job scheduler, you need to have an hpc_stdin part in your configuration. Otherwise, only the mpi_cmdexec command will be executed, on the machine where you run telemac.py from.

I see that you have something like "job submit" ... this command needs to go in hpc_cmdexec and not in mpi_cmdexec.
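To illustrate the split (just a minimal sketch, reusing the mpiexec and job submit switches from your own attempt; not a tested setup):
mpi_cmdexec: mpiexec.exe /wdir <wdir> /n <ncsize> <exename>
hpc_stdin:   <mpi_cmdexec>
hpc_cmdexec: job submit /workdir:<wdir> /stdout:stdout.txt /stderr:stderr.txt <hpc_stdin>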

Can you explain a bit more ?

Sébastien.

Opentelemac and HPC Pack (MS-MPI) 10 years 9 months ago #11981

cyamin (openTELEMAC Guru)
Hello Sebastien,

The job scheduler idea is new to me and I am learning it as we speak. I've read that the commands before mpiexec are the job scheduler commands and that they can be given on the same line/command (according to Microsoft's help). The Job Scheduler certainly accepts the submitted job and any other parameters (this is verified in the HPC Job Manager window).

Would it make a difference to telemac if I made a separate hpc_stdin part in the configuration?

I am studying the systel.cis-hydra.cfg to try to replicate it for Microsoft HPC. Thanks for your help.

Regards,
Costas

Opentelemac and HPC Pack (MS-MPI) 10 years 9 months ago #11982

sebourban (Administrator, Principal Scientist)
Yes, it would help to provide the stdin file to the job scheduler command -- we have not planned for the case where both the job scheduler and the mpi_cmdexec command are launched at the same time.

I am not sure what the script would look like for the MS scheduler, but here are a few more examples of what it is with BSUB and QSUB ...
#
mpi_hosts:    mg01
mpi_cmdexec: /gpfs/packages/openmpi/1.4.4/gcc/bin/mpiexec -wdir <wdir> -n <ncsize> <exename>
par_cmdexec:   <config>/partel >> <partel.log>
hpc_stdin: #!/bin/bash
   #BSUB -n <ncsize>
   #BSUB -J <jobname>
   #BSUB -o <sortiefile>
   #BSUB -e <exename>.%J.err
   #BSUB -R "span[ptile=9]"
   <mpi_cmdexec>
   exit
hpc_cmdexec:   chmod 755 <hpc_stdin>; bsub -q encore < <hpc_stdin>

mpi_cmdexec: mpiexec -wdir <wdir> -n <ncsize> <exename>
hpc_stdin: #!/bin/bash
   #$ -cwd                        # working directory is current directory
   #$ -V                          # forward your current environment to the execution environment
   #$ -pe mpi-12x1 96             # no of cores requested
   #$ -S /bin/bash                # shell it will be executed in
   #$ -j y                        # merge stderr and stdout
   cat $PE_HOSTFILE | awk '{print $1, " slots=9"}' > machinefile.$JOB_ID
   cat machinefile.$JOB_ID
   <mpi_cmdexec>
   exit
hpc_cmdexec:   chmod 755 <hpc_stdin>; qsub < <hpc_stdin>
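As a very rough, untested guess for the MS scheduler, following the same pattern (the mpiexec and job submit switches below are the ones from your earlier attempt; whether the scheduler will accept the HPC_STDIN file as a command on Windows is something you would have to check):
mpi_cmdexec: mpiexec.exe /wdir <wdir> /n <ncsize> <exename>
hpc_stdin: cd /d <wdir>
   <mpi_cmdexec>
hpc_cmdexec: job submit /workdir:<wdir> /stdout:<sortiefile> /stderr:<jobname>.err <hpc_stdin>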

Hope this helps,
Sébastien.

Opentelemac and HPC Pack (MS-MPI) 10 years 9 months ago #11983

cyamin (openTELEMAC Guru)
Dear Sebastien,

I have adapted my configuration file to look like this:
#
par_cmdexec:   <config>\partel.exe < PARTEL.PAR >> <partel.log>
#
#mpi_hosts:   
mpi_cmdexec: mpiexec.exe /wdir <wdir> /cores <ncsize> <exename>
hpc_stdin:   /numcores:<ncsize> /jobname:<jobname> /stdout:<sortiefile>.txt /stderr:<exename>.err <mpi_cmdexec>
#
hpc_cmdexec:   job submit <hpc_stdin>
#
When I execute, I get the following error from my job scheduler:
'HPC_STDIN' is not recognized as an internal or external command,
operable program or batch file.
The positive is that the temporary working directory is now populated with the expected files (plus some new ones that are due to the hpc configuration).
I still don't understand the purpose and syntax of the "hpc_stdin:" and "hpc_cmdexec:" lines. How does telemac use them? Can you provide some insight?

Regards,
Costas

Opentelemac and HPC Pack (MS-MPI) 10 years 9 months ago #11984

sebourban (Administrator, Principal Scientist)
OK - in the options, make sure you have:
parallel mpi hpc
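If I remember correctly, that goes under the options: keyword of the configuration file, e.g. (sketch only):
[general]
options: parallel mpi hpc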

Opentelemac and HPC Pack (MS-MPI) 10 years 9 months ago #11985

cyamin (openTELEMAC Guru)
OK, I did so in the [general] section of the configuration file, but it did not appear to make any difference. This is the console output:
Loading Options and Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... parsing configuration file: c:\opentelemac\v6p3r2\configs\systel.cfg


Running your CAS file for:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    +> configuration: wingformsmpi
    +> root:          C:\opentelemac\v6p3r2
    +> version        v6p3


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


... reading the main module dictionary

... processing the main CAS file(s)
    +> running in English

... checking parallelisation

... handling temporary directories

... checking coupling between codes

... first pass at copying all input files
    +>  twac_yfist.cas
    copying:  tomawac.dico \\atlas\Company\DataDisk\Work\opentelemac\Sifnos\Chained\1.Tomawac\twac_yfist.cas_2014-02-10-16h59min30s\WACDICO
    copying:  Yfist_Geo_v0.slf \\atlas\Company\DataDisk\Work\opentelemac\Sifnos\Chained\1.Tomawac\twac_yfist.cas_2014-02-10-16h59min30s\WACGEO
 re-writing:  \\atlas\Company\DataDisk\Work\opentelemac\Sifnos\Chained\1.Tomawac\twac_yfist.cas_2014-02-10-16h59min30s\WACCAS
    copying:  Yfist_Geo_v0_BC_(555-4).cli \\atlas\Company\DataDisk\Work\opentelemac\Sifnos\Chained\1.Tomawac\twac_yfist.cas_2014-02-10-16h59min30s\WACCLI

... checking the executable
 re-copying:  tomawac.exe \\atlas\Company\DataDisk\Work\opentelemac\Sifnos\Chained\1.Tomawac\twac_yfist.cas_2014-02-10-16h59min30s\out_tomawac.exe

... modifying run command to MPI instruction

... modifying run command to PARTEL instruction
 partitioning:  WACGEO
    +>  C:\opentelemac\v6p3r2\builds\wingformsmpi\bin\partel.exe < PARTEL.PAR >> partel_WACGEO.log

... handling sortie file(s)

... modifying run command to HPC instruction


Running your simulation(s) :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Job has been submitted. ID: 67.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... Your simulation has been launched through the queue.

   +> You need to wait for completion before re-collecting files using the option --merge



My work is done


Everything seems to run fine in the command window, BUT the computation does not start and I still get this error:
'HPC_STDIN' is not recognized as an internal or external command,
operable program or batch file.

Any ideas?
Costas

Opentelemac and HPC Pack (MS-MPI) 10 years 9 months ago #11986

sebourban (Administrator, Principal Scientist)
Under Linux, we have to change the permissions of the local file HPC_STDIN (the content of which is what is in hpc_stdin) using something like the following (based on chmod, before executing the job submit):
hpc_cmdexec: chmod 755 <hpc_stdin>; qsub <hpc_stdin>
-- Is it possible that you have to do something similar with your file too?

Can you otherwise try something like:
hpc_cmdexec:   job submit <wdir>\<hpc_stdin>

although I am not sure this will work ...

Hope this helps.
Sébastien.
The administrator has disabled public write access.
