TOPIC: Cannot run TELEMAC v6p1 in parallel

Cannot run TELEMAC v6p1 in parallel 13 years 3 months ago #2049

  • bzindovic
Dear all,

I have been playing with the Python scripts shipped with the new version of the TELEMAC system (v6p1 under Windows 7 64-bit and XP 32-bit), but I cannot run it in parallel. I have set the cfg file to point to the section for parallel computing (in the file provided with the system, this is the section called wintelmpi, and I have set the paths to the libraries and executables). I have also put the appropriate number of processors in the case file and in mpi_telemac.conf, but the output for the SpillwayStillingBasing example is always:

... reading module dictionary
C:\TELEMAC\Racunica\SpillwayStillingBasing\cas.tel
... running in English
... inconsistent CAS file: C:\TELEMAC\Racunica\SpillwayStillingBasing\cas.tel
+> you may be using an inappropriate configuration: wintelmpi
+> or may be wishing for scalar mode while using parallel
Press any key to continue . . .


This is rather odd, since I have built the parallel version but cannot run it. Can anyone help me with this?

Regards,
Budo
The administrator has disabled public write access.

Re:Cannot run TELEMAC v6p1 in parallel 13 years 3 months ago #2054

  • sebourban
When you just run ncompileTELEMAC.py, it runs for every configuration listed on the first line of the config.cfg file. The same applies when you use runcode.

This helps with our automated management system, where we go through all possible platforms, compilers, configurations, etc.

You should add the option -c wintelmpi to your runcode command if you wish to run only the parallel mode.

Hope this helps,

Sébastien.

Re:Cannot run TELEMAC v6p1 in parallel 13 years 3 months ago #2062

  • bzindovic
Dear Sébastien,

I made only one configuration available (wintelmpi), but that didn't seem to work. I then followed your advice to pass the configuration explicitly and tried three different versions:


telemac2d -c wintelmpi cas.tel

telemac2d --configname=wintelmpi cas.tel

runcode.py -c wintelmpi telemac2d cas.tel


but they all gave the same output as in my previous message.

I've attached an archive that contains the case file for the Spillway problem, the mpiconfig file and my TELEMAC configuration file. Maybe you can tell me what I am doing wrong. I have no more ideas about where to check.

Best regards,
Budo

File Attachment:

File Name: Problem_SetUP.zip
File Size: 1823

Re:Cannot run TELEMAC v6p1 in parallel 13 years 3 months ago #2064

  • Chris Cawthorn
Hi Budo,

I notice that you have PARALLEL PROCESSORS = 1 in your steering file. I found that running in 'parallel' with only one processor triggers the same "Inconsistent CAS file" error in the python script.

If you have PARALLEL PROCESSORS = n with n>1, then I think it should work. I certainly can't see any obvious problems with your config file.

Whether or not runcode.py is correct to report an error for PARALLEL PROCESSORS = 1 is a different story. According to the manual, this keyword should mean "Run using the parallel-compiled version of TELEMAC, but with only one processor". By contrast, PARALLEL PROCESSORS = 0 (the default) means simply "Run the scalar version of TELEMAC".

Based on this, we probably ought to change/remove the error message. Thanks for the report!
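
As a rough illustration of the three cases described above, here is a minimal Python sketch that reads the keyword from steering-file text and reports the mode it implies. The function names and the simple regular expression are assumptions made for illustration; this is not the parser used by the TELEMAC scripts, which has to handle far more of the steering-file syntax.

```python
import re

# Assumed simplification: one "KEYWORD = value" (or "KEYWORD : value") per line.
_PATTERN = re.compile(r"^\s*PARALLEL PROCESSORS\s*[=:]\s*(\d+)",
                      re.IGNORECASE | re.MULTILINE)


def parallel_processors(steering_text):
    """Return the PARALLEL PROCESSORS value, or 0 (the default) if absent."""
    match = _PATTERN.search(steering_text)
    return int(match.group(1)) if match else 0


def describe_mode(ncsize):
    """Describe the run mode implied by the keyword, as explained in the post."""
    if ncsize == 0:
        return "scalar version"
    if ncsize == 1:
        return "parallel-compiled version, one processor"
    return "parallel-compiled version, %d processors" % ncsize


# Hypothetical steering-file content, for illustration only.
example = "TITLE = 'demo'\nPARALLEL PROCESSORS = 1\n"
print(describe_mode(parallel_processors(example)))
```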

Chris

Re:Cannot run TELEMAC v6p1 in parallel 13 years 3 months ago #2075

  • bzindovic
Hello Chris,

Thank you for the prompt reply.

I have indeed set PARALLEL PROCESSORS = 1 in my steering file and mpiconfig file, for testing MPICH2 programs on my single-core laptop (a rather old 32-bit machine, but it works OK). The keyword doesn't mean that you actually have parallel or multi-core processors; it is the number of tasks (in our case, computations) given to the targeted host computers. Maybe this keyword is outdated, since it can cause confusion, and should be renamed to NUMBER OF THREADS, since the TELEMAC system calls MPICH2 with the number provided by that keyword as the number of threads (the word thread is also used throughout the provided Python scripts). Choosing one MPICH2 thread is valid (I made a successful run with TELEMAC V6P0).

Comparing the settings between the previous and current versions of TELEMAC, I see that the MPICH2 -localonly option for hosts has been changed to -mapall. I browsed the Web but couldn't find what it is used for. Can you give me some insight?

Best regards,
Budo

Re:Cannot run TELEMAC v6p1 in parallel 13 years 3 months ago #2080

  • Chris Cawthorn
The point about PARALLEL PROCESSORS is that it's perfectly valid to run TELEMAC in 'parallel' mode with one thread, but the runcode.py script wasn't written with that possibility in mind. The problem lies in these two lines of runcode.py (lines 61 and 62):
if ncsize > 1 and 'parallel' not in cfg['MODULES'].keys(): return False
if ncsize < 2 and 'paravoid' not in cfg['MODULES'].keys(): return False
The fix may be as simple as changing the first inequality to 'ncsize > 0', but I'd need to look more closely to make sure that this change wouldn't have any unintended consequences.
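
To make the behaviour of that check easier to see, here is a small standalone Python sketch. It is not the code from runcode.py: the function name `is_consistent`, the `parallel_from` parameter, the `mpi_only` dictionary and the assumption that the check returns True when neither test fails are all hypothetical. With `parallel_from=2` it mirrors the two quoted lines; with `parallel_from=1` it shifts both thresholds together, which goes a little beyond the one-line change suggested above and is untested against the real script.

```python
def is_consistent(ncsize, modules, parallel_from=2):
    """Mimic the two quoted checks on a dictionary of compiled modules.

    parallel_from is the smallest ncsize treated as a parallel run:
    2 mirrors the quoted lines, 1 also accepts PARALLEL PROCESSORS = 1.
    """
    # A parallel run needs the 'parallel' module in the configuration.
    if ncsize >= parallel_from and 'parallel' not in modules:
        return False
    # A scalar run needs the 'paravoid' module in the configuration.
    if ncsize < parallel_from and 'paravoid' not in modules:
        return False
    return True


# Hypothetical parallel-only configuration: 'parallel' present, 'paravoid' absent.
mpi_only = {'parallel': {}, 'telemac2d': {}}

print(is_consistent(1, mpi_only))                   # False: quoted behaviour
print(is_consistent(1, mpi_only, parallel_from=1))  # True: shifted threshold
print(is_consistent(4, mpi_only))                   # True in both variants
```

In this sketch, a parallel-only configuration with ncsize = 1 is rejected by the second test, not the first, so relaxing only the first inequality would still leave that case rejected.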

As for the MPICH options, it isn't really the case that -mapall replaces -localonly. The -mapall option creates, on each remote host, a temporary mapping of each network drive mapped on the local host, to make it easier to share input files between machines in Windows. This link (to a PDF file) has a little more detail on the Windows-specific options for mpiexec. The -mapall option has nothing to do with specifying host machine names, but was included in the mpi_hosts part of the config file as a quick workaround.

-localonly, as I'm sure you know, means that MPICH will only launch processes on the local host. Unless the configuration of MPICH itself is altered, this is the default mode of operation of mpiexec.exe. I'm not quite sure why it was deleted from the mpi_hosts line in the TELEMAC config file, but for a 'real' parallel run it is up to the user to set this line to suit their own system (i.e. with real host names etc.). Remember that the config files distributed with TELEMAC are all really just templates, and almost always need editing by the user. Hopefully we will explain the setup procedure more clearly when we release fresh documentation of the Python scripts.

For your purposes (running on one processor with multiple threads), you should only need:
mpi_hosts: -localonly

I hope that this helps to clarify things.

Re:Cannot run TELEMAC v6p1 in parallel 13 years 3 months ago #2089

  • sebourban
Please note also that with the Python version you no longer need the mpi_telemac.conf file. The default option will be -localonly. Only when you use a networked cluster do the names of your computers have to be listed in your config file; alternatively, as Chris Cawthorn does, you can refer to your own local file from the config file.

Chris ... can you expand on that last point please?

Hope this helps.

Sébastien

Re:Cannot run TELEMAC v6p1 in parallel 13 years 3 months ago #2091

  • Chris Cawthorn
Sure. It's a bit off the original topic, but might be useful to other users who are having trouble with running TELEMAC in parallel using the Python scripts.

I prefer to have some flexibility in choosing the number and names of hosts for each parallel run, rather than always running with the same set of hosts. To do this, one can set
mpi_hosts:   -machinefile ..\hosts.txt
mpi_cmdexec	:   C:\MPICH_32\bin\mpiexec.exe -mapall <wdir> <ncsize> <hosts> <exename>
in the systel.cfg file. When you want to run a simulation, you need to create a file called hosts.txt in the same directory as the TELEMAC steering file. This file is described in the MPICH2 User Guide, but would usually be a simple list of host names, for example:
host1
host2
host2
host3
Repeating host2 is OK here - this means that two processors on host2 will be used, but only one processor will be used on each of host1 and host3. I can then run TELEMAC using up to 4 parallel processors. To run with more processors, I need only to edit my hosts.txt file to include more hostnames.
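
As a quick sanity check before a run, the hosts.txt file described above can be summarised with a few lines of Python. This is only a sketch: the function name `processes_per_host` is hypothetical, and it assumes the plain one-host-name-per-line format shown in the example, ignoring any richer syntax the MPICH2 machine file may allow.

```python
from collections import Counter


def processes_per_host(hosts_text):
    """Count how many times each host name appears, skipping blank lines."""
    names = [line.strip() for line in hosts_text.splitlines() if line.strip()]
    return Counter(names)


# Same content as the example hosts.txt above.
hosts_txt = "host1\nhost2\nhost2\nhost3\n"
counts = processes_per_host(hosts_txt)

print(dict(counts))          # {'host1': 1, 'host2': 2, 'host3': 1}
print(sum(counts.values()))  # 4
```

The total is the largest number of parallel processors this particular hosts.txt can serve, which matches the "up to 4" in the example.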

Re:Cannot run TELEMAC v6p1 in parallel 13 years 3 months ago #2096

  • bzindovic
Sébastien and Chris,

Thank you for your in-depth answers.

I've tried to do as Chris suggested on my 64-bit laptop but it refuses to launch the computation. It seems that I have problems with my MPICH2 installation. As soon as I sort this, I'll try Chris' suggestion and will inform you.

Best regards,
Budo

Re:Cannot run TELEMAC v6p1 in parallel 13 years 3 months ago #2099

  • Chris Cawthorn
Hi Budo,

You've probably already thought of this, but just in case you haven't...

Make sure that the path for mpiexec.exe is correct for your system if you're copying my mpi_cmdexec config line. This could be the 'problem' with your MPICH2 installation.

If it isn't, good luck sorting it out!
Chris