Dear Telemac users,
I installed Telemac v6p2 and v7p0r0 on Windows 7 workstations. On each of them I can run a simulation in parallel with MPICH2. However, when I tried to create a cluster from two of these workstations, the simulation ran but was much slower than when using the cores of only one of the two computers. I posted a similar question last week about my Linux setup (opentelemac.org/index.php/kunena/12-linu...ng-if-using-16-cores), but the origin of the problem seems to be different under Windows 7, since I am unable to run a single simulation involving more than one node. I suspect that my problem with Windows 7 is related to security.
Here are the main facts about my configuration:
- The two workstations I am using are identical except for their names. They have a 64-bit architecture, six-core processors and 24 GB of memory, and run Windows 7.
- I compiled Telemac v7.0 with Intel Visual Fortran Compiler v11.1.072, and I am using Python v2.7.9 (I think this version includes numpy, scipy and matplotlib) and Metis v5. I don't know the exact sub-version of the Metis library, but it was created on September 12th, 2012 and is 1181 KB. I have attached my configuration file.
- The two computers belong to the same network and both recognize my user name (which has administrator rights on both machines). Therefore, a single domain, user and password are involved.
- MPICH2 is installed on both machines, at the same location, and the SMPD service is running.
- I created a 'case' directory on the primary machine (the one that holds the Telemac binaries), shared this directory with my user (with full control), and mapped it to a drive (n:\) on both machines.
- Since, at this point, my multi-node calculation was not starting, I registered the hosts (using c:\mpich2\bin\mpiexec.exe -register -host) so that each computer recognizes the other participating machine.
- I also disabled the firewall on both workstations.
- Finally, I modified runcode.py to build the host file (i.e. the file MPI_HOSTFILE in the temporary directory) according to the content of the field 'mpi_hosts' in config.cfg.
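The host-file change described in the last item can be sketched as follows. This is a minimal, hypothetical Python sketch, not the actual runcode.py code: it assumes that `mpi_hosts` holds space-separated host names, that the number of MPI processes is passed as `ncsize` (a hypothetical name here), that each host is filled up to `cores_per_host` processes before the next one is used, and that MPICH2 accepts one `host:nprocs` entry per line in the host file. That last syntax is an assumption and should be checked against the MPICH2 documentation for the installed version; the separator is a parameter for that reason.

```python
def distribute_ranks(mpi_hosts, ncsize, cores_per_host):
    """Split ncsize MPI processes over the hosts listed in mpi_hosts.

    Block distribution: each host is filled up to cores_per_host processes
    before the next one is used. Returns a list of (host, nprocs) pairs.
    """
    hosts = mpi_hosts.split()  # assumed: space-separated host names
    if not hosts:
        raise ValueError("mpi_hosts is empty")
    if ncsize < 1 or cores_per_host < 1:
        raise ValueError("ncsize and cores_per_host must be positive")
    if ncsize > len(hosts) * cores_per_host:
        raise ValueError("not enough cores on the listed hosts")
    counts = []
    remaining = ncsize
    for host in hosts:
        if remaining == 0:
            break  # all processes placed; remaining hosts stay unused
        n = min(cores_per_host, remaining)
        counts.append((host, n))
        remaining -= n
    return counts


def format_host_file(counts, sep=":"):
    """One 'host<sep>nprocs' line per host (assumed MPICH2 host-file syntax)."""
    return "".join("{0}{1}{2}\n".format(host, sep, n) for host, n in counts)


def write_host_file(path, mpi_hosts, ncsize, cores_per_host, sep=":"):
    """Write the host file (e.g. MPI_HOSTFILE in the temporary directory)."""
    counts = distribute_ranks(mpi_hosts, ncsize, cores_per_host)
    with open(path, "w") as f:
        f.write(format_host_file(counts, sep))
    return counts
```

For example, with the hypothetical host names `"WORKSTATION1 WORKSTATION2"`, 12 processes and 6 cores per host, each machine gets six processes; asking for 8 processes puts six on the first host and two on the second. A block distribution was chosen over round-robin so that the number of processes per machine can be read directly from the file. The code also runs unchanged under Python 2.7.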
I then run my simulation using:
cd n:\t2dwork\
python c:\telemac\v7p0\scripts\python27\runcode.py telemac2d -s T2DCAS -c wintelmpi -w tmp
If I declare a single host in config.cfg (the workstation on which the Telemac binaries are located), my parallel simulation works fine with these settings. However, if I declare two workstations (as in the config.cfg file attached to this post), the last line displayed in the standard output is:
USING STREAMLINE VERSION 7.0 FOR CHARACTERISTICS
and nothing more is printed, even though the requested cores keep working at 100% according to the Windows Task Manager.
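Since the run stalls as soon as a second host is involved, one quick check is whether each machine listed in `mpi_hosts` accepts a TCP connection on the port of the SMPD service. The Python sketch below assumes that SMPD listens on its default port, 8676 as far as I know (adjust it if SMPD was installed with another port), and the host names in the usage line are hypothetical. It only tests the launcher's port: as far as I know, the MPI processes themselves exchange data over other, dynamically chosen ports, so a positive result does not rule out a blocked connection between the ranks.

```python
import socket

SMPD_DEFAULT_PORT = 8676  # assumed default port of the MPICH2 SMPD service


def is_port_open(host, port=SMPD_DEFAULT_PORT, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, timed out, or host name could not be resolved.
        return False
    conn.close()
    return True


def check_hosts(hosts, port=SMPD_DEFAULT_PORT):
    """Map each host name to the result of is_port_open."""
    return dict((h, is_port_open(h, port)) for h in hosts)
```

Running `check_hosts(["WORKSTATION1", "WORKSTATION2"])` from each of the two machines should return `True` for both entries if SMPD is reachable in both directions.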
My questions are the following:
1) Is there something wrong with my configuration?
2) Are there other security settings (not related to the firewall) that need to be modified?
Regards,
Yannick