TOPIC: Node management with MPICH2

Node management with MPICH2 9 years 5 months ago #17385

  • abernard
I am trying to run the Malpasset case in parallel after an automatic parallel installation (MPICH2).

The problem is that the case runs slower in parallel than in scalar mode.

It looks like all the partitions are solved on the same node.

Everything looks fine when I run the cpi example.
I have uninstalled and reinstalled smpd...

Attachments: erreur1.png, erreur2.png

Does anyone have an idea how to solve this?
Thanks,

Node management with MPICH2 9 years 5 months ago #17390

  • cyamin
Hello,

Have you tried to run the Malpasset case in parallel on a single remote host? Your configuration file would also be useful.

Regards,
Costas

Node management with MPICH2 9 years 5 months ago #17408

  • abernard
Hi Costas.

Yes, I have tried, and it works.

It also works in parallel, but it is slower than in scalar mode (it looks like just one core is working).

I use the default configuration file from the automatic installer.

File Attachment:

File Name: systel_parallel_v7p0.cfg
File Size: 1 KB

Node management with MPICH2 9 years 5 months ago #17410

  • cyamin
So to summarize: You have two remote nodes/hosts. You are able to run parallel jobs on each one of them, but you cannot split it between them. Correct?

Have you defined rules to open firewall ports on each node? When I was using MPICH2 (I now use MS-MPI), I defined firewall rules for smpd.exe (and possibly mpiexec.exe) to open a specific range of ports, and passed this range to the execution command, e.g.:
mpi_cmdexec: mpiexec.exe /wdir <wdir> /env MPICH_PORT_RANGE 10000,11000 /host node01 /cores <ncsize> <exename>
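
To make the placeholders in that line concrete, here is a minimal Python sketch of how `<wdir>`, `<ncsize>` and `<exename>` could be filled in before the command is launched. It only illustrates the substitution and is not the code the TELEMAC scripts actually use; the function name `expand_mpi_cmdexec`, the working directory and the executable name are hypothetical.

```python
def expand_mpi_cmdexec(template: str, wdir: str, ncsize: int, exename: str) -> str:
    """Fill the <wdir>, <ncsize> and <exename> placeholders of a command template."""
    values = {"<wdir>": wdir, "<ncsize>": str(ncsize), "<exename>": exename}
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


# Template copied from the configuration line quoted in the post above.
TEMPLATE = (
    "mpiexec.exe /wdir <wdir> /env MPICH_PORT_RANGE 10000,11000 "
    "/host node01 /cores <ncsize> <exename>"
)

if __name__ == "__main__":
    # Hypothetical working directory and executable name, for illustration only.
    print(expand_mpi_cmdexec(TEMPLATE, r"C:\work\malpasset_tmp", 8, "out_t2d.exe"))
```

Printing the expanded command before running it is a cheap way to check that the port range and core count really end up on the command line.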

Also, if possible, try to split the computation between two remote hosts, excluding your local PC. That would help troubleshoot communication errors.

Regards,
Costas

Node management with MPICH2 9 years 5 months ago #17416

  • abernard
Hi Costas,

My vocabulary may be confusing. I want to run TELEMAC in parallel on a single machine with 8 cores (an i7-4770: actually 4 physical cores, 8 logical cores with Hyper-Threading).

No, I haven't defined any rules (I just registered).

It is my first parallel installation with Windows and MPICH2. Under Ubuntu with Open MPI, I didn't have to change anything.

I have only tested cpi.exe, but that doesn't prove anything.

Thanks a lot for your help.

Node management with MPICH2 9 years 5 months ago #17418

  • cyamin
OK, I was confused by the post subject. The problem should be much easier to find. If scalar works, then it should be MPICH2's fault.

If you have one local node only, then you should add '-localonly' in your mpi_cmdexec command. Also check that you have registered smpd with your current user account.

Regards,
Costas

Core management with MPICH2 9 years 5 months ago #17419

  • abernard
Sorry for the confusion. "Node" may have been the wrong word.

When I run "mpiexec -n 8 -localonly out_t2d_malpasset-large.cas..."

it looks like everything is done on a single processor (and it is slow).

THE KEYWORD: PROCESSEURS PARALLELES
APPEARS AT LEAST TWICE, ONLY THE LAST VALUE IS KEPT...

END OF FILE FOR DAMOCLES

DIFFERENT NUMBER OF PARALLEL PROCESSORS:
ALREADY DECLARED (COUPLING CASE?): 0
TELEMAC-2D: 8
VALUE 0 IS KEPT

and when I abort the run...

Attachment erreur3.png not found

I will try to register with the machine administrator login instead of the local network login.

Core management with MPICH2 9 years 5 months ago #17420

  • c.coulet
Hi,
What is really strange is the initial message about the differing number of parallel processors.
As I understand it, you probably have a declaration inside the steering file that sets the number of parallel processors to 0, and then, with the mpiexec command, you run the scalar simulation 8 times.

You should check the steering file inside the temp directory and give it the right number of processors.

The other possibility, which is more logical, is to let the parallel computation run directly from the normal TELEMAC script:
telemac2d.py --ncsize=8 malpasset_large.cas or something similar

If your systel.cfg file is configured correctly, the parallel run will be managed automatically.

Hope this helps
Christophe

Core management with MPICH2 9 years 5 months ago #17421

  • abernard
Yes, that's very strange.

When I read T2CAS, "PROCESSEURS PARALLELES : " appears twice.

One comes from the copy of the steering file (I can tell because it reflects whether I commented the line out or not), and the second, at the end of T2CAS, is taken from --ncsize (if I add it) or from the steering file.

Anyway, whatever I do to T2CAS, I always hit the same issue.
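
A quick way to confirm which keywords are repeated in a steering file such as T2CAS is to count them. The Python sketch below is a simplified illustration, not the DAMOCLES parser: it assumes one `KEYWORD : value` or `KEYWORD = value` pair per line and treats lines starting with `/` as comments. It does not recognise that a French keyword and its English equivalent are the same keyword. The function names and the sample content are hypothetical.

```python
import re
from collections import Counter

# Keyword text at the start of a line, followed by ':' or '='.
KEYWORD_RE = re.compile(r"^\s*([^/:=][^:=]*?)\s*[:=]")


def count_keywords(text: str) -> Counter:
    """Count how often each keyword appears, skipping comment lines."""
    counts = Counter()
    for line in text.splitlines():
        if line.lstrip().startswith("/"):
            continue  # whole-line comment
        match = KEYWORD_RE.match(line)
        if match:
            counts[match.group(1).upper()] += 1
    return counts


def duplicated_keywords(text: str) -> dict:
    """Return only the keywords that appear more than once."""
    return {key: n for key, n in count_keywords(text).items() if n > 1}


# Hypothetical steering-file content with one keyword declared twice.
SAMPLE = """\
/ copied from the user's steering file
PROCESSEURS PARALLELES : 0
SOLVEUR = 1
/ appended at the end of the file
PROCESSEURS PARALLELES : 8
"""

if __name__ == "__main__":
    for key, n in duplicated_keywords(SAMPLE).items():
        print(f"{key}: declared {n} times")
```

To check a real file, read it with `open(path).read()` and pass the text to `duplicated_keywords`.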

Core management with MPICH2 9 years 5 months ago #17422

  • abernard
I always get this message...

USING STREAMLINE VERSION 7.0 FOR CHARACTERISTICS
USING STREAMLINE VERSION 7.0 FOR CHARACTERISTICS
CORRXY (BIEF): NO MODIFICATION OF THE COORDINATES

MESH: MESH ALLOCATED

THE KEYWORD: SOLVEUR
APPEARS AT LEAST TWICE, ONLY THE LAST VALUE IS KEPT...

THE KEYWORD: PRECONDITIONNEMENT
APPEARS AT LEAST TWICE, ONLY THE LAST VALUE IS KEPT...

END OF FILE FOR DAMOCLES

DIFFERENT NUMBER OF PARALLEL PROCESSORS:
ALREADY DECLARED (COUPLING CASE?): 0
TELEMAC-2D: 8
VALUE 0 IS KEPT

When I edit T2CAS and execute with mpiexec from the directory, nothing changes.