TOPIC: telemac2d error during mpiexec

telemac2d error during mpiexec 10 years 2 months ago #14177

  • schaad
I am unable to run telemac2d v6p3 in parallel mode. The config I compiled the code with is based on 'ubugfopenmpi' (see attached systel.cfg). Compilation completes without trouble, but when I run a case, mpiexec fails with the error below. The same case runs fine in serial mode (using 'ubugfortrans'). Any wisdom is much appreciated!
Running your simulation(s) :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

/usr/bin/mpiexec -wdir /home/sschaad/Desktop/steady_01/steady_02.cas_2014-09-10-10h10min12s -n 4 out_telemac2d

===========================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 255
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===========================================================================

telemac2d error during mpiexec 10 years 2 months ago #14180

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi

Did you check whether the partitioning step finished correctly?
Look at the partel.log file in the temp directory.

Hope this helps
Christophe

telemac2d error during mpiexec 10 years 2 months ago #14182

  • schaad
Yes, it appears the partitioning successfully finished (see attached).

Any other thoughts about what could be wrong?

Thanks, Simon

EDIT: Here is the log file in plain text:
 +-------------------------------------------------+
   PARTEL: TELEMAC SELAFIN METISOLOGIC PARTITIONER
                                                    
   REBEKKA KOPMANN & JACEK A. JANKOWSKI (BAW)
                  JEAN-MICHEL HERVOUET (LNHE)
                  CHRISTOPHE DENIS     (SINETICS) 
                  YOANN AUDOUIN        (LNHE) 
   PARTEL (C) COPYRIGHT 2000-2002 
   BUNDESANSTALT FUER WASSERBAU, KARLSRUHE
  
   METIS 5.0.2 (C) COPYRIGHT 2012 
   REGENTS OF THE UNIVERSITY OF MINNESOTA 
  
   BIEF 6.2 (C) COPYRIGHT 2012 EDF
 +-------------------------------------------------+
  
  
   MAXIMUM NUMBER OF PARTITIONS:       100000
  
 +--------------------------------------------------+
  

 SELAFIN INPUT NAME <INPUT_NAME>:  INPUT: T2DGEO                                                                                                                                                                                                                                                    

 BOUNDARY CONDITIONS FILE NAME :  INPUT: T2DCLI                                                                                                                                                                                                                                                    

 NUMBER OF PARTITIONS <NPARTS> [2 -100000]:  INPUT:    4

 PARTITIONING OPTIONS: 

 PARTITIONING METHOD <PMETHOD>                              [1 (metis) OR 2 (scotch)]:  INPUT:   1

 WITH SECTIONS? [1:YES 0:NO]:  INPUT:    0
  
 ONE-LEVEL MESH.
 NDP NODES PER ELEMENT:                    3
 NPOIN NUMBER OF MESH NODES:            7873
 NELEM NUMBER OF MESH ELEMENTS:        14410
  
 THE INPUT FILE ASSUMED TO BE 2D SELAFIN
 TIMESTEP:    0.00000000     S =    0.00000000     H
 THERE ARE            1  TIME-DEPENDENT RECORDINGS

 THERE IS   2 LIQUID BOUNDARIES:

 BOUNDARY   1 : 
  BEGINS AT BOUNDARY POINT:    385 , WITH GLOBAL NUMBER:     5882
  AND COORDINATES:     4270.996           9123.006    
  ENDS AT BOUNDARY POINT:    397 , WITH GLOBAL NUMBER:     5767
  AND COORDINATES:     4187.929           8918.166    

 BOUNDARY   2 : 
  BEGINS AT BOUNDARY POINT:    963 , WITH GLOBAL NUMBER:     5174
  AND COORDINATES:     3367.806           12434.83    
  ENDS AT BOUNDARY POINT:    971 , WITH GLOBAL NUMBER:     5212
  AND COORDINATES:     3405.787           12590.26    

 THERE IS   2 SOLID BOUNDARIES:

 BOUNDARY   1 : 
  BEGINS AT BOUNDARY POINT:    971 , WITH GLOBAL NUMBER:     5212
  AND COORDINATES:     3405.787           12590.26    
  ENDS AT BOUNDARY POINT:    385 , WITH GLOBAL NUMBER:     5882
  AND COORDINATES:     4270.996           9123.006    

 BOUNDARY   2 : 
  BEGINS AT BOUNDARY POINT:    397 , WITH GLOBAL NUMBER:     5767
  AND COORDINATES:     4187.929           8918.166    
  ENDS AT BOUNDARY POINT:    963 , WITH GLOBAL NUMBER:     5174
  AND COORDINATES:     3367.806           12434.83    
  THE MESH PARTITIONING STEP STARTS
 BEGIN PARTITIONING WITH METIS
  RUNTIME OF METIS    0.00000000      SECONDS
  THE MESH PARTITIONING STEP HAS FINISHED
 TREATING SUB-DOMAIN            1
 TREATING SUB-DOMAIN            2
 TREATING SUB-DOMAIN            3
 TREATING SUB-DOMAIN            4
 OVERALL TIMING:    3.90000008E-02  SECONDS
  
 +---- PARTEL: NORMAL TERMINATION ----+
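The closing `PARTEL: NORMAL TERMINATION` line in the log above is the marker that partitioning completed. A minimal sketch of checking for it from the shell, assuming the log path is supplied by hand (the function name `partel_ok` is hypothetical and not part of TELEMAC):

```shell
# Hypothetical helper: succeed only if the PARTEL log reports normal termination.
partel_ok() {
    log=$1
    if [ ! -f "$log" ]; then
        echo "no such log: $log" >&2
        return 2
    fi
    # The marker text is copied from the log in this thread.
    grep -q 'PARTEL: NORMAL TERMINATION' "$log"
}

# Example (log name as mentioned in this thread):
# partel_ok partel_T2DGEO.log && echo "partitioning finished"
```

A zero exit status only says that the partitioner finished; it says nothing about whether the parallel executable can be launched afterwards.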

telemac2d error during mpiexec 10 years 2 months ago #14184

  • c.coulet
I cannot see the contents of partel_T2DGEO.log!
But from what you say, it seems to be OK.
Try running the mpiexec command manually in the temp directory (with the same syntax as in the output shown in your first post).

regards
Christophe

telemac2d error during mpiexec 10 years 2 months ago #14187

  • schaad
I get the same error when I run the mpiexec command in the temp directory. I tried to get more information out of the error by turning on all the verbosity options, with no luck. The error is below. There is some mention of an 'execvp error', but my gut feeling is that this is not useful.
$ mpiexec out_telemac2d -n 4
[proxy:0:0@gazelle] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file out_telemac2d (No such file or directory)

========================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 255
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
========================

Thanks!!
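One reading of the `execvp` line, offered as an assumption rather than a diagnosis: a name without a slash is searched for on `PATH`, which normally excludes the current directory, and in the command above `-n 4` follows the executable, so the launcher hands it to the program instead of treating it as a process count. A minimal sketch of a wrapper that avoids both (the name `launch_parallel` is hypothetical):

```shell
# Hypothetical wrapper: launch an MPI executable with an explicit path
# and with the launcher options placed before the executable.
launch_parallel() {
    nproc=$1
    exe=$2
    case "$exe" in
        */*) ;;             # already contains a path component
        *) exe="./$exe" ;;  # a bare name would be looked up on PATH instead
    esac
    if [ ! -x "$exe" ]; then
        echo "executable not found or not executable: $exe" >&2
        return 1
    fi
    # Anything after the executable is passed to the program, not to mpiexec.
    mpiexec -n "$nproc" "$exe"
}

# Example, run from inside the temp directory:
# launch_parallel 4 out_telemac2d
```

This does not rule out other causes, such as a launcher that belongs to a different MPI implementation than the one the code was compiled with.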

telemac2d error during mpiexec 10 years 2 months ago #14188

  • c.coulet
Hi

Maybe one idea!
Please delete the out_telemac2d executable in the directory and rerun the simulation.
If you made a first trial in serial mode, it may be that serial version which was not updated when you tried to run in parallel...
Christophe

telemac2d error during mpiexec 10 years 2 months ago #14189

  • schaad
I am not sure I understand. I tried running the serial out_telemac2d exe with mpiexec in the parallel temp folder. I got the same error. Should this have worked?

telemac2d error during mpiexec 10 years 2 months ago #14190

  • c.coulet
Sorry if I was not clear.

In the TELEMAC system, the executable is copied into the directory from which you launch the simulation. If you run another simulation, this local copy is the one that is used.
So my idea is that you ran a first simulation in serial mode, which means the local executable is a serial one.
Then you tried to run the simulation in parallel, and TELEMAC tried to use the local executable, which is serial, and this caused the crash.

So if you clean your simulation directory (delete the executable file and the temp directory) and then run the simulation in parallel, TELEMAC will copy the parallel executable, and this could solve the problem.
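The cleanup described above could be sketched as follows. The executable name `out_telemac2d` and the `<cas file>_<timestamp>` temp-directory pattern are taken from the output earlier in this thread; the function name `clean_case_dir` is hypothetical, and the layout may differ in other TELEMAC versions.

```shell
# Hypothetical helper: remove the local executable and the temp directories
# of one case, so the next run starts from a freshly copied executable.
clean_case_dir() {
    case_dir=$1
    cas_name=$2
    # Refuse to run with an empty case name, which would match too much.
    [ -n "$cas_name" ] || return 1
    rm -f "$case_dir/out_telemac2d"
    # Temp directories look like steady_02.cas_2014-09-10-10h10min12s.
    for d in "$case_dir/$cas_name"_*; do
        if [ -d "$d" ]; then
            rm -rf "$d"
        fi
    done
}

# Example (paths from this thread):
# clean_case_dir /home/sschaad/Desktop/steady_01 steady_02.cas
```

The steering file itself is left in place, since the glob only matches names with a suffix after the case name.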

Hope this clarifies my first explanation
Christophe

telemac2d error during mpiexec 10 years 2 months ago #14197

  • schaad
I understand now. This did not work either... Any other ideas? Your help is much appreciated!

telemac2d error during mpiexec 10 years 2 months ago #14200

  • schaad
Problem Solved!

The problem lay with the installation of openmpi. I had two versions installed and apparently there was a conflict. Through 'apt-get' I had installed both 'mpich' and 'openmpi-bin' without realizing it. Removing both versions and cleanly installing 'openmpi-bin' did the trick.
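A conflict of this kind can usually be spotted by asking the launcher on `PATH` to identify itself. The sketch below assumes a Debian or Ubuntu system for the package listing and assumes that the installed `mpiexec` accepts `--version`; the function names are hypothetical. The `HYDU_` prefix in the earlier error message suggests the launcher being picked up was the Hydra process manager that ships with MPICH.

```shell
# Hypothetical helper: show which mpiexec is on PATH and what it reports.
describe_mpi_launcher() {
    launcher=$(command -v mpiexec 2>/dev/null) || {
        echo "mpiexec not found on PATH"
        return 0
    }
    echo "launcher: $launcher"
    # Follow symlinks (for example Debian alternatives) to the real binary.
    echo "resolves to: $(readlink -f "$launcher")"
    # Assumption: the launcher prints an identifying banner for --version.
    "$launcher" --version 2>&1 | head -n 3 || true
}

# Hypothetical helper: list installed MPI packages where dpkg is available.
list_mpi_packages() {
    if command -v dpkg >/dev/null 2>&1; then
        dpkg -l | grep -E 'mpich|openmpi' || echo "no mpich/openmpi packages found"
    else
        echo "dpkg not available"
    fi
}

describe_mpi_launcher
list_mpi_packages
```

If the banner names a different implementation than the one the code was compiled against, that matches the conflict described in this post.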

Thanks Christophe for the tips!