
TOPIC: Failure of parallel run with many sources and tracers

Failure of parallel run with many sources and tracers 6 years 4 months ago #30960

  • liuy
Hi,
I recently tried to use V7P3 to run a simulation with 100+ sources and 100+ tracers (one tracer at each point source). I read several posts on this forum, which gave me a lot of information for dealing with the errors, including:
www.opentelemac.org/index.php/kunena/21-...tracers?limitstart=0
Following the advice in that post, I changed MAXKEYWORD to 3000 in declarations_special.F.
In serial mode I can run 50 sources and 50 tracers. If I increase the number of sources or tracers beyond that, the model will not run because "VALUES OF THE TRACERS AT THE SOURCES" exceeds the MAXKEYWORD limit.
In parallel mode, the model fails to run as soon as I use more than 6 sources and tracers. The log below is from a run with 7 sources and 7 tracers.
I hope you can help. Thanks very much.

liuy
INITIALISING TELEMAC2D FOR
 INBIEF (BIEF): NOT A VECTOR MACHINE (ACCORDING TO YOUR DATA)
 STRCHE (BIEF): NO MODIFICATION OF FRICTION

 NUMBER OF LIQUID BOUNDARIES:           1

 CORFON (TELEMAC2D): NO MODIFICATION OF BOTTOM

 SOURCE POINT            1 PUT ON POINT
   361460.312500000       AND    3128881.00000000
 LOCATED AT    5.38216092761111       METRES

 SOURCE POINT            2 PUT ON POINT
   361518.781250000       AND    3128866.00000000
 LOCATED AT    8.93618215809678       METRES

 SOURCE POINT            3 PUT ON POINT
   361558.281250000       AND    3128863.25000000
 LOCATED AT    8.96020934821358       METRES

 SOURCE POINT            4 PUT ON POINT
   361612.968750000       AND    3128852.00000000
 LOCATED AT    5.78724688977156       METRES

 SOURCE POINT            5 PUT ON POINT
   361631.281250000       AND    3128842.00000000
 LOCATED AT    6.47623359370284       METRES

 SOURCE POINT            6 PUT ON POINT
   361619.937500000       AND    3128796.75000000
 LOCATED AT    10.2746730485065       METRES

 SOURCE POINT            7 PUT ON POINT
   361590.500000000       AND    3128770.25000000
 LOCATED AT    5.90529423486034       METRES

================================================================================
 ITERATION        0    TIME:   0.0000 S

                            BALANCE OF T1 (UNIT:  * M3)
     INITIAL QUANTITY OF TRACER   :    0.000000

                            BALANCE OF T2 (UNIT:  * M3)
     INITIAL QUANTITY OF TRACER   :    0.000000

                            BALANCE OF T3 (UNIT:  * M3)
     INITIAL QUANTITY OF TRACER   :    0.000000

                            BALANCE OF T4 (UNIT:  * M3)
     INITIAL QUANTITY OF TRACER   :    0.000000

                            BALANCE OF T5 (UNIT:  * M3)
     INITIAL QUANTITY OF TRACER   :    0.000000

                            BALANCE OF T6 (UNIT:  * M3)
     INITIAL QUANTITY OF TRACER   :    0.000000

                            BALANCE OF T7 (UNIT:  * M3)
     INITIAL QUANTITY OF TRACER   :    0.000000
 TELEMAC2D INITIALISED
 THE LIQUID BOUNDARIES FILE CONTAINS
        1441  LINES WITH:
 1        2        3        4        5        6        7        8
 9        10       11       12       13       14       15       16
 17       18       19       20       21       22       23       24
 25       26       27       28       29       30       31       32
 33       34       35       36       37       38       39       40
 41       42       43       44       45       46       47       48
 49       50       51       52       53       54       55       56
 57       58       59       60       61       62       63       64
 65       66       67       68       69       70       71       72
 73       74       75       76       77       78       79       80
 81       82       83       84       85       86       87       88
 89       90       91       92       93       94       95       96
 97       98       99       100      101      102      103      104
 105      106      107      108      109      110      111      112
 113      114      115      116      117      118      119      120
 121
-------------------------------------------------------------------------- 
USING STREAMLINE VERSION 7.3 FOR CHARACTERISTICS

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD  PARALLEL::ORG_CHARAC_TYPE1:: NOMB NOT IN RANGE [0..MAX_BASKET_SIZE]

with errorcode 2.  MAX_BASKET_SIZE, NOMB:           10          11



NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. PLANTE: PROGRAM STOPPED AFTER AN ERROR

You may or may not see output from other processes, depending on RETURNING EXIT CODE:            2

exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 46435 on
node Mariana exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[Mariana:46432] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[Mariana:46432] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
_____________
runcode::main:
:
   |runCode: Fail to run
   |mpirun -wdir  telemacCode/examples/telemac2d/0/t2d_tideFEMNeap1_6.cas_2018-07-24-16h07min07s -np 2  telemacCode/examples/telemac2d/0/t2d_tideFEMNeap1_6.cas_2018-07-24-16h07min07s/out_user_fortran
   |~~~~~~~~~~~~~~~~~~
   |[Mariana:46432] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
   |~~~~~~~~~~~~~~~~~~

Failure of parallel run with many sources and tracers 6 years 4 months ago #30994

  • riadh
Hello

To use that many sources and tracers, you should not change the value of MAXKEYWORD. Instead, you have to change the following:
1- MAXIMUM NUMBER OF SOURCES=100 (default value 20)
2- MAXIMUM NUMBER OF TRACERS=100 (default value 20)
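
As a minimal sketch, assuming these two entries are set as keywords in the TELEMAC-2D steering (.cas) file, the lines could look like the fragment below. The value 100 is the one quoted in the list above; for a case with more than 100 sources or tracers it would presumably need to be raised to at least the number actually used.

```
/ Steering file fragment (illustrative; values are assumptions for this case)
/ Both keywords default to 20 according to the list above
MAXIMUM NUMBER OF SOURCES = 100
MAXIMUM NUMBER OF TRACERS = 100
```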

Be careful when using the liquid boundaries file: the order of the source discharges and tracer values must be respected. See the user manual for more details.

I hope this helps

with my best regards

Riadh