TOPIC: v8p3r0 failed to run any examples in parallel

v8p3r0 failed to run any examples in parallel 2 years 7 months ago #40135

  • biodc172
Hi,
I compiled telemac-mascaret v8p3r0 and the compilation went fine. But when I try to run the example in the directory /examples/telemac2d/gouttedo with the command telemac2d.py t2d_gouttedo.cas --ncsize=4, it fails with the following error:
Running your CAS file(s) for:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

gfortranHPC: 
    

    +> Gfortran compiler 11.2.0 with mpich 3.1.4

    +> root:    /es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0
    +> module: ad / api / artemis / bief
               damocles  / gaia  / gretel  / hermes
               identify_liq_bnd  / khione  / mascaret  / nestor
               parallel  / partel  / postel3d  / sisyphe
               special  / stbtel  / telemac2d  / telemac3d
               tomawac / waqtel


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


... processing the steering file

... checking parallelisation

... handling temporary directories
         copying: t2d_gouttedo.cas -> <root>/examples/telemac2d/gouttedo/t2d_gouttedo.cas_2022-04-01-14h28min01s/T2DCAS
         copying: telemac2d.dico -> <root>/examples/telemac2d/gouttedo/t2d_gouttedo.cas_2022-04-01-14h28min01s/T2DDICO
         copying: geo_gouttedo.cli -> <root>/examples/telemac2d/gouttedo/t2d_gouttedo.cas_2022-04-01-14h28min01s/T2DCLI
         copying: geo_gouttedo.slf -> <root>/examples/telemac2d/gouttedo/t2d_gouttedo.cas_2022-04-01-14h28min01s/T2DGEO
... partitioning base files (geo, conlim, sections, zones and weirs)
    +> /es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/builds/gfortranHPC/bin/partel < partel_T2DGEO.par >> partel_T2DGEO.log
   Current memory used:           0 bytes
   Maximum memory used:           0 bytes
***Memory allocation failed for CreateGraphDual: nptr. Requested size: 9208409919624 bytes
STOP 0

... splitting / copying other input files

... checking the executable
  > compiling objs
         compiling: user_condin_h.f ... completed
         compiling: user_condin_trac.f ... completed
         created: out_user_fortran


Running your simulation(s) :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



In /es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/examples/telemac2d/gouttedo/t2d_gouttedo.cas_2022-04-01-14h28min01s:
mpirun -n 4 /es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/examples/telemac2d/gouttedo/t2d_gouttedo.cas_2022-04-01-14h28min01s/out_user_fortran


 MASTER PROCESSOR NUMBER            0  OF THE GROUP OF            4
 EXECUTABLE FILE: /es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/examples/telemac2d/gouttedo/t2d_gouttedo.cas_2022-04-01-14h28min01s/A.EXE
 LISTING OF TELEMAC2D------------------------------------------------------------------------------

                TTTTT  EEEEE  L      EEEEE  M   M  AAAAA  CCCCC
                  T    E      L      E      MM MM  A   A  C
                  T    EEE    L      EEE    M M M  AAAAA  C
                  T    E      L      E      M   M  A   A  C
                  T    EEEEE  LLLLL  EEEEE  M   M  A   A  CCCCC

                        2D    VERSION V8P3   FORTRAN 2003

                        ~^~^~^~^~^~^~^~^~^~^~^^~^~^~^~^~^~
                          ~                            ~
                               \   '    o      '
                               /\ o       \  o
                             >=)'>    '   /\ '
                               \/   \   >=)'>        ~
                               /    /\    \/
                        ~         >=)'>   /     .
                                    \/                   )
                                    /                   (
                                          ~          )   )
                          }     ~              (    (   (
                         {                      )    )   )
                          }  }         .       (    (   (
                         {  {               /^^^^^^^^^^^^
                        ^^^^^^^^^\         /
                                  ^^^^^^^^^

 WARNING IN DICTIONARY:
 FOR KEYWORD: FINITE VOLUME SCHEME FOR TRACER DIFFUSION
 THE NUMBER OF DEFAULT VALUES            1  IS DIFFERENT FROM THE DECLARED SIZE            2

 DIFFERENT NUMBER OF PARALLEL PROCESSORS:
 DECLARED BEFORE (CASE OF COUPLING ?):           4
 TELEMAC-2D :           0
 VALUE            4  IS KEPT

                   ********************************************
                   *               LECDON:                    *
                   *        AFTER CALLING DAMOCLES            *
                   *        CHECKING OF DATA  READ            *
                   *         IN THE STEERING FILE             *
                   ********************************************

 EXITING LECDON. NAME OF THE STUDY:
 TELEMAC 2D: DROPLET IN A BASIN

 OPENING FILES FOR TELEMAC2D
 OPENING: T2DGEO-geo_gouttedo.slf
 OPENING: T2DCLI-geo_gouttedo.cli
 OPENING: T2DRES-r2d_gouttedo_v1p0.slf

                          *****************************
                          *    MEMORY ORGANIZATION    *
                          *****************************

 READ_MESH_INFO: TITLE= TELEMAC 2D : GOUTTE D'EAU DANS UN BASSIN$
            NUMBER OF ELEMENTS:     8710
            NUMBER OF POINTS:     4624

            TYPE OF ELEMENT: TRIANGLE
            TYPE OF BND ELEMENT: POINT

            SINGLE PRECISION FORMAT (R4)

Abort(2) on node 2 (rank 2 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 2) - process 2
Abort(2) on node 3 (rank 3 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 2) - process 3
Traceback (most recent call last):
  File "/es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/scripts/python3/telemac2d.py", line 7, in <module>
    main('telemac2d')
  File "/es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/scripts/python3/runcode.py", line 279, in main
    run_study(cas_file, code_name, options)
  File "/es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/scripts/python3/execution/run_cas.py", line 169, in run_study
    run_local_cas(my_study, options)
  File "/es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/scripts/python3/execution/run_cas.py", line 65, in run_local_cas
    my_study.run(options)
  File "/es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/scripts/python3/execution/study.py", line 637, in run
    self.run_local()
  File "/es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/scripts/python3/execution/study.py", line 465, in run_local
    run_code(self.run_cmd, self.sortie_file)
  File "/es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/scripts/python3/execution/run.py", line 182, in run_code
    raise TelemacException('Fail to run\n'+exe)
utils.exceptions.TelemacException: Fail to run
mpirun -n 4 /es01/home/jnist10/chengqz/software/telemac_mascaret/telemac-mascaret-v8p3r0/examples/telemac2d/gouttedo/t2d_gouttedo.cas_2022-04-01-14h28min01s/out_user_fortran

I am using Python 3.8, gcc 11.2.0 and mpich 3.4.2.
I tried compiling with several versions of mpich; they all fail with the same error and no more specific message.
I have attached the cfg file and my pysource script.

Could somebody help me, please?
Best regards.
The administrator has disabled public write access.

v8p3r0 failed to run any examples in parallel 2 years 7 months ago #40159

  • pham
  • Administrator
Hello,

I suppose the computation runs fine on one core, doesn't it?

In the temporary folder, you should find a listing file for each subdomain when running in parallel. The ones for nodes 2 and 3 should be checked; they may contain more information.
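As a rough aid for that check, here is a minimal Python sketch that scans every log file in the temporary folder for suspicious lines. It is not part of the TELEMAC scripts: the function name scan_logs is made up for illustration, the marker strings are simply copied from the output quoted above, and it assumes that the per-process listings, like partel_T2DGEO.log, have names ending in .log or .LOG.

```python
from pathlib import Path

# Marker strings copied from the output quoted in this thread.
# Assumption: a failing run writes lines like these into its log files.
ERROR_MARKERS = ("Memory allocation failed", "MPI_Abort")


def scan_logs(tmp_dir, markers=ERROR_MARKERS):
    """Return {file name: [(line number, text), ...]} for marker lines.

    Assumption: every log of interest sits directly in tmp_dir and has a
    name ending in .log or .LOG (as partel_T2DGEO.log does).
    """
    hits = {}
    for path in sorted(Path(tmp_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() != ".log":
            continue
        found = []
        # errors="replace" keeps the scan going on odd bytes in a listing.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(marker in line for marker in markers):
                    found.append((number, line.strip()))
        if found:
            hits[path.name] = found
    return hits
```

Calling scan_logs on the t2d_gouttedo.cas_2022-04-01-14h28min01s folder from the traceback should at least report the "Memory allocation failed for CreateGraphDual" line that partel printed, provided that line also ended up in a log file there.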

Anyway, I have an installation of TELEMAC with mpich 3.2.1 and it works fine.
You can also try OpenMPI if it works better for you.

Hope this helps,

Chi-Tuan