
TOPIC: Parallel Installation: attempting to use an MPI routine before initial

Parallel Problem: 7 years 10 months ago #24808

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
It looks like METIS crashed.
Is there more information in the partel_T2DGEO.log file in your temporary folder?
If not, try rerunning the following command in your temporary folder:
/home/huyquangtran/telemac/v7p2/builds/uniopenmpi/bin/partel < PARTEL.PAR
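
The manual rerun suggested above can also be scripted, which makes it easier to keep the full output and the exit code together. The following is a minimal sketch, not part of TELEMAC: the helper name `rerun_with_stdin` and the log name `partel_manual.log` are hypothetical, and the partel path is simply the one quoted in this post.

```python
import subprocess


def rerun_with_stdin(cmd, stdin_path, log_path):
    """Run cmd with stdin_path as standard input and append all output to log_path."""
    with open(stdin_path, "rb") as fin, open(log_path, "ab") as flog:
        # Equivalent to the shell form: cmd < stdin_path >> log_path 2>&1
        completed = subprocess.run(
            cmd, stdin=fin, stdout=flog, stderr=subprocess.STDOUT
        )
    return completed.returncode


# Example call, to be run from inside the temporary folder
# (path copied from the post above; adjust it to your own build):
#
#   partel = "/home/huyquangtran/telemac/v7p2/builds/uniopenmpi/bin/partel"
#   code = rerun_with_stdin([partel], "PARTEL.PAR", "partel_manual.log")
#   print("partel exit code:", code)
```

Redirecting stderr into the same file keeps any crash message next to the partel output it belongs to.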
There are 10 types of people in the world: those who understand binary, and those who don't.

Parallel Problem: 7 years 10 months ago #24809

  • huyquangtran
  • Expert Boarder
  • Posts: 271
  • Thank you received: 23
Dear Yoann,

There is a partel_T2DGEO.log in the temporary folder; it is attached.

Thanks & Best Regards
Huy

+
+
PARTEL/PARRES: TELEMAC METISOLOGIC PARTITIONER

REBEKKA KOPMANN & JACEK A. JANKOWSKI (BAW)
JEAN-MICHEL HERVOUET (LNHE)
CHRISTOPHE DENIS (SINETICS)
YOANN AUDOUIN (LNHE)
PARTEL (C) COPYRIGHT 2000-2002
BUNDESANSTALT FUER WASSERBAU, KARLSRUHE

METIS 5.0.2 (C) COPYRIGHT 2012
REGENTS OF THE UNIVERSITY OF MINNESOTA

BIEF V7P2R0 (C) COPYRIGHT 2012 EDF
+
+


MAXIMUM NUMBER OF PARTITIONS: 100000

+
+

--INPUT FILE NAME <INPUT_NAME>:
INPUT: T2DGEO
--INPUT FILE FORMAT <INPFORMAT> [MED,SERAFIN,SERAFIND]:
INPUT: SERAFIN
--BOUNDARY CONDITIONS FILE NAME:
INPUT: T2DCLI
--NUMBER OF PARTITIONS <NPARTS> [2 -100000]:
INPUT: 2
PARTITIONING METHOD <PMETHOD> [1 (METIS) OR 2 (SCOTCH)]:
--INPUT: 1
--CONTROL SECTIONS FILE NAME (OR RETURN) :
NO SECTIONS
--CONTROL ZONES FILE NAME (OR RETURN) :
NO ZONES
--WEIR FILE NAME (OR RETURN) :
NO WEIRS
--GEOMETRY FILE NAME <INPUT_NAME>:
INPUT: T2DGEO
--GEOMETRY FILE FORMAT <GEOFORMAT> [MED,SERAFIN,SERAFIND]:
INPUT: SERAFIN
+---- PARTEL: BEGINNING
+


READ_MESH_INFO: TITLE= TELEMAC 2D : GOUTTE D'EAU DANS UN BASSIN$
NUMBER OF ELEMENTS: 8978
NUMBER OF POINTS: 4624

FORMAT NOT INDICATED IN TITLE


ONE-LEVEL MESH.
NDP NODES PER ELEMENT: 3
ELEMENT TYPE : 10
NPOIN NUMBER OF MESH NODES: 4624
NELEM NUMBER OF MESH ELEMENTS: 8978

THE INPUT FILE ASSUMED TO BE 2D
THERE ARE 1 TIME-DEPENDENT RECORDINGS
THE MESH PARTITIONING STEP STARTS
BEGIN PARTITIONING WITH METIS
ERROR: TRY TO RUN PARTEL WITH A SERIAL CONFIGURATION



PLANTE: PROGRAM STOPPED AFTER AN ERROR
RETURNING EXIT CODE: 2

Parallel Problem: 7 years 10 months ago #24810

  • yugi
You get this error when partel was compiled without -DHAVE_MPI.
You should try recompiling from scratch ("compileTELEMAC.py --clean").

Parallel Problem: 7 years 10 months ago #24811

  • huyquangtran
Dear Yoann,

I did include -DHAVE_MPI in my configuration. After I got this error, I cleaned up everything and recompiled, but nothing changed.

Do you have any further suggestions?

Best Regards
Huy

[uniopenmpi]
options: parallel mpi
#
par_cmdexec: <config>/partel < PARTEL.PAR >> <partel.log>
#
mpi_cmdexec: mpif90 -wdir <wdir> -n <ncsize> <exename>
mpi_hosts:
#
cmd_obj: gfortran -c -O3 -DAHVE_MPI -DAHVE_VTK -cpp -fconvert=big-endian -frecord-marker=4 <mods> <incs> <f95name>
cmd_lib: ar cru <libname> <objs>
cmd_exe: gfortran -fconvert=big-endian -frecord-marker=4 -lpthread -v -o <exename> <objs> <libs>
#
incs_all: -I/usr/local/easybuild/software/OpenMPI/1.10.2-GCC-4.9.2-openib/include
libs_all: -L/usr/local/easybuild/software/OpenMPI/1.10.2-GCC-4.9.2-openib/lib/libmpi.so
-L/usr/local/easybuild/software/METIS/5.1.0-GCC-4.9.2/lib/libmetis.a

Parallel Problem: 7 years 10 months ago #24812

  • yugi
In your systel you have:
-DAHVE_MPI -DAHVE_VTK
instead of
-DHAVE_MPI -DHAVE_VTK
...
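
A transposed letter in a `-D` flag is easy to miss by eye, because the compiler accepts any macro name without complaint. The sketch below shows one way to catch it: it pulls the `-D` names out of a `cmd_obj` line and reports any that are close to, but not equal to, an expected name. The helper `find_suspect_defines` is hypothetical, and `KNOWN_DEFINES` lists only the two macros named in this thread, so it would need extending for a real configuration.

```python
import difflib
import re

# Only the two macros mentioned in this thread; extend for your own build.
KNOWN_DEFINES = ("HAVE_MPI", "HAVE_VTK")


def find_suspect_defines(cmd_obj, known=KNOWN_DEFINES):
    """Map each unrecognised -D name to its closest known name (or None)."""
    suspects = {}
    # Match -DNAME only at the start of a whitespace-separated token.
    for name in re.findall(r"(?<!\S)-D(\w+)", cmd_obj):
        if name in known:
            continue
        close = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
        suspects[name] = close[0] if close else None
    return suspects


# cmd_obj line as posted earlier in the thread.
cmd_obj = (
    "gfortran -c -O3 -DAHVE_MPI -DAHVE_VTK -cpp -fconvert=big-endian "
    "-frecord-marker=4 <mods> <incs> <f95name>"
)
for wrong, guess in find_suspect_defines(cmd_obj).items():
    print(f"-D{wrong}: did you mean -D{guess}?")
```

Run against the posted line, this flags both `AHVE_MPI` and `AHVE_VTK` and suggests the `HAVE_` spellings.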
The following user(s) said Thank You: huyquangtran

Parallel Problem: 7 years 10 months ago #24813

  • huyquangtran
How stupid of me! I will recompile it and see how it goes.

Thanks

Huy

Parallel Problem: 7 years 10 months ago #24823

  • huyquangtran
Hi,

I have recompiled with a new configuration, and I still get errors with a parallel run. Do you know what is wrong?

Could you please help?

Thanks

Huy



[huyquangtran@spartan seiche]$ python /home/huyquangtran/telemac/v7p2/scripts/python27/runcode.py telemac2d -s -c parallel --ncsize=2 t2d_seiche.cas


Loading Options and Configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[ASCII-art banner, whitespace lost in extraction; it appears to read "unknown revision"]


... parsing configuration file: /home/huyquangtran/telemac/v7p2/configs/systel.mint.cfg


Running your CAS file for:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+> configuration: parallel
+> root: /home/huyquangtran/telemac/v7p2


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


... reading the main module dictionary

... processing the main CAS file(s)
+> running in English

... handling temporary directories

... checking coupling between codes

... checking parallelisation

... first pass at copying all input files
copying: geo_seiche.slf /home/huyquangtran/telemac/v7p2/examples/telemac2d/seiche/t2d_seiche.cas_2017-01-17-13h42min42s/T2DGEO
copying: t2d_seiche.f /home/huyquangtran/telemac/v7p2/examples/telemac2d/seiche/t2d_seiche.cas_2017-01-17-13h42min42s/t2dfort.f
copying: geo_seiche.cli /home/huyquangtran/telemac/v7p2/examples/telemac2d/seiche/t2d_seiche.cas_2017-01-17-13h42min42s/T2DCLI
re-copying: /home/huyquangtran/telemac/v7p2/examples/telemac2d/seiche/t2d_seiche.cas_2017-01-17-13h42min42s/T2DCAS
copying: telemac2d.dico /home/huyquangtran/telemac/v7p2/examples/telemac2d/seiche/t2d_seiche.cas_2017-01-17-13h42min42s/T2DDICO

... checking the executable
created: t2d_seiche
re-copying: t2d_seiche /home/huyquangtran/telemac/v7p2/examples/telemac2d/seiche/t2d_seiche.cas_2017-01-17-13h42min42s/out_t2d_seiche

... modifying run command to MPI instruction

... modifying run command to PARTEL instruction

... partitioning base files (geo, conlim, sections, zones and weirs)
+> /home/huyquangtran/telemac/v7p2/builds/parallel/bin/partel < PARTEL.PAR >> partel_T2DGEO.log
Current memory used: 0 bytes
Maximum memory used: 0 bytes
***Memory allocation failed for CreateGraphDual: nptr. Requested size: 137439004680 bytes
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
partel 0000000000535345 Unknown Unknown Unknown
partel 0000000000532F67 Unknown Unknown Unknown
partel 00000000004E57B4 Unknown Unknown Unknown
partel 00000000004E55C6 Unknown Unknown Unknown
partel 00000000004998B6 Unknown Unknown Unknown
partel 000000000049DB80 Unknown Unknown Unknown
libpthread.so.0 00007FC92E84F100 Unknown Unknown Unknown
partel 0000000000418AB4 Unknown Unknown Unknown
partel 0000000000444CCA Unknown Unknown Unknown
partel 000000000040952E Unknown Unknown Unknown
libc.so.6 00007FC92E49FB15 Unknown Unknown Unknown
partel 0000000000409429 Unknown Unknown Unknown
runPartition:
|runPARTEL: Could not split your file T2DGEO (runcode=174) with the error as follows:
|
|... The following command failed for the reason above (or below)
|/home/huyquangtran/telemac/v7p2/builds/parallel/bin/partel < PARTEL.PAR >> partel_T2DGEO.log
|
| You may have forgotten to compile PARTEL with the appropriate compiler directive
| (add -DHAVE_MPI to your cmd_obj in your configuration file).
|
|Here is the log:
|
| +
| +
| PARTEL/PARRES: TELEMAC METISOLOGIC PARTITIONER
|
| REBEKKA KOPMANN & JACEK A. JANKOWSKI (BAW)
| JEAN-MICHEL HERVOUET (LNHE)
| CHRISTOPHE DENIS (SINETICS)
| YOANN AUDOUIN (LNHE)
| PARTEL (C) COPYRIGHT 2000-2002
| BUNDESANSTALT FUER WASSERBAU, KARLSRUHE
|
| METIS 5.0.2 (C) COPYRIGHT 2012
| REGENTS OF THE UNIVERSITY OF MINNESOTA
|
| BIEF V7P2R0 (C) COPYRIGHT 2012 EDF
| +
| +
|
| MAXIMUM NUMBER OF PARTITIONS: 100000
|
| +
| +
|
| --INPUT FILE NAME <INPUT_NAME>:
| INPUT: T2DGEO
| --INPUT FILE FORMAT <INPFORMAT> [MED,SERAFIN,SERAFIND]:
| INPUT: SERAFIN
| --BOUNDARY CONDITIONS FILE NAME:
| INPUT: T2DCLI
| --NUMBER OF PARTITIONS <NPARTS> [2 -100000]:
| INPUT: 2
| PARTITIONING METHOD <PMETHOD> [1 (METIS) OR 2 (SCOTCH)]:
| --INPUT: 1
| --CONTROL SECTIONS FILE NAME (OR RETURN) :
| NO SECTIONS
| --CONTROL ZONES FILE NAME (OR RETURN) :
| NO ZONES
| --WEIR FILE NAME (OR RETURN) :
| NO WEIRS
| --GEOMETRY FILE NAME <INPUT_NAME>:
| INPUT: T2DGEO
| --GEOMETRY FILE FORMAT <GEOFORMAT> [MED,SERAFIN,SERAFIND]:
| INPUT: SERAFIN
| +---- PARTEL: BEGINNING
| +
|
| READ_MESH_INFO: TITLE= newSelafin
| NUMBER OF ELEMENTS: 11186
| NUMBER OF POINTS: 6400
|
| FORMAT NOT INDICATED IN TITLE
|
| ONE-LEVEL MESH.
| NDP NODES PER ELEMENT: 3
| ELEMENT TYPE : 10
| NPOIN NUMBER OF MESH NODES: 6400
| NELEM NUMBER OF MESH ELEMENTS: 11186
|
| THE INPUT FILE ASSUMED TO BE 2D
| THERE ARE 1 TIME-DEPENDENT RECORDINGS
|
| THERE IS 1 SOLID BOUNDARIES:
|
| BOUNDARY 1 :
| BEGINS AT BOUNDARY POINT: 1 , WITH GLOBAL NUMBER: 1
| AND COORDINATES: 0.000000 0.000000
| ENDS AT BOUNDARY POINT: 1 , WITH GLOBAL NUMBER: 1
| AND COORDINATES: 0.000000 0.000000
| THE MESH PARTITIONING STEP STARTS
| BEGIN PARTITIONING WITH METIS
| RUNTIME OF METIS 0.0000000E+00 SECONDS
| THE MESH PARTITIONING STEP HAS FINISHED
| ISOLATED BOUNDARY POINT 5 4
| ISOLATED BOUNDARY POINT 4 1

Parallel Problem: 7 years 10 months ago #24824

  • yugi
It looks like there is an error with your mesh:
| ISOLATED BOUNDARY POINT 5 4
| ISOLATED BOUNDARY POINT 4 1

Check your boundary file.

Hope it helps.
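
Since the diagnosis in this thread has so far come from spotting one telltale line in a long log, a small triage helper can save some scrolling. This is only a sketch: the function name `triage_log` is hypothetical, the three search strings are copied from the output posted above, and the hints merely restate what was suggested in the replies.

```python
# Search strings copied from the output posted in this thread; the hints
# restate the replies and are not an authoritative diagnosis.
PATTERNS = {
    "TRY TO RUN PARTEL WITH A SERIAL CONFIGURATION":
        "partel may have been compiled without -DHAVE_MPI",
    "Memory allocation failed":
        "METIS could not allocate memory; inspect how METIS was built",
    "ISOLATED BOUNDARY POINT":
        "check the boundary conditions file",
}


def triage_log(log_text):
    """Return (line_number, matched_text, hint) for every known message found."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for needle, hint in PATTERNS.items():
            if needle in line:
                hits.append((lineno, needle, hint))
    return hits


sample = """BEGIN PARTITIONING WITH METIS
THE MESH PARTITIONING STEP HAS FINISHED
ISOLATED BOUNDARY POINT 5 4
ISOLATED BOUNDARY POINT 4 1
"""
for lineno, needle, hint in triage_log(sample):
    print(f"line {lineno}: {needle} -> {hint}")
```

In practice the text would come from reading partel_T2DGEO.log (and the console output, which is where the memory allocation message appeared) rather than from an inline string.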

Parallel Problem: 7 years 10 months ago #24825

  • huyquangtran
Hi Yoann,

This is an example from: examples/telemac2d/seiche.

I just wanted to test my configuration, but it failed.

Do you have any other ideas?

Thanks

Huy

Parallel Problem: 7 years 10 months ago #24827

  • yugi
Could it be an error in your compilation of METIS (32-bit vs. 64-bit integers)?
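
One way to test the 32/64-bit idea against numbers already posted in this thread is to decode the size in the "Memory allocation failed for CreateGraphDual: nptr" message. The sketch below assumes, without confirmation from this thread, that the `nptr` array holds one integer per mesh node plus one; the helper `decode_request` is hypothetical. It tries both 4-byte and 8-byte integers and splits the implied node count into 32-bit words.

```python
REQUESTED_BYTES = 137439004680  # from the "Memory allocation failed" message
NPOIN = 6400                    # mesh nodes reported in the partel log


def decode_request(requested_bytes, index_bytes):
    """Split the implied node count into high and low 32-bit words.

    Assumes the array holds (node_count + 1) integers of index_bytes each.
    """
    entries, remainder = divmod(requested_bytes, index_bytes)
    if remainder:
        raise ValueError("size is not a whole number of entries")
    node_count = entries - 1
    high_word, low_word = divmod(node_count, 2**32)
    return node_count, high_word, low_word


for width in (4, 8):
    node_count, high, low = decode_request(REQUESTED_BYTES, width)
    print(f"{width}-byte indices: nodes={node_count}, high word={high}, low word={low}")
```

With 8-byte integers the low word comes out as 6400, exactly the node count in the log, with stray data (4) in the high word. That is the pattern one would expect if a 32-bit integer passed by partel were read as a 64-bit integer inside METIS, so the numbers are consistent with the 32/64-bit suggestion, although they do not prove it.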