
TOPIC: Error in format of geo file during parallelisation

Error in format of geo file during parallelisation 7 years 7 months ago #26174

Dear Telemac users,

I'm facing a problem when trying to parallelise a tidal model: there is an error affecting the temporary geometry files.

" ERROR IN FORMAT OF FILE /home/user/tide.cas_2017-04-21-19h55min1
7s/T2DGEO00003-00001IT IS A SERAFIND FILE "



I changed the GEOMETRY FILE FORMAT keyword (SERAFIN or SERAFIND), but I always get the error.

I checked the size of my temporary files (from T2DGEO00003-00000 to T2DGEO00003-00003 for ncsize=4) and it appears that my first temporary file, T2DGEO00003-00000, has the same size as my original T2DGEO file, while all the others are only a few bytes (288), even when I change ncsize. So it seems the partitioning of my GEO file goes wrong, but I can't figure out why.
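A quick way to make the symptom visible is to list the split files directly; a minimal sketch in the shell, using the temporary directory path from the error message above:

# list the partitioned geometry files; after a correct partitioning,
# all four pieces should have comparable, non-trivial sizes
cd /home/user/tide.cas_2017-04-21-19h55min17s
ls -lh T2DGEO00003-0000*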

I also used runSELAFIN.py to convert my GEO file from SERAFIN to SERAFIND, without success.

Does anyone have a clue or anything that could help me?

Thank you
Jean-Rémy

Error in format of geo file during parallelisation 7 years 7 months ago #26210

Hello

Do you run the case in scalar mode?
It seems to be just an incompatible format of the geometry file.

kind regards

Riadh ATA

Error in format of geo file during parallelisation 7 years 7 months ago #26213

Hello riadh, thank you for your answer.

I can run it in scalar mode without any problem, so the format of the geo file is not the problem.

But I tried to run the malpasset test case and I found the error message 'ISOLATED BOUNDARY POINT', similar to this topic:

www.opentelemac.org/index.php/community-...l-issue?limitstart=0


In fact, I realised thanks to this discussion that I had forgotten to compile the metis library. For the moment I have to wait for the technical service to install CMake on the server, because I need it to compile. Then I will be able to run in parallel, without problems I hope!
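For reference, compiling METIS 5.x uses a small CMake-driven build; a minimal sketch, assuming the sources sit under ~/Apps/metis-5.1.0 (the path that appears later in this thread):

cd ~/Apps/metis-5.1.0
make config      # generates the CMake build tree; requires cmake on the PATH
make             # libmetis.a ends up under build/Linux-x86_64/libmetis/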

Kind regards

Jean-Rémy

Error in format of geo file during parallelisation 7 years 5 months ago #26626

Hi riadh,
In fact, I downloaded new metis, parmetis etc. and I don't think the problem comes from my grid, because I have the same problem with the malpasset test case.

I compiled TELEMAC with compileTELEMAC.py and my systel.cfg. In scalar mode it is OK with ncsize=1, but with ncsize>1 I still get the error and not a valid partitioning.
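For context, the scalar/parallel contrast can be reproduced from the command line; a rough sketch, assuming the v7-era Python launchers and a steering file called tide.cas (both names are illustrative):

telemac2d.py tide.cas                # scalar run: OK
telemac2d.py tide.cas --ncsize=4     # parallel run: fails while splitting T2DGEO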

I'm wondering if the problem comes from the MPI library, etc.

Does anyone have an idea, or has anyone had the same problem before?

Note that I'm working on the Thor cluster (I attach my systel.cfg and my submission script).

Thanks

Jean-Rémy

Error in format of geo file during parallelisation 7 years 5 months ago #26627

Hi
Did you check whether the partitioning step is OK?
Are you able to open the partitioned SELAFIN files?

regards

PS: you could try to run just the partitioning step by specifying --split on the command line.
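A minimal sketch of that, assuming the same launcher and steering file names as above:

# run only the partitioning step, then inspect the split files and partel.log
telemac2d.py tide.cas --ncsize=2 --split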
Christophe

Error in format of geo file during parallelisation 7 years 5 months ago #26628

Hi c.coulet,

When I check partel.log, everything seems OK except that the runtime of METIS is 0 s:



PARTEL/PARRES: TELEMAC METISOLOGIC PARTITIONER

REBEKKA KOPMANN & JACEK A. JANKOWSKI (BAW)
JEAN-MICHEL HERVOUET (LNHE)
CHRISTOPHE DENIS (SINETICS)
YOANN AUDOUIN (LNHE)
PARTEL (C) COPYRIGHT 2000-2002
BUNDESANSTALT FUER WASSERBAU, KARLSRUHE

METIS 5.0.2 (C) COPYRIGHT 2012
REGENTS OF THE UNIVERSITY OF MINNESOTA

BIEF 7.1 (C) COPYRIGHT 2012 EDF


MAXIMUM NUMBER OF PARTITIONS: 100000


--INPUT FILE NAME <INPUT_NAME>:
INPUT: T2DGEO
--INPUT FILE FORMAT <INPFORMAT> [MED,SERAFIN,SERAFIND]:
INPUT: SERAFIN
--BOUNDARY CONDITIONS FILE NAME:
INPUT: T2DCLI
--NUMBER OF PARTITIONS <NPARTS> [2 -100000]:
INPUT: 2
--PARTITIONING METHOD <PMETHOD> [1 (METIS) OR 2 (SCOTCH)]:
INPUT: 1
--CONTROL SECTIONS FILE NAME (OR RETURN) :
NO SECTIONS
--CONTROL ZONES FILE NAME (OR RETURN) :
NO ZONES
--GEOMETRY FILE NAME <INPUT_NAME>:
INPUT: T2DGEO
--GEOMETRY FILE FORMAT <GEOFORMAT> [MED,SERAFIN,SERAFIND]:
INPUT: SERAFIN
+---- PARTEL: BEGINNING ----+


READ_MESH_INFO: TITLE= LE BARRAGE DE MALPASSET
NUMBER OF ELEMENTS: 104000
NUMBER OF POINTS: 53081

FORMAT NOT INDICATED IN TITLE


ONE-LEVEL MESH.
NDP NODES PER ELEMENT: 3
ELEMENT TYPE : 10
NPOIN NUMBER OF MESH NODES: 53081
NELEM NUMBER OF MESH ELEMENTS: 104000

THE INPUT FILE ASSUMED TO BE 2D
THERE ARE 1 TIME-DEPENDENT RECORDINGS

THERE IS 1 SOLID BOUNDARIES:

BOUNDARY 1 :
BEGINS AT BOUNDARY POINT: 1 , WITH GLOBAL NUMBER: 583
AND COORDINATES: 619.8345 5099.191
ENDS AT BOUNDARY POINT: 1 , WITH GLOBAL NUMBER: 583
AND COORDINATES: 619.8345 5099.191
THE MESH PARTITIONING STEP STARTS
BEGIN PARTITIONING WITH METIS
RUNTIME OF METIS 0.0000000E+00 SECONDS
THE MESH PARTITIONING STEP HAS FINISHED
TREATING SUB-DOMAIN 1
-- WRITING TIMESTEP 0 AT 0.0000000E+00
TREATING SUB-DOMAIN 2
-- WRITING TIMESTEP 0 AT 0.0000000E+00
OVERALL TIMING: 0.1862000 SECONDS

+---- PARTEL: NORMAL TERMINATION ----+



But when I check my partitioned files in the temporary folder (for a parallel run over 2 processors):

-rw-r--r-- 1 jhugue01 lienss 2.1M Jun 1 14:11 T2DGEO
-rw-r--r-- 1 jhugue01 lienss 2.1M Jun 1 14:11 T2DGEO00001-00000
-rw-r--r-- 1 jhugue01 lienss 268 Jun 1 14:11 T2DGEO00001-00001



In fact, T2DGEO00001-00001 contains only the title of the geo file, while T2DGEO00001-00000 contains the entire T2DGEO file.
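That 268-byte size is consistent with little more than the SERAFIN header. A rough way to confirm it from the shell, assuming big-endian SERAFIN (the first Fortran record is a 4-byte length marker, the 80-character title, and a closing marker):

# dump the first record of the suspect partition: marker, title, marker
od -A d -c -N 88 T2DGEO00001-00001
# a healthy geometry partition continues far beyond this header, with
# connectivity and coordinate records for its share of the mesh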

Error in format of geo file during parallelisation 7 years 5 months ago #26629

OK, so you have a problem with the partitioning step.
First of all, you could try to run it manually to check.
I remember that a few years ago we had a problem running in parallel on one of your previous clusters...
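Running partel by hand amounts to feeding it the same answers that appear in partel.log: input name, format, boundary conditions file, number of partitions, method (1 = METIS), two empty lines for the sections and zones files, then the geometry name and format again. A minimal sketch, assuming it is launched from the case's temporary directory (path illustrative):

cd /path/to/temporary_directory
./partel << EOF
T2DGEO
SERAFIN
T2DCLI
2
1


T2DGEO
SERAFIN
EOF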
regards
Christophe

Error in format of geo file during parallelisation 7 years 5 months ago #26630

OK, then I will explore the partel step manually.
Thank you!
Jean-Rémy

Error in format of geo file during parallelisation 7 years 5 months ago #26631

OK, so trying to run ./partel manually, I got this error:

BEGIN PARTITIONING WITH METIS
Current memory used: 0 bytes
Maximum memory used: 0 bytes
***Memory allocation failed for CreateGraphDual: nptr. Requested size: 137439004680 bytes
RUNTIME OF METIS 0.0000000E+00 SECONDS


So yes, the problem comes from the partitioning step; the requested allocation (about 128 GiB for a 104,000-element mesh) is clearly absurd.

I don't know for the moment how to get rid of this failure. I tried it on a virtual machine and in a native Linux environment, so I don't think it comes from the fact that I work on a virtual machine (I was expecting that could be the problem).

But I will try to pinpoint exactly at which step partel.F fails.

I'm also still wondering if my configuration file for the cluster is correct (compilation with this file is OK):

# ____/ Compilation pour Thor avec mpi intel /________________________________/
[thor.intel]
#
options: mpi
#
mpi_cmdexec: mpirun -machinefile MPI_HOSTFILE -np <ncsize> <exename>
#
cmd_obj: mpiifort -c -O3 -convert big_endian -DHAVE_VTK -DNO_INQUIRE_SIZE -DHAVE_MPI <mods> <incs> <f95name>
cmd_lib: ar cru <libname> <objs>
cmd_exe: mpiifort -convert big_endian -o <exename> <objs> <libs>
#
incs_parallel: /opt/intel/impi/5.1/compilers_and_libraries_2016.1.150/linux/mpi/intel64/include
libs_partel: /home/jhugue01/Apps/metis-5.1.0/build/Linux-x86_64/libmetis/libmetis.a

libs_all: /opt/intel/impi/5.1/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib/libmpi.a
/home/jhugue01/Apps/metis-5.1.0/build/Linux-x86_64/libmetis/libmetis.a
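One quick sanity check on that file: verify that the static library named in libs_partel really contains the METIS mesh-partitioning entry points (CreateGraphDual, seen in the failure above, belongs to the METIS_PartMeshDual path). A sketch:

# list the mesh-partitioning symbols in the library partel links against
nm /home/jhugue01/Apps/metis-5.1.0/build/Linux-x86_64/libmetis/libmetis.a | grep -i partmesh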




Regards,
Jean-Rémy

Error in format of geo file during parallelisation 7 years 5 months ago #26633

It would be great if you could try to recompile partel on your cluster and attach the listing, or at least check whether everything goes well.
You could just run compileTELEMAC.py -m "clean partel"
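A minimal sketch of that, capturing the listing to a file so it can be attached here:

compileTELEMAC.py -m "clean partel" 2>&1 | tee partel_rebuild.log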
Regards
Christophe