
TOPIC: run and configure MPI

run and configure MPI 9 years 1 month ago #18559

  • j.dasilva
Hello,

I'm building my own cluster for a few projects (for now only two computers, for testing). I use gfortran and MPICH2.

I have set up my systel.cfg, compiled TELEMAC, and run the test case “digue” for telemac2d. I used the command:
$ sudo mpiexec -f /home/promethee/machinefile -n 2 python /home/promethee/telemac/scripts/python27/telemac2d.py /home/promethee/test/digue/t2d_digue.cas
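
For reference, the machinefile passed to -f is assumed here to use the usual MPICH Hydra format, one host per line with an optional process count (the hostnames below are placeholders):

master:2
node01:2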

I get an error message which, I suppose, points to a problem with PARTEL or with my configuration (see the attachment). I have also attached my config file.

Hoping to get some help from the community.

Best regards
Attachments:

run and configure MPI 9 years 1 month ago #18581

  • yugi
Hi,

In your temporary folder there should be a file named partel_T2DGEO.log. Could you post it here? It should contain the answer.
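
If you are not sure where the temporary run folder ended up, a plain find from the directory containing your steering file should locate the log:

$ find . -name partel_T2DGEO.log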

Hope it helps.
There are 10 types of people in the world: those who understand binary, and those who don't.

run and configure MPI 9 years 1 month ago #18586

  • j.dasilva
Hello,

Thank you for your time.
Attachments:

run and configure MPI 9 years 1 month ago #18591

  • j.dasilva
  • j.dasilva's Avatar
The link seems dead and I can't access my post ... I don't know why.

Sorry, I will post the log file here:
PARTEL: TELEMAC SELAFIN METISOLOGIC PARTITIONER
REBEKKA KOPMANN & JACEK A. JANKOWSKI (BAW)
JEAN-MICHEL HERVOUET (LNHE)
CHRISTOPHE DENIS (SINETICS)
YOANN AUDOUIN (LNHE)
PARTEL (C) COPYRIGHT 2000-2002
BUNDESANSTALT FUER WASSERBAU, KARLSRUHE

METIS 5.0.2 (C) COPYRIGHT 2012
REGENTS OF THE UNIVERSITY OF MINNESOTA

BIEF 6.2 (C) COPYRIGHT 2012 EDF

MAXIMUM NUMBER OF PARTITIONS: 100000

SELAFIN INPUT NAME <INPUT_NAME>:
INPUT: T2DGEO

BOUNDARY CONDITIONS FILE NAME :
INPUT: T2DCLI

NUMBER OF PARTITIONS <NPARTS> [2 -100000]:
INPUT: 2

PARTITIONING OPTIONS:

PARTITIONING METHOD <PMETHOD> [1 (METIS) OR 2 (SCOTCH)]:
INPUT: 1

WITH SECTIONS? [1:YES 0:NO]:
INPUT: 0

WITH ZONES? [1:YES 0:NO]:
INPUT: 0

ONE-LEVEL MESH.
NDP NODES PER ELEMENT: 3
NPOIN NUMBER OF MESH NODES: 9734
NELEM NUMBER OF MESH ELEMENTS: 19202

THE INPUT FILE ASSUMED TO BE 2D SELAFIN
TIMESTEP: 0.00000000 S = 0.00000000 H
TIMESTEP: 4.00000000 S = 1.11111114E-03 H
THERE ARE 2 TIME-DEPENDENT RECORDINGS

THERE IS 2 LIQUID BOUNDARIES:

BOUNDARY 1 :
BEGINS AT BOUNDARY POINT: 199 , WITH GLOBAL NUMBER: 6479
AND COORDINATES: 1232.500 -450.5842
ENDS AT BOUNDARY POINT: 226 , WITH GLOBAL NUMBER: 6876
AND COORDINATES: 1232.500 450.9624

BOUNDARY 2 :
BEGINS AT BOUNDARY POINT: 72 , WITH GLOBAL NUMBER: 1964
AND COORDINATES: 0.000000 60.00000
ENDS AT BOUNDARY POINT: 80 , WITH GLOBAL NUMBER: 1965
AND COORDINATES: 0.000000 -60.00000

THERE IS 2 SOLID BOUNDARIES:

BOUNDARY 1 :
BEGINS AT BOUNDARY POINT: 80 , WITH GLOBAL NUMBER: 1965
AND COORDINATES: 0.000000 -60.00000
ENDS AT BOUNDARY POINT: 199 , WITH GLOBAL NUMBER: 6479
AND COORDINATES: 1232.500 -450.5842

BOUNDARY 2 :
BEGINS AT BOUNDARY POINT: 226 , WITH GLOBAL NUMBER: 6876
AND COORDINATES: 1232.500 450.9624
ENDS AT BOUNDARY POINT: 72 , WITH GLOBAL NUMBER: 1964
AND COORDINATES: 0.000000 60.00000

THE MESH PARTITIONING STEP STARTS
BEGIN PARTITIONING WITH METIS
RUNTIME OF METIS 0.00000000 SECONDS
THE MESH PARTITIONING STEP HAS FINISHED

run and configure MPI 9 years 1 week ago #18913

  • j.dasilva
Hi

I come back to you again. I have reinstalled the cluster and run some tests.
My config file is in the attachment. I have tried to start a run in parallel using the processors from my master and my node (two processors on each machine).
To start a job I use the command:
$ mpiexec -n <number of processors> -f <host_file> telemac2d.py <cas_file>
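
For comparison, the TELEMAC python scripts are normally launched directly and left to spawn mpiexec themselves from the systel.cfg settings; a minimal sketch, assuming the --ncsize option of the v6-era scripts and a placeholder path:

$ telemac2d.py /home/user/test/digue/t2d_digue.cas --ncsize=4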

The job starts correctly, but the processors don't work together; they work separately. Concretely, when I start a job with 3 processors, for example, I get the same job running on two processors, and therefore two result files (the third process being the HYD_Proxy).

I have the same problem when I start the job on one machine as well.
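
As a basic sanity check that mpiexec itself distributes processes across both hosts (file name below is a placeholder), each rank of the following should print a different machine name:

$ mpiexec -f <host_file> -n 4 hostname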

I suppose there is a mistake in my config file, but I don't know where. If the community can enlighten me …
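
For what it is worth, the parallel-related lines of an MPICH-based systel.cfg usually look something like this sketch from the v6-era examples; the exact paths are assumptions to adapt, and <config>, <ncsize>, <exename> are TELEMAC's own template placeholders:

par_cmdexec:   <config>/partel < PARTEL.PAR >> <partel.log>
mpi_cmdexec:   mpiexec -f /home/user/machinefile -n <ncsize> <exename>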

Julien
Attachments: