TOPIC: Parallel issue

Parallel issue 11 years 9 months ago #7183

  • abernard
  • Expert Boarder
  • Posts: 210
  • Thank you received: 45
Hi everybody,

I have apparently compiled TELEMAC successfully (Ubuntu 64-bit, OpenMPI and gfortran), but I run into a problem when I try to run the Malpasset case in parallel.

With n=0 (scalar mode) the run is fine, and n=1 is fine too.

When n>1, I get the message shown in the attached screenshot (Screenshotfrom2013-01-28180531.png).

partel can't create partel_T2DREF.log (empty file)

Any thoughts?
Thanks for your help
The administrator has disabled public write access.

Parallel issue 11 years 9 months ago #7186

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
Hi,

It seems that METIS is crashing. What is the size of your mesh (number of elements)?
Try launching the partel command from the temporary folder; you should get more information.
The command is:
/home/alexis/opentelemac/v6p2/parallel/parallel_v6p2/ubugfopenmpi/partel < PARTEL.PAR
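
The suggestion above amounts to running partel by hand from inside the temporary run folder, with PARTEL.PAR on standard input. A minimal Python sketch of the same thing follows. The function name `run_with_stdin` is hypothetical, and the executable path and folder have to be the ones on your own machine.

```python
import subprocess
from pathlib import Path


def run_with_stdin(command, workdir, stdin_name="PARTEL.PAR"):
    """Run `command` inside `workdir`, feeding it `stdin_name` on standard input.

    Equivalent to the shell form `cd workdir && command < stdin_name`.
    Returns (return code, combined stdout and stderr as text).
    """
    workdir = Path(workdir)
    with open(workdir / stdin_name, "rb") as par_file:
        result = subprocess.run(
            command,
            cwd=workdir,
            stdin=par_file,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep METIS messages and any backtrace together
        )
    return result.returncode, result.stdout.decode(errors="replace")
```

Called with the partel path from the command above as a one-element list and the temporary folder as `workdir`, this returns the full log as a string. On POSIX systems a negative return code means the process was killed by a signal.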

Thanks
There are 10 types of people in the world: those who understand binary, and those who don't.

Parallel issue 11 years 9 months ago #7189

  • abernard
  • Expert Boarder
  • Posts: 210
  • Thank you received: 45
Yugi,

I am working on the Malpasset cases. At first I thought I was using a solver unsuitable for parallelism, but as you can see, my problem arises at the partitioning step.

First I tried the large model (104000 elements). I get the same error message, probably linked to memory:
Current memory used: 424656 bytes
Maximum memory used: 424656 bytes
***Memory allocation failed for CreateGraphDual: nind. Requested size: 1493823985345728 bytes


I tried with the small model (26000 elements), and the problem is the same.
Current memory used: 108336 bytes
Maximum memory used: 108336 bytes
***Memory allocation failed for CreateGraphDual: nind. Requested size: 41197326371688 bytes
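
To put the two requested sizes in perspective, here is a small arithmetic sketch using only the figures from the error messages above. The function name `summarize` is hypothetical. The requests come out in the tens to thousands of TiB for meshes that currently use well under a megabyte, which suggests (as an interpretation, not a confirmed diagnosis) that the size handed to the allocator is a garbage value rather than a genuine memory need.

```python
# Figures copied from the two METIS error messages in this post:
# (number of elements, bytes currently used, bytes requested)
cases = {
    "large (104000 elements)": (104000, 424656, 1493823985345728),
    "small (26000 elements)": (26000, 108336, 41197326371688),
}

TIB = 2 ** 40  # bytes in one tebibyte


def summarize(n_elements, used_bytes, requested_bytes):
    """Return (requested size in TiB, requested bytes per element, requested/used ratio)."""
    return (
        requested_bytes / TIB,
        requested_bytes / n_elements,
        requested_bytes / used_bytes,
    )


for name, figures in cases.items():
    tib, per_element, ratio = summarize(*figures)
    print(f"{name}: {tib:.1f} TiB requested, "
          f"{per_element:.3g} bytes per element, "
          f"{ratio:.3g} times the memory in use")
```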


As recommended, I tried the partel command from the temporary folder:

alexis@alexis-HP-Compaq-6005-Pro-MT-PC:~/malp/t2d_malpasset-small_p0.cas_2013-01-29-09h10min32s$ /home/alexis/opentelemac/v6p2/parallel/parallel_v6p2/ubugfopenmpi/partel < PARTEL.PAR

+
+
PARTEL: TELEMAC SELAFIN METISOLOGIC PARTITIONER

REBEKKA KOPMANN & JACEK A. JANKOWSKI (BAW)
JEAN-MICHEL HERVOUET (LNHE)
CHRISTOPHE DENIS (SINETICS)
PARTEL (C) COPYRIGHT 2000-2002
BUNDESANSTALT FUER WASSERBAU, KARLSRUHE

METIS 4.0.1 (C) COPYRIGHT 1998
REGENTS OF THE UNIVERSITY OF MINNESOTA

BIEF 5.9 (C) COPYRIGHT 2008 EDF
+
+

=> THIS IS A PRELIMINARY DEVELOPMENT VERSION
DATED: TUE JAN 27 11:11:20 CET 2009

MAXIMUM NUMBER OF PARTITIONS: 100000

+
+


SELAFIN INPUT NAME <INPUT_NAME>: INPUT: T2DREF

BOUNDARY CONDITIONS FILE NAME : INPUT: T2DCLI

NUMBER OF PARTITIONS <NPARTS> [2 -100000]: INPUT: 2

PARTITIONING OPTIONS:

PARTITIONING METHOD <PMETHOD> [1 OR 2]: INPUT: 1

WITH SECTIONS? [1:YES 0:NO]: INPUT: 0

ONE-LEVEL MESH.
NDP NODES PER ELEMENT: 3
NPOIN NUMBER OF MESH NODES: 13541
NELEM NUMBER OF MESH ELEMENTS: 26000

THE INPUT FILE ASSUMED TO BE 2D SELAFIN
TIMESTEP: 0.00000000 S = 0.00000000 H
TIMESTEP: 200.000000 S = 5.55555560E-02 H
TIMESTEP: 400.000000 S = 0.111111112 H
TIMESTEP: 600.000000 S = 0.166666672 H
TIMESTEP: 800.000000 S = 0.222222224 H
TIMESTEP: 1000.00000 S = 0.277777791 H
TIMESTEP: 1200.00000 S = 0.333333343 H
TIMESTEP: 1400.00000 S = 0.388888896 H
TIMESTEP: 1600.00000 S = 0.444444448 H
TIMESTEP: 1800.00000 S = 0.500000000 H
TIMESTEP: 2000.00000 S = 0.555555582 H
TIMESTEP: 2200.00000 S = 0.611111104 H
TIMESTEP: 2400.00000 S = 0.666666687 H
TIMESTEP: 2600.00000 S = 0.722222209 H
TIMESTEP: 2800.00000 S = 0.777777791 H
TIMESTEP: 3000.00000 S = 0.833333313 H
TIMESTEP: 3200.00000 S = 0.888888896 H
TIMESTEP: 3400.00000 S = 0.944444418 H
TIMESTEP: 3600.00000 S = 1.00000000 H
TIMESTEP: 3800.00000 S = 1.05555558 H
TIMESTEP: 4000.00000 S = 1.11111116 H
THERE ARE 21 TIME-DEPENDENT RECORDINGS

THERE IS 1 SOLID BOUNDARIES:

BOUNDARY 1 :
BEGINS AT BOUNDARY POINT: 1 , WITH GLOBAL NUMBER: 546
AND COORDINATES: 619.8345 5099.195
ENDS AT BOUNDARY POINT: 1 , WITH GLOBAL NUMBER: 546
AND COORDINATES: 619.8345 5099.195
USING ONLY METIS_PARTMESHDUAL SUBROUTINE
THE MESH PARTITIONING STEP METIS STARTS
Current memory used: 108336 bytes
Maximum memory used: 108336 bytes
***Memory allocation failed for CreateGraphDual: nind. Requested size: 41197326371688 bytes
THE MESH PARTITIONING STEP HAS FINISHED
RUNTIME OF METIS 0.00000000 SECONDS
ISOLATED BOUNDARY POINT 597 1727
ISOLATED BOUNDARY POINT 502 13166
ISOLATED BOUNDARY POINT 1044 8798
ISOLATED BOUNDARY POINT 508 13177
ISOLATED BOUNDARY POINT 610 1620
ISOLATED BOUNDARY POINT 2248 9163
ISOLATED BOUNDARY POINT 617 8947
ISOLATED BOUNDARY POINT 627 8932
ISOLATED BOUNDARY POINT 1085 9460
ISOLATED BOUNDARY POINT 707 13119
ISOLATED BOUNDARY POINT 734 9641
ISOLATED BOUNDARY POINT 2260 139
ISOLATED BOUNDARY POINT 773 9727
ISOLATED BOUNDARY POINT 771 9644
ISOLATED BOUNDARY POINT 204 10213
ISOLATED BOUNDARY POINT 222 10363
ISOLATED BOUNDARY POINT 357 9824
ISOLATED BOUNDARY POINT 720 9007
ISOLATED BOUNDARY POINT 816 11528
ISOLATED BOUNDARY POINT 822 11449
ISOLATED BOUNDARY POINT 800 12202
ISOLATED BOUNDARY POINT 1001 11992
ISOLATED BOUNDARY POINT 846 11954
ISOLATED BOUNDARY POINT 796 11212
ISOLATED BOUNDARY POINT 870 13019
ISOLATED BOUNDARY POINT 847 12827
ISOLATED BOUNDARY POINT 807 2128
ISOLATED BOUNDARY POINT 806 2291
ISOLATED BOUNDARY POINT 989 1711
ISOLATED BOUNDARY POINT 1104 1905
ISOLATED BOUNDARY POINT 139 1707
ISOLATED BOUNDARY POINT 166 12754
ISOLATED BOUNDARY POINT 990 1608
ISOLATED BOUNDARY POINT 988 1899
ISOLATED BOUNDARY POINT 2250 1746
ISOLATED BOUNDARY POINT 566 2261
ISOLATED BOUNDARY POINT 570 1897
ISOLATED BOUNDARY POINT 573 2099
ISOLATED BOUNDARY POINT 578 1985
ISOLATED BOUNDARY POINT 1018 1713
ISOLATED BOUNDARY POINT 1012 1917
ISOLATED BOUNDARY POINT 1020 8777
ISOLATED BOUNDARY POINT 596 8778
ISOLATED BOUNDARY POINT 595 1895
ISOLATED BOUNDARY POINT 964 9490
ISOLATED BOUNDARY POINT 1008 13120

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7F051B9890F7
#1 0x7F051B9896D4
#2 0x7F051B0CE49F
#3 0x41B087 in MAIN__ at partel.f:?
Segmentation fault (core dumped)

Cheers,

Parallel issue 11 years 9 months ago #7346

  • gourish
Hi,

I was also getting the same error as abernard when I tried to run TELEMAC in parallel.
The TELEMAC system (version 6.2) that I had compiled was using METIS version 4.0.3.
When I checked the file "partel.f", I found that it is using METIS version 5.0.2.

I have recompiled TELEMAC using METIS version 5.0.2 and partel no longer crashes. However, I now get a file read error at line 472 of "partel.f" when it reads the file 'T2DGEO'.

I am trying to fix the file problem by correcting the code. Has anyone else encountered the same problem reading 'T2DGEO' in "partel.f"?

I shall post the corrected "partel.f" if I manage to fix the read problem.

Regards,
Gourish

Parallel issue 11 years 9 months ago #7347

  • sebourban
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello,

This might be unrelated, but in the past it helped me to increase the value of MAXFRO from 300 to 3000 (for instance). Except for PARTEL, the change should be made in DECLARATIONS_TELEMAC2D (and the other declarations_* files).

Note that this change has been made in the upcoming release v6p3.

Hope this helps.
Sebastien.

Parallel issue 11 years 9 months ago #7359

  • gourish
Hi,
One thing I forgot to mention is that I was testing the partel program in standalone mode, by compiling partel.f separately. That is the case in which I got the file read error mentioned in my earlier reply.

When I compiled TELEMAC with METIS v5.0.2, the parallel version worked perfectly without any error.

I have tested the validation case 051_mersey from TELEMAC version 6.2, and it completed the calculation with 2 and 4 processors.

I advise abernard to compile TELEMAC with METIS v5.0.2 and then try running the case in parallel.

Cheers

Parallel issue 11 years 9 months ago #7414

  • abernard
  • Expert Boarder
  • Posts: 210
  • Thank you received: 45
Hi everybody,

I have found a solution that makes it work. When I compiled metis-5.0.2, I had switched IDXTYPEWIDTH to 64 (as recommended in the compilation procedure). That seems to have been the source of my problem. I set IDXTYPEWIDTH to 32 in include/metis.h, recompiled the parallel module, and now it works.

Can anyone confirm that IDXTYPEWIDTH 64 is not suitable (even though my machine runs in 64 bits)?
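
For anyone wanting to check which index width their METIS build was configured with, the setting is a `#define IDXTYPEWIDTH` line in include/metis.h. The sketch below reads it out of the header text. The function name `idx_type_width` is hypothetical, and the header location depends on where METIS was unpacked.

```python
import re

# Matches a line such as "#define IDXTYPEWIDTH 32" in metis.h.
_IDX_RE = re.compile(r"^\s*#\s*define\s+IDXTYPEWIDTH\s+(\d+)", re.MULTILINE)


def idx_type_width(header_text):
    """Return the IDXTYPEWIDTH value defined in the given metis.h text, or None."""
    match = _IDX_RE.search(header_text)
    return int(match.group(1)) if match else None
```

Typical use would be `idx_type_width(open("include/metis.h").read())`, run from the METIS source folder. Note that this reports the header as it is now, which only matches libmetis.a if the library was rebuilt after the last edit.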

Thanks for your help,

Parallel issue 11 years 9 months ago #7420

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
Hi,

The only reason I can think of is that all your code was compiled in 32 bits as well. Sometimes the compiler does not switch to 64 bits by default; you have to force it by adding an option.

Cheers.

Parallel issue 11 years 9 months ago #7438

  • abernard
  • Expert Boarder
  • Posts: 210
  • Thank you received: 45
Hi Yugi,

My computing skills are limited. Anyway, it works in parallel, and that is a very worthwhile improvement compared to my starting point.

My first installation was v6p1 under Windows XP SP3 32-bit, scalar mode, gfortran. It was a shame to use my system (HP Compaq 6005 Pro, AMD Phenom II X4 B95) in that configuration.
I switched to the 64-bit Ubuntu available for AMD, with gfortran/OpenMPI.

If I understand correctly, if libmetis.a compiled with IDXTYPEWIDTH 64 returns an error and libmetis.a compiled with IDXTYPEWIDTH 32 doesn't, it means that my TELEMAC system is compiled in 32 bits. I followed the step-by-step tutorial, and it says nothing about 32- or 64-bit compilation. Does it mean that my gfortran (the one suggested by my distribution) compiled TELEMAC in 32 bits?
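
One possible reading, offered as an assumption and not confirmed anywhere in this thread: gfortran's default INTEGER is 4 bytes wide even on a 64-bit system (unless an option such as -fdefault-integer-8 is used), so a caller passing default integers to a library built with IDXTYPEWIDTH 64 would have its values read with the wrong width. The sketch below only illustrates that effect on two of the mesh sizes printed in the partel log. It does not call METIS and does not try to reproduce the exact number in the error message.

```python
import struct

# Two 4-byte integers laid out back to back, as a caller using 32-bit integers
# would store them (little-endian, as on x86-64). The values are NELEM and NPOIN
# from the partel log for the small mesh.
buffer_32 = struct.pack("<2i", 26000, 13541)

# The same 8 bytes read as one 64-bit integer, as a callee built for
# 64-bit indices would read them.
(misread,) = struct.unpack("<q", buffer_32)

print(misread)               # a value in the tens of trillions, not 26000
print(misread & 0xFFFFFFFF)  # the low 32 bits still hold the original 26000
```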

Regards,

Parallel issue 11 years 9 months ago #7449

  • yugi
  • openTELEMAC Guru
  • Posts: 851
  • Thank you received: 244
Yes, I think it has.
What do you get when you type gfortran -v?
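
The useful part of the `gfortran -v` output for this question is the "Target:" line, which GCC prints on standard error. A target triplet beginning with x86_64 usually indicates a compiler that produces 64-bit code by default. A minimal sketch for pulling that line out follows; both function names are hypothetical.

```python
import re
import subprocess


def parse_target(version_text):
    """Return the value of the 'Target:' line in `gfortran -v` output, or None."""
    match = re.search(r"^Target:\s*(\S+)", version_text, re.MULTILINE)
    return match.group(1) if match else None


def gfortran_target(executable="gfortran"):
    """Run `gfortran -v` and return its target triplet, or None if not found."""
    result = subprocess.run([executable, "-v"], capture_output=True, text=True)
    # GCC writes the version banner to stderr; stdout is included just in case.
    return parse_target(result.stderr + result.stdout)
```

`gfortran_target()` needs gfortran on the PATH, while `parse_target` can be applied to output pasted from a terminal.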