
TOPIC: Parallel performance of T2D on infiniband cluster

Parallel performance of T2D on infiniband cluster 14 years 5 months ago #240

  • jeremie
  • Junior Boarder
  • Hydro-Quebec
  • Posts: 39
  • Thank you received: 7
Hi all,

We currently use Telemac2D v5p9 at Hydro-Quebec on a 120-cpu cluster. I thought I would share with you some numbers on the parallel performance of T2D.

The benchmark was performed on Xeon 5570 2.93 GHz machines (8 procs per node) connected through an InfiniBand switch. The sample case is a 264000-element mesh (133000 nodes). Simulation time is 1800 s with a 1 s time step.

Calculation times are as follows:

nProcs tcalc (s)
2 1883
4 758
8 515
16 272
24 156
32 107
40 86
48 68
56 59
64 44
72 39
80 57
88 51

The attached figure shows relative calculation time as a function of nProcs. The dashed line is the expected theoretical relative time, with the reference time set to 1.0 for our first test on 2 procs. We expect the calculation time to be halved from 2 procs to 4 procs, divided by 4 from 2 to 8, and so on.

The figure shows that T2D's performance is very close to the relative times we were expecting. On our current setup, there is no performance gain beyond 72 procs. This is probably due to the increased communication overhead.
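For readers who want to redo the comparison without the figure, here is a minimal Python sketch of the same calculation. It uses only the timings from the table above; the names `TIMINGS` and `scaling_table` are made up for this illustration, and the "efficiency" column (ideal relative time divided by measured relative time) is an added convenience rather than something plotted in the original figure.

```python
# Timings copied from the table above: (nProcs, tcalc in seconds).
TIMINGS = [
    (2, 1883), (4, 758), (8, 515), (16, 272), (24, 156), (32, 107), (40, 86),
    (48, 68), (56, 59), (64, 44), (72, 39), (80, 57), (88, 51),
]


def scaling_table(timings):
    """Return rows of (nprocs, measured relative time, ideal relative time, efficiency).

    The first entry of `timings` is the reference run (relative time 1.0).
    """
    ref_procs, ref_time = timings[0]
    rows = []
    for nprocs, tcalc in timings:
        measured = tcalc / ref_time      # relative time actually observed
        ideal = ref_procs / nprocs       # halves each time nprocs doubles
        efficiency = ideal / measured    # 1.0 means exactly on the dashed line
        rows.append((nprocs, measured, ideal, efficiency))
    return rows


rows = scaling_table(TIMINGS)

# Process count with the shortest calculation time in the table.
best_nprocs = min(TIMINGS, key=lambda entry: entry[1])[0]

if __name__ == "__main__":
    print("nProcs  measured  ideal  efficiency")
    for nprocs, measured, ideal, efficiency in rows:
        print(f"{nprocs:6d}  {measured:8.3f}  {ideal:5.3f}  {efficiency:10.2f}")
    print("fastest run:", best_nprocs, "procs")
```

An efficiency above 1.0 only means the run was faster than the ideal line drawn from the 2-proc reference; it says nothing about performance relative to a single-processor run, which was not measured here.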

perf.png


We would be very interested to see what others are getting. For the sake of comparison, I will soon post some numbers for the Malpasset case.
The administrator has disabled public write access.

Re:Parallel performance of T2D on infiniband cluster 14 years 4 months ago #256

  • jmhervouet
Congratulations!

Larger cases may run optimally on up to several thousand processors (something like 100 million elements was reached at Daresbury Laboratory near Manchester).

On my HP z600 Linux machine, the Malpasset test case takes 54 s on 1 processor and 10 s on 8 processors, but this is a small case: 26000 elements (however, it took 24 hours in 1993...).
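As a rough reading of these two Malpasset timings, the sketch below computes the speed-up, the parallel efficiency, and the Karp-Flatt estimate of the serial fraction. The Karp-Flatt metric is not mentioned in this thread; it is brought in here only as one common way to interpret a single speed-up measurement. The function names are illustrative, and since the timings are rounded to the second the results are indicative at best.

```python
def speedup(t_serial, t_parallel):
    """Speed-up of the parallel run relative to the 1-processor run."""
    return t_serial / t_parallel


def karp_flatt(speedup_value, nprocs):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup_value - 1.0 / nprocs) / (1.0 - 1.0 / nprocs)


# Timings quoted above: 54 s on 1 processor, 10 s on 8 processors.
NPROCS = 8
s = speedup(54.0, 10.0)
efficiency = s / NPROCS
serial_fraction = karp_flatt(s, NPROCS)

if __name__ == "__main__":
    print(f"speed-up        : {s:.2f}")
    print(f"efficiency      : {efficiency:.3f}")
    print(f"serial fraction : {serial_fraction:.3f}")
```

With these numbers the speed-up is 5.4 on 8 processors, that is an efficiency of about 68 %, which is in line with the remark that the case is small.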

Regards, Jean-Michel Hervouet

Re:Parallel performance of T2D on infiniband cluster 14 years 4 months ago #276

  • sebourban
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Bravo HydroQC!

I should add a couple of interesting facts:

- multi-core processors (ideally suited for multi-threading, just like GPGPU) are not the same as multi-processor machines (ideally suited for distributed simulation, like the automated domain decomposition of TELEMAC). Indeed, if you have 3 computers, each with 2 quad-core processors, you can run 3x2 = 6 simulations with a speed-up factor of 6 or so, assuming you use 2 cores of each of your computers. With quad-core processors, you can probably push to 2 cores of each quad-core without too much degradation of the speed-up, again as long as you use all 3 computers. After that it rapidly decreases. Besides, this is only true with the Intel i7 processor. Note that you would get the same speed-up with dual-core processors, as quad-core processors have not produced the intended speed-up compared to dual-core processors.

- quad-core are just as fast as dual-core

- EDF and laboratories in the UK are working hard to optimise TELEMAC for multi-core computers with a hybrid MPI and OpenMP parallelisation. The best gain will always come from the number of computers or blades you have, rather than from using multi-core processors as algebra accelerators (just as we would with GPGPU).

Sebastien.

Re:Parallel performance of T2D on infiniband cluster 14 years 2 months ago #636

  • LeoP
The parallel performance on multi-core processors (6-core processors are already available) is limited by memory access. You can steer that somewhat by minimizing memory access: combine tasks that work on the same memory into one loop instead of multiple loops. Sequential access is also less demanding than random access.
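To make the "one loop instead of multiple loops" idea concrete, here is a small sketch of loop fusion. It is written in Python only to show the structure; the arrays `h` and `u` and the two computed quantities are invented for the example and are not taken from TELEMAC, and the actual memory-traffic benefit would only show up in compiled code working on large arrays.

```python
def two_passes(h, u):
    """Two separate loops: h and u are read from memory twice."""
    n = len(h)
    flux = [0.0] * n
    energy = [0.0] * n
    for i in range(n):
        flux[i] = h[i] * u[i]
    for i in range(n):
        energy[i] = 0.5 * h[i] * u[i] * u[i]
    return flux, energy


def fused(h, u):
    """One fused loop: each h[i] and u[i] is loaded once and reused."""
    n = len(h)
    flux = [0.0] * n
    energy = [0.0] * n
    for i in range(n):          # sequential sweep over the arrays
        hi, ui = h[i], u[i]
        flux[i] = hi * ui
        energy[i] = 0.5 * hi * ui * ui
    return flux, energy


if __name__ == "__main__":
    h = [1.0, 2.0, 0.5, 4.0]
    u = [0.5, 1.5, 2.0, 0.25]
    print(two_passes(h, u) == fused(h, u))
```

Both versions produce the same results; the fused one simply touches each input value once per index while sweeping the arrays in order.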

Leo
