
TOPIC: Cluster - Optimisation

Cluster - Optimisation 12 years 9 months ago #3638

  • konsonaut
  • openTELEMAC Guru
  • Posts: 413
  • Thank you received: 144
Hello,

We have access to a cluster with 1314 compute nodes, each with 16 cores, connected by an InfiniBand QDR interconnect. We don't have access to all the nodes, but potentially to some of them.

So far I have used a single node for the Telemac-2D and -3D simulations, because using more nodes (e.g. 2) always resulted in slower computation times, regardless of the mesh size.
Is there a way to get faster computation times using more nodes?

It was a big and delightful first step from scalar Telemac to parallel Telemac and now I want more!

Clemens
The following user(s) said Thank You: TelemacUser1

Re: Cluster - Optimisation 12 years 9 months ago #3639

  • sebourban
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello,

I am surprised: you should see the opposite.

As mentioned in some of the other topics on this forum, multi-core processors are not really suited to our type of scientific code, in that performance does not scale linearly with the number of cores. I am guessing you see a speed-up of 8 to 10 times on your 16-core (many-core?) node, which is probably not bad compared with nothing.
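To put that guess in numbers, here is a worked example using the standard definitions of parallel speed-up $S_p$ (serial wall-clock time $T_1$ divided by the wall-clock time $T_p$ on $p$ cores) and parallel efficiency $E_p$. It only restates the 8 to 10 times estimate above; it is not a measured result.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}

E_{16} = \frac{8}{16} = 0.50 \quad \text{to} \quad E_{16} = \frac{10}{16} = 0.625
```

In other words, roughly 50 to 60 % of ideal linear scaling on one 16-core node.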

However, TELEMAC achieves a better speed-up across multiple nodes. UK-STFC Daresbury Laboratories (one of the newest members of the TELEMAC-MASCARET Consortium) has run tests on large HPC systems with up to thousands of nodes.

In summary, we are confident it works. Is it possible that you think you are using 2 nodes, but your queuing system does not actually split the job across them, so that everything is still placed on one node?
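One quick way to test that hypothesis is to look at which hosts the scheduler actually assigned to the job. The sketch below assumes a PBS/Torque-style node file (its path is given by the `PBS_NODEFILE` environment variable inside a job) with one hostname per line, repeated once per allocated slot; other queuing systems expose this information differently, so treat the input format as an assumption to adapt. The fallback example data is hypothetical.

```python
import os
from collections import Counter


def slots_per_host(lines):
    """Count how many slots the scheduler assigned to each host.

    Assumes a PBS/Torque-style node file: one hostname per line,
    repeated once per allocated slot (core). Blank lines are ignored.
    """
    hosts = [line.strip() for line in lines if line.strip()]
    return Counter(hosts)


def report(counts):
    """Print the per-host distribution and warn if only one host is used."""
    for host, n in sorted(counts.items()):
        print(f"{host}: {n} slot(s)")
    if len(counts) <= 1:
        print("WARNING: all slots are on a single node")


if __name__ == "__main__":
    # PBS_NODEFILE is set by PBS/Torque inside a running job.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as f:
            report(slots_per_host(f))
    else:
        # Hypothetical example: a job that asked for 2 x 16 cores
        # but was placed entirely on one node.
        report(slots_per_host(["node001"] * 32))
```

If the warning appears for a job that requested several nodes, the problem is in the job submission or scheduler configuration rather than in TELEMAC itself.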

Good luck in your investigation.

Sébastien.

Re: Cluster - Optimisation 12 years 9 months ago #3640

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi Clemens

Configuring a cluster can sometimes be difficult if you are not able to adjust all the available settings properly.

As Sébastien said, multi-core is not as efficient as multi-processor because of memory bandwidth limits. But it also depends on the processor architecture (maybe you could attach a description of the cluster?).

Personally, I have done several different cluster installations, and parallelism always sped up the computation when the settings were right.

First, I think you should ask an IT manager to help you configure the job manager, because most problems come from there.
Below is a list of tests you could run to check the scalability:
  • Start with a single model (possibly without any Fortran file) or a TELEMAC test case (Malpasset, for example)
  • Check a run with 8 processors to see where each process is located
  • Try modifying the job manager settings to run the 8 processes on 8 different empty nodes (this should be the best configuration)
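The last test in the list (8 processes on 8 different empty nodes) amounts to a round-robin placement of ranks over nodes. The sketch below builds such a host list and writes it as a plain machine file with one hostname per line, one line per MPI rank. That file format, the function names and the node names are assumptions for illustration; check your MPI launcher's and job manager's documentation for the format they actually expect.

```python
def round_robin_hosts(nodes, nprocs):
    """Assign MPI ranks to nodes in round-robin order.

    With nprocs <= len(nodes), every rank lands on a different node,
    i.e. the "one process per empty node" layout.
    """
    if not nodes:
        raise ValueError("need at least one node")
    return [nodes[rank % len(nodes)] for rank in range(nprocs)]


def write_machinefile(path, hosts):
    """Write one hostname per line (one line per MPI rank)."""
    with open(path, "w") as f:
        f.write("\n".join(hosts) + "\n")


if __name__ == "__main__":
    nodes = [f"node{i:03d}" for i in range(1, 9)]  # hypothetical node names
    hosts = round_robin_hosts(nodes, 8)
    print(hosts)  # one distinct node per rank
```

Comparing this one-process-per-node run with the same 8 processes packed onto a single node shows how much of the difference comes from memory bandwidth rather than from the interconnect.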

Hope this helps
Christophe

Re: Cluster - Optimisation 12 years 9 months ago #3794

  • konsonaut
  • openTELEMAC Guru
  • Posts: 413
  • Thank you received: 144
Hello Sébastien and Christophe,

thank you for the answers and suggestions!
Yes, it was actually as you said: I only had access to one node.
In the meantime, the IT technicians have resolved the issue, and Telemac now runs faster on more nodes!
If you are interested in the cluster hardware:

www.vsc.ac.at/about-vsc/vsc-pool/vsc-2

Best regards,

Clemens

Re: Cluster - Optimisation 12 years 9 months ago #3800

  • a.weisgerber
Clemens

If possible, it would be great if you could write a little summary of your findings with parallel processing in the "Benchmark" category of the forum.

You can indicate the speedup obtained, the type and size of the model, the hardware setup and number of nodes/cores used...
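For such a summary, the speed-up is usually reported relative to the smallest run. The sketch below takes measured wall-clock times for runs on different numbers of nodes and cores and computes speed-up and relative efficiency against the run with the fewest cores. The `BenchmarkRun` record and the timings in the usage block are hypothetical placeholders, not results from this thread.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    nodes: int
    cores: int
    wall_clock_s: float  # measured elapsed time of the whole run


def speedup_table(runs):
    """Speed-up and relative efficiency versus the run with the fewest cores.

    speedup    = T_base / T
    efficiency = speedup * cores_base / cores
    """
    base = min(runs, key=lambda r: r.cores)
    rows = []
    for r in sorted(runs, key=lambda r: r.cores):
        speedup = base.wall_clock_s / r.wall_clock_s
        efficiency = speedup * base.cores / r.cores
        rows.append((r.nodes, r.cores, r.wall_clock_s, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical timings for illustration only, not measured results.
    runs = [BenchmarkRun(1, 16, 1000.0), BenchmarkRun(2, 32, 550.0)]
    print(f"{'nodes':>5} {'cores':>5} {'time [s]':>10} {'speedup':>8} {'eff.':>6}")
    for nodes, cores, t, s, e in speedup_table(runs):
        print(f"{nodes:>5} {cores:>5} {t:>10.1f} {s:>8.2f} {e:>6.0%}")
```

Reporting the efficiency alongside the speed-up makes it easier for readers to see where adding nodes stops paying off.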

This will be very helpful for new users that are trying to understand what can be obtained by using parallel processing.

Many thanks

Re: Cluster - Optimisation 12 years 9 months ago #3826

  • konsonaut
  • openTELEMAC Guru
  • Posts: 413
  • Thank you received: 144
Hello,

Enclosed is a picture of a Telemac-2D benchmark test.
I used the Malpasset validation case with the large mesh (53081 nodes / 104000 elements) and the related steering file: t2d_malpasset-large_p0.cas. The files can be found in the validation cases.
The hardware is described at the link in my post above; the Fortran compiler is Intel XE 2011.
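A useful way to read such a benchmark is to look at how much of the mesh each process handles as nodes are added. The sketch below divides the Malpasset large-mesh size from this post by the core count, assuming 16 cores per cluster node as described at the top of the thread. It ignores the interface nodes shared between subdomains, so actual per-process counts are somewhat larger; it is a rough estimate, not output from the TELEMAC partitioner.

```python
MESH_NODES = 53081      # Malpasset large mesh, as stated in the post
MESH_ELEMENTS = 104000
CORES_PER_NODE = 16     # cluster nodes described at the top of the thread


def per_core_load(cluster_nodes):
    """Approximate mesh nodes and elements per MPI process.

    Ignores interface nodes duplicated between subdomains, so real
    per-process counts are somewhat larger than these averages.
    """
    cores = cluster_nodes * CORES_PER_NODE
    return cores, MESH_NODES / cores, MESH_ELEMENTS / cores


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        cores, nodes_pc, elems_pc = per_core_load(n)
        print(f"{n} node(s) = {cores:3d} cores: "
              f"~{nodes_pc:.0f} mesh nodes, ~{elems_pc:.0f} elements per core")
```

As the subdomains shrink, each process does less computation per time step while still exchanging interface data, so the speed-up for a fixed mesh size is expected to level off at some point.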
Some important parameters given in the steering file:

  • FE
  • Wave equation
  • Type of advection: 6;5
  • Time step: 1.0 s
  • Number of time steps: 4000
  • Listing printout period: 10
  • Graphic printout period: 1000
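The time-related parameters above translate into the simulated duration and output intervals as follows. This assumes, as is usual for TELEMAC steering files, that both printout periods are counted in time steps rather than seconds; the variable names simply mirror the keywords listed above.

```python
TIME_STEP_S = 1.0               # "Time step"
NUMBER_OF_TIME_STEPS = 4000     # "Number of time steps"
LISTING_PRINTOUT_PERIOD = 10    # in time steps (assumed)
GRAPHIC_PRINTOUT_PERIOD = 1000  # in time steps (assumed)

simulated_s = TIME_STEP_S * NUMBER_OF_TIME_STEPS
listing_every_s = TIME_STEP_S * LISTING_PRINTOUT_PERIOD
graphic_every_s = TIME_STEP_S * GRAPHIC_PRINTOUT_PERIOD

print(f"Simulated duration: {simulated_s:.0f} s")
print(f"Listing printed every {listing_every_s:.0f} s "
      f"({NUMBER_OF_TIME_STEPS // LISTING_PRINTOUT_PERIOD} times)")
print(f"Graphic output every {graphic_every_s:.0f} s "
      f"({NUMBER_OF_TIME_STEPS // GRAPHIC_PRINTOUT_PERIOD} intervals)")
```

Whether the initial state is written as an additional graphic record depends on other steering-file keywords, so the counts above refer only to the periodic outputs.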

[Figure: Telemac-2D Malpasset benchmark results on VSC-2]


Best regards,

Clemens
The following user(s) said Thank You: a.weisgerber

Re: Cluster - Optimisation 12 years 9 months ago #3829

  • a.weisgerber
Thank you Clemens.

I copied your post into the benchmark section of the forum for reference purposes. Feel free to add information to it when you have more results in the future.
Moderators: borisb
