
TOPIC: Malpasset case benchmark on several intel multicore CPUs

Malpasset case benchmark on several intel multicore CPUs 11 years 5 months ago #8956

  • cyamin
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Hi all,

I used MPICH2 to distribute the telemac2d malpasset-large computation across the PCs in my Windows cluster. The cluster contains three different CPU architectures:
  • Intel Core i7 860 @2.80GHz (4 cores)
  • Intel Core2 Duo @3.00GHz (2 cores)
  • Intel Xeon E3 1220L V2 @2.30GHz (2 cores)

Unfortunately, I have not been able to distribute the computation with a gfortran build, so the benchmark results are based only on the Intel Fortran compiler.

The optimization flags used with the Intel compiler on all machines are:
/O3 /QaxSSE4.2 /arch:SSE4.1

There are, of course, more aggressive optimizations that can be employed when targeting a single CPU, especially the newer Xeon E3.

[Attached chart: Chart4_CPU_Arch_Bench.png]

The results show that using more than 2 cores on the Core i7 brings little benefit. Unfortunately, the Intel Xeon CPU is installed in a low-power compact machine and has only 2 cores. I would be very keen to see whether quad-core Xeon CPUs, or the latest i7 CPUs, hit the same bottleneck when using more than 2 cores.
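For anyone who wants to put numbers on this kind of comparison, here is a minimal Python sketch that turns wall-clock times into speed-up and parallel efficiency. The function name and the timings are placeholders of my own for illustration; they are not the values behind the chart.

```python
def speedup_and_efficiency(times_by_procs):
    """Return {procs: (speedup, efficiency)} relative to the 1-process run.

    times_by_procs maps the number of MPI processes to wall-clock seconds.
    """
    if 1 not in times_by_procs:
        raise ValueError("need a single-process timing as the baseline")
    t1 = times_by_procs[1]
    result = {}
    for procs, t in sorted(times_by_procs.items()):
        speedup = t1 / t                      # how many times faster than 1 process
        result[procs] = (speedup, speedup / procs)  # efficiency = speed-up per process
    return result


# Placeholder wall-clock times in seconds, NOT measured values from this thread.
example_times = {1: 1000.0, 2: 550.0, 4: 500.0}

for procs, (s, e) in speedup_and_efficiency(example_times).items():
    print(f"{procs} procs: speed-up {s:.2f}, efficiency {e:.0%}")
```

An efficiency that drops sharply between 2 and 4 processes is the pattern described above for the Core i7.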

Some attention should be paid to the 'Intel Turbo Boost' technology (Core i7 and Xeon E3). It sets a different core clock rate depending on whether 1, 2, or 3 to 4 cores are in use, which adds a small penalty to computation times when using more than 1 core. However, this penalty cannot explain the poor results of the i7, which must be an issue with the CPU architecture.
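A rough way to see how large the Turbo Boost penalty can be is to compute the best speed-up you could expect if the lower multi-core clock were the only loss. The sketch below assumes run time scales inversely with clock rate (a purely compute-bound code), and the clock table is illustrative only; look up the real turbo bins for your own CPU.

```python
def clock_limited_speedup(procs, clock_ghz_by_active_cores):
    """Best-case speed-up on `procs` cores when the only loss is the lower turbo clock.

    Assumes run time is inversely proportional to the core clock rate.
    """
    f1 = clock_ghz_by_active_cores[1]      # clock with a single active core
    fp = clock_ghz_by_active_cores[procs]  # clock with `procs` active cores
    return procs * fp / f1


# Illustrative clock rates in GHz per number of active cores (assumed values).
example_clocks = {1: 3.4, 2: 3.3, 3: 3.0, 4: 3.0}

for procs in sorted(example_clocks):
    s = clock_limited_speedup(procs, example_clocks)
    print(f"{procs} cores: clock-limited speed-up {s:.2f} (ideal {procs})")
```

If the measured speed-up falls well below this clock-limited figure, the clock change alone does not account for the loss.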

It would also be interesting to see how AMD multicore CPUs perform in this respect.

Regards,
Costas

Malpasset case benchmark on several intel multicore CPUs 11 years 5 months ago #8957

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi Costas

Thanks a lot for this feedback.

In my opinion, and also by comparison with the test I made (reported in topic #7409), Windows is probably responsible for a large part of those "bad" results.

On Linux (RHEL), the speed-up on this test case is linear up to 16 cores...

Regards
Christophe

Malpasset case benchmark on several intel multicore CPUs 11 years 5 months ago #8959

  • cyamin
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Hi Christophe,

In the post you indicated, the CPUs of the machine you tested must be Xeon E7-8837s, not i7s. Information on the CPU specs can be found here:
http://ark.intel.com/products/53576/

I suspect that Xeon CPUs are more capable than the 'consumer' i7 in multicore computations and might not experience the same bottleneck. It seems quite logical considering the huge price difference between the two CPU families.
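One way to compare machines on this point is the Karp-Flatt metric, which estimates the effective serial fraction from a measured speed-up. This is only a sketch of a standard diagnostic, not something that was computed for the results in this thread, and the speed-up values in the example are placeholders.

```python
def karp_flatt(speedup, procs):
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    if procs < 2:
        raise ValueError("the metric is only defined for 2 or more processes")
    return (1.0 / speedup - 1.0 / procs) / (1.0 - 1.0 / procs)


# Placeholder speed-ups per process count, NOT measured values from this thread.
example_speedups = {2: 1.8, 4: 2.0}

for procs, s in sorted(example_speedups.items()):
    print(f"{procs} procs: serial fraction estimate {karp_flatt(s, procs):.2f}")
```

If the estimate grows with the number of processes, the loss comes from overhead that increases with core count (for example contention for shared resources) rather than from a fixed serial part of the code.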

Regards,
Costas

Malpasset case benchmark on several intel multicore CPUs 10 years 4 months ago #13504

  • cyamin
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Just an update to my previous results:

[Attached chart: Chart4_CPU_Arch_Bench_v2.png]

These results were obtained using the mingw-64 compiler (version 4.8.3) and MS-MPI (version 4.2.4400).

The code was compiled separately for each architecture (although this appears to make little difference), with the -Ofast optimizations.

It is impressive how nicely the Xeon E3 CPU scales up to 4 processes, even though it is a 2-core chip (4 virtual cores with hyperthreading enabled).

Costas