open TELEMAC-MASCARET The mathematically superior suite of solvers


Potential vectorization errors in BIEF routines


Potential vectorization errors in BIEF routines 10 years 5 months ago #13233

jaj

Dear all,

this message concerns the BIEF library rather than Telemac-3D, but I do not know where else to post it.

While searching for the causes of non-reproducible results between consecutive Telemac-3D runs, I found that in the following BIEF routines:

cflp12.f
cg1112.f
cg1113.f
vc08bb.f

the forced vectorization directives !DIR$ IVDEP placed before some loops are always active. This means that, when one uses the Intel Fortran Compiler with optimisation -O2 or higher on an Intel Xeon processor, these loops are forcibly vectorized for the processor's SIMD unit, whatever its vector length. If the mesh has not been sorted appropriately, the results may be unpredictable (does anyone still remember that a mesh has to be sorted for a vector processor?). This happens even when a "serial machine" is declared with LVMAC=LV=1.

In principle, clean programming with forced vectorization requires the approach used, for example, in assve1.f:

      IF(LV.EQ.1) THEN
!
!     SCALAR MODE
!
        DO 40 IELEM = 1 , NELEM
          X(IKLE(IELEM)) = X(IKLE(IELEM)) + W(IELEM)
40      CONTINUE
!
      ELSE
!
!     VECTOR MODE
!
        DO 60 IB = 1,(NELEM+LV-1)/LV
!VOCL LOOP,NOVREC
!DIR$ IVDEP
          DO 50 IELEM = 1+(IB-1)*LV , MIN(NELEM,IB*LV)
            X(IKLE(IELEM)) = X(IKLE(IELEM)) + W(IELEM)
50        CONTINUE
60      CONTINUE
!
      ENDIF

In this case the forced vectorization is active only if the user explicitly requests it (and has sorted the mesh appropriately).
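To make "appropriately sorted" concrete: a scatter-add such as X(IKLE(IELEM)) = X(IKLE(IELEM)) + W(IELEM) can only be vectorized safely under IVDEP if no two elements inside one block of LV consecutive elements address the same point. The following is a minimal sketch of such a check, not code from BIEF. The program, the function name block_safe and the toy connectivity are all hypothetical, and it is written in free-form Fortran for brevity.

```fortran
program check_block_safety
  implicit none
  ! Hypothetical toy connectivity: 6 elements scattering into 4 points.
  integer, parameter :: nelem = 6, npoin = 4
  integer :: ikle(nelem)

  ikle = (/ 1, 2, 3, 4, 1, 2 /)
  print *, 'LV = 4 safe: ', block_safe(ikle, nelem, npoin, 4)
  print *, 'LV = 8 safe: ', block_safe(ikle, nelem, npoin, 8)

contains

  ! Returns .true. if no point index occurs twice inside any block
  ! of lv consecutive elements (same blocking as the vector mode above).
  logical function block_safe(idx, n, np, lv)
    integer, intent(in) :: n, np, lv
    integer, intent(in) :: idx(n)
    integer :: last_block(np)   ! last block in which each point was written
    integer :: ib, ie

    last_block = 0
    block_safe = .true.
    do ib = 1, (n + lv - 1) / lv
      do ie = 1 + (ib - 1) * lv, min(n, ib * lv)
        if (last_block(idx(ie)) == ib) then
          block_safe = .false.   ! same point hit twice in one block
          return
        end if
        last_block(idx(ie)) = ib
      end do
    end do
  end function block_safe

end program check_block_safety
```

With this toy data the ordering is safe for LV = 4 (T) but not for LV = 8 (F), because point 1 is then written twice within the same block.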

NOTE: Bugs of this kind can be especially nasty: they behave randomly and can go undetected for years...

Best regards,
Jacek

PS. Vectorization can be switched off with -no-vec -no-simd.


Potential vectorization errors in BIEF routines 10 years 5 months ago #13239

jmhervouet

Hello Jacek,

This is stunning; it was totally overlooked (I'll remove the lines). But it could be interesting: do you think we should revive the good old renumbering process and force vectorisation? Many years ago vectorisation gave a speed-up of about a factor of 10 in Telemac; it was left behind as parallelism became more powerful.

Regards,

Jean-Michel

P.S. You will be interested to know that reproducibility between scalar and parallel runs with Tomawac will be in the next version, 7.0. Finite element assembly is done with 8-byte integers. With the other modules there remains the problem of the dot product of large vectors, which precludes the use of integers.
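The idea behind integer assembly can be sketched as follows: integer addition is exact, so the assembled value does not depend on the order in which contributions arrive, whereas floating-point addition does. This is only an illustration of the principle, not the Tomawac implementation; the scale factor fix_scale and the three contributions are assumptions chosen for the example.

```fortran
program integer_assembly
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)   ! 8-byte integer
  ! Hypothetical fixed-point scale: keeps 9 decimal digits per contribution.
  real(dp), parameter :: fix_scale = 1.0e9_dp
  real(dp) :: w(3), s_fwd, s_bwd
  integer(i8) :: i_fwd, i_bwd
  integer :: k

  w = (/ 0.1_dp, 0.2_dp, 0.3_dp /)   ! toy element contributions

  ! Assemble in forward order.
  s_fwd = 0.0_dp
  i_fwd = 0_i8
  do k = 1, 3
    s_fwd = s_fwd + w(k)
    i_fwd = i_fwd + nint(w(k) * fix_scale, kind=i8)
  end do

  ! Assemble the same contributions in reverse order.
  s_bwd = 0.0_dp
  i_bwd = 0_i8
  do k = 3, 1, -1
    s_bwd = s_bwd + w(k)
    i_bwd = i_bwd + nint(w(k) * fix_scale, kind=i8)
  end do

  ! Integer sums must agree whatever the order.
  if (i_fwd /= i_bwd) error stop 'integer sums differ'

  print *, 'real sums equal:    ', s_fwd == s_bwd
  print *, 'integer sums equal: ', i_fwd == i_bwd
  print *, 'integer result:     ', real(i_fwd, dp) / fix_scale
end program integer_assembly
```

In typical IEEE-754 double precision the two real sums differ in the last bit, while the integer sums are identical. The price is a fixed scale that must be chosen so that the sum neither overflows nor loses the digits that matter.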


Potential vectorization errors in BIEF routines 10 years 5 months ago #13242

jaj

Dear Jean-Michel,

I am currently testing Telemac-3D with the Wesel (3D) example on a MIC (a.k.a. Xeon Phi) coprocessor in so-called native mode, i.e. executing entirely on the (separate) PCIe card. The MIC has 60 x86 cores which are "weak" (a limited instruction set compared to Xeon, and only 512 KB of cache each), but each is equipped with a SIMD vector unit twice as long as that of a Xeon core: 512 bits, i.e. a vector length of 16 in single precision or 8 in double precision. Because each card runs its own operating system, you have the impression of being logged in to a (very) parallel shared-memory Linux machine with 60 (weak) vector processors. You can immediately compile and run any parallel legacy program you already have.

However, Telemac is badly optimized for serial execution (we know why), so the serial cores, although numerous, cannot do better than Xeon CPU cores, and the (too few) vectorized parts cannot be sped up much with such a short vector length... One MIC is usually about 10 times slower than 16 Xeon cores; at best, with all resources and tricks, it is 6 times slower.

What is annoying is that, with all (auto-)vectorization on, the results differ (a lot!) between consecutive runs on the MIC. Only switching vectorization off (-no-vec -no-simd) gives predictable results. I therefore checked all the forced vectorisation left over from the mighty Telemac@Cray past, but unfortunately in vain: throwing out all the IVDEPs changes nothing.

We are a bit concerned about the new processors: the coming "Knights Landing" will have 72 stronger Atom cores and a SIMD length doubled compared to MIC, and can be used as the main node processor, not only as a coprocessor. Unfortunately, Telemac does not like the new many-core architecture.

Best regards,
Jacek


Potential vectorization errors in BIEF routines 10 years 5 months ago #13249

Lufia

Dear all,

are these routines really called by telemac3d in the Wesel example? I added some simple write statements to the routines to check whether they get called, without success. I'm not an expert in BIEF, but from the comments in the source code it looks as if these parts of BIEF are used for the QUASI-BUBBLE elements?

Best regards,
Leo

Potential vectorization errors in BIEF routines 10 years 5 months ago #13250

jmhervouet

Hello,

Exactly. In fact, cg1112.f and cg1113.f can be vectorised without risk (and a compiler may well do it without being told); only cflp12.f and vc08bb.f may have backward dependencies that would require a specific numbering of elements, so I have removed the CDIR$ IVDEP from these last two subroutines for future versions.
Now, if we want to take advantage of vectorisation, we should work on the matrix-vector product with edge-based storage, which raises the question of element numbering, a brand new subject...

Regards,

Jean-Michel


Potential vectorization errors in BIEF routines 10 years 5 months ago #13252

jaj

Hello,

yes, it is unfortunately true that removing the potential vectorization errors from the routines in question does not help in the Telemac-3D Wesel example. When I noticed that the results are non-reproducible with vectorization on and reproducible with vectorization off, I immediately searched the sources for forced vectorization directives, as a veteran user of Cray and Fujitsu vector computers would, and discovered the bugs mentioned above. In vain: the results on the MIC are still not reproducible. So this is not the end of the story. I think I have to try other examples; maybe this disease affects only Telemac-3D, or some specific part of it. Anyway: -no-vec, no problem.

However, this might explain the strange "random" errors that also occur from time to time on the Xeon CPU (or other CPUs with a SIMD vector unit) when the Intel compiler uses the SSE and/or AVX instruction sets at higher optimisation levels. I was shocked to see one of these "Telemac ghost appearances" on a normal Xeon CPU, which is what triggered my closer look into the code.

Best regards,
Jacek


Potential vectorization errors in BIEF routines 10 years 5 months ago #13254

jaj

Dear Jean-Michel,

quite a few years ago (ca. 2007?) we did a small study with Telemac-2D on a truly parallel vector computer from Hitachi. In theory this would be an ideal architecture for Telemac, which was written on a vector computer for a vector computer, with 70-80% of the code once vectorized (a very good result!), and then parallelized with domain decomposition. Perfect?

Unfortunately it turned out that, although all data structures and (most) loops remained ready for vectorization (and therefore cause annoying, massive cache misses on serial processors...), we had to go back to the pre-2000 routines with all that element-by-element (EBE) storage in order to get vectorized runs, and then only on a single vector processor. The reason was that we could sort only for EBE storage and not for edge-based storage, and, of course, we could sort only whole meshes and not their partitions for MPI-parallel runs... And the sorting programs ran for hours and hours... Urgh! We gave up.

The situation is similar for the many-core architecture (MIC: cores augmented with SIMD vector units): because of the small caches, the vector data structures are punished with cache misses even more than on CPU cores, and because sorting routines for forced vectorization are missing, the vector units cannot be used effectively. Compiler auto-vectorization (supposedly safe?) improves execution time by only ca. 5%.

So it seems we are stuck in the past, aren't we?

Best regards,
Jacek


Potential vectorization errors in BIEF routines 10 years 5 months ago #13253

Lufia

Hello,

I've started a small test of the WESEL example with the gfortran compiler and the standard compiler flags for openSUSE. My local machine is an Intel i5-3470; so far (after 3 runs) the results are reproducible, but more tests are needed. Maybe the Intel compiler and its aggressive optimization/vectorization is the problem?

Best regards,
Leo

Potential vectorization errors in BIEF routines 10 years 5 months ago #13255

jaj

Dear Leo,

I must admit I use gfortran only sporadically; I have had some problems with it in the past (the UnTRIM user interface was "too modern"). Please check whether it forcibly vectorizes the loops in question (important) and whether they are executed at all in your example. With the Intel compiler, that would be -vec-report=3 or higher.

Yes, I do not like the Intel compiler optimizations either; they do far too much for speed at the cost of accuracy.

Best regards,
Jacek


Potential vectorization errors in BIEF routines 10 years 5 months ago #13256

jaj

Hello,

to conclude this thread: it turns out that the results of Telemac-3D (Wesel example) and Telemac-2D (Donau example) are perfectly reproducible between consecutive runs on the MIC (Xeon Phi) processor when, in addition to the Intel Fortran optimization -O2 (auto-vectorization on!), one applies the floating-point model "source", i.e. "-fp-model source". This means that all intermediate results are rounded to the precision defined in the source. The default is "-fp-model fast=1", which seems to be not so good for Telemac. See the Intel compiler manual for details.

(This may also be a hint for other processors...)
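One plausible mechanism, not confirmed in this thread, is that a vectorized reduction adds the terms of a sum in a different order than the scalar loop, so that the rounding differs. The sketch below imitates this by hand: it sums the same array once sequentially and once in 4 interleaved partial sums, the way 4 double-precision SIMD lanes would. The array contents and the lane count are assumptions chosen for illustration only.

```fortran
program lane_sum
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical sizes: n is a multiple of lanes; 4 lanes mimic
  ! 4 double-precision values in one SIMD register.
  integer, parameter :: n = 1000, lanes = 4
  real(dp) :: x(n), part(lanes), s_seq, s_vec
  integer :: i, l

  do i = 1, n
    x(i) = 1.0_dp / real(i, dp)
  end do

  ! Scalar order: one running sum over all terms.
  s_seq = 0.0_dp
  do i = 1, n
    s_seq = s_seq + x(i)
  end do

  ! Vector-like order: one partial sum per lane, combined at the end.
  part = 0.0_dp
  do i = 1, n, lanes
    do l = 1, lanes
      part(l) = part(l) + x(i + l - 1)
    end do
  end do
  s_vec = (part(1) + part(2)) + (part(3) + part(4))

  print *, 'sequential sum :', s_seq
  print *, 'lane-wise sum  :', s_vec
  print *, 'difference     :', s_seq - s_vec
end program lane_sum
```

Any difference printed here is at rounding level, but in a long time-stepping run such differences can grow. A floating-point model that forbids the compiler from reordering sums keeps the scalar order at the cost of some speed.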

Best regards,
jaj
