
TOPIC: ***** Memory Corruption in read_dataset - solved

***** Memory Corruption in read_dataset - solved 7 years 11 months ago #24511

  • j_floyd
This has been replicated from a previous post because it is important and caused a headache for me for a number of days.

A memory corruption occurs in READ_DATASET because wrong array sizes are passed to get_data_value. Memory checks don't catch it because get_data_value redeclares the array with size NPOIN_PREV.

The problem ...
VARSO3 allocates some variables with DIM1=0 (e.g. for a hydrostatic calculation). See point_telemac3d.

However, the get_data_value call in read_dataset passes NPOIN_PREV as the array size regardless of the size actually allocated. This causes memory overwrites.

This error sometimes shows itself - sometimes not. But it may explain some strange behaviours that I could not explain.
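
To illustrate why a zero-length variable is dangerous here, a toy model in Python (not TELEMAC's Fortran, and not a real process memory layout). The flat `memory` list, the offsets and `unchecked_read` are all hypothetical and only mimic a write that trusts NPOIN_PREV instead of the allocated size:

```python
def unchecked_read(memory, offset, record):
    """Write every value of record starting at offset, ignoring the allocated size."""
    for i, value in enumerate(record):
        memory[offset + i] = value


# Toy flat memory of 8 slots. Hypothetical layout: WCONV has zero length at
# offset 2, so TA (3 values, initially 7.0) starts at the same offset.
memory = [0.0, 0.0, 7.0, 7.0, 7.0, 0.0, 0.0, 0.0]
wconv_offset, wconv_dim1 = 2, 0
ta_offset, ta_dim1 = 2, 3

npoin_prev = 3
record = [1.0, 2.0, 3.0]  # WCONV values stored in the previous file

# The read is told the array holds npoin_prev values, although DIM1 is 0.
unchecked_read(memory, wconv_offset, record[:npoin_prev])

ta = memory[ta_offset:ta_offset + ta_dim1]
print(ta)  # TA now holds the WCONV record instead of its own values
```

In a real run, what sits next to the zero-length array depends on the compiler and machine, which would be consistent with the error only showing itself sometimes.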

A Solution ...
Either use the DIM1 value for the variable or skip the read if DIM1 is zero. I am not sure how using DIM1 affects the interpolation system, so the safest approach is to skip the read if DIM1==0.

Hope there are no more of these in the code. Real array sizes need to be passed for all variables, or use the OS calls.
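
The skip-if-empty guard could look roughly like this. It is a Python sketch of the logic only, not the actual read_dataset code; `Variable` and `read_variable` are hypothetical names, and capping the copy at DIM1 is my assumption:

```python
class Variable:
    """Hypothetical stand-in for a variable with an allocated first dimension (DIM1)."""

    def __init__(self, name, dim1):
        self.name = name
        self.dim1 = dim1
        self.values = [0.0] * dim1  # storage really allocated for this variable


def read_variable(var, record):
    """Copy one record from a previous-computation file into var.

    Returns True if values were copied, False if the read was skipped
    because the variable has no storage (DIM1 == 0).
    """
    if var.dim1 == 0:
        return False  # nothing allocated, so nothing may be written
    n = min(var.dim1, len(record))  # assumption: never copy more than DIM1 values
    var.values[:n] = record[:n]
    return True


wconv = Variable("WCONV", 0)  # zero length, as in a hydrostatic run
u = Variable("U", 4)
record = [1.0, 2.0, 3.0, 4.0]

print(read_variable(wconv, record))        # skipped
print(read_variable(u, record), u.values)  # copied
```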
The administrator has disabled public write access.

***** Memory Corruption in read_dataset - solved 7 years 11 months ago #24567

  • riadh
Hello John

Sorry for this late reply again.

I'm trying to fix the bug and I need to validate the fix. Could you send me the initial model that causes the trouble, so that I can test the effectiveness of my patch?
(you can use directly my mail riadh dot ata at edf dot fr)
Thank you
with my kind regards

Riadh

***** Memory Corruption in read_dataset - solved 7 years 11 months ago #24573

  • j_floyd
Glad to hear that someone is looking at it. I was disappointed originally.

All you need is a continued computation: the wconv and wc, wn, wd variables are all set to zero length (unless a non-hydrostatic run had been done). There are probably other variables set up in point_telemac3d etc. with zero size. We need to ensure that none of them are filled using global array sizes, e.g. in discrete do loops. I haven't checked whether the OS subroutines do a DIM1 check. They should.

Whether it shows up in a run really depends on the compiler's memory layout and on what the overwrites hit. So I expect the response to vary randomly between machines. This highlights the problem of using pointers, the same problem that C code also suffers from by default.

The problem showed itself when I tried to print out the TA variables whilst debugging something else. BUT it didn't always do the same thing, e.g. different versions of telemac seemed to run OK even though the same coding error was present. So the problem won't necessarily show up if I send you my example.

My solution was to skip the read for any variable with DIM1==0. But the code really needs to check that DIM1>=NPOIN3.
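
A sketch of the stricter test, again in Python rather than the Fortran of the real code. `check_read_size` and the sizes in the example are hypothetical; the idea is to skip empty variables, accept fully allocated ones, and stop loudly on anything in between:

```python
def check_read_size(name, dim1, npoin3):
    """Decide what to do before reading npoin3 values into a variable.

    Returns "skip" when nothing is allocated and "read" when the allocation
    is large enough. Raises ValueError for a partial allocation.
    """
    if dim1 == 0:
        return "skip"
    if dim1 < npoin3:
        raise ValueError(
            f"{name}: allocated DIM1={dim1} is smaller than NPOIN3={npoin3}"
        )
    return "read"


npoin3 = 1000
sizes = {"U": 1000, "V": 1000, "WCONV": 0}  # hypothetical allocated sizes

for name, dim1 in sizes.items():
    print(name, check_read_size(name, dim1, npoin3))
```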

cheers
John

***** Memory Corruption in read_dataset - solved 7 years 11 months ago #24575

  • riadh
That's clear. I'll do it.
Thank you.

***** Memory Corruption in read_dataset - solved 7 years 11 months ago #24585

  • j_floyd
Thought of something overnight ...

The problem with read_dataset is that those unused variables have been written to the continue file, probably filled with effectively random data. So DIM1==0 variables should not be written in the first place. This is already done with the TA variables: if they are not calculated, they are not written to the continue file.

However, it still shows that if variables are allowed to be defined with zero allocated space, more checking has to be done wherever variables are manipulated. Allocations less than npoin2 or npoin3 should not be allowed.
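
On the writing side, the same idea could be sketched like this (Python, hypothetical names, not the actual TELEMAC output routine): leave unallocated variables out of the continue file and refuse partial allocations.

```python
def variables_to_write(allocated_sizes, npoin):
    """Return the variable names that are safe to write to the continue file.

    allocated_sizes maps a variable name to its allocated DIM1.
    """
    keep = []
    for name, dim1 in allocated_sizes.items():
        if dim1 == 0:
            continue  # never allocated, so never computed: leave it out
        if dim1 < npoin:
            raise ValueError(f"{name}: DIM1={dim1} is smaller than {npoin}")
        keep.append(name)
    return keep


sizes = {"U": 500, "V": 500, "WCONV": 0, "WD": 0}  # hypothetical sizes
print(variables_to_write(sizes, 500))
```

With this, a later read never meets a record for a variable that was never computed.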

Another reason why this can be difficult to reproduce is that, for security reasons, a lot of compiler systems now choose randomly where to allocate variable space rather than allocating sequentially, so that the structure of the program source does not indicate where to find specific variables in memory. This makes things more difficult for hackers who gain access to running machines. It is not really a problem for our type of programs, but it is very important for system software and databases, especially where the program structure is available through open source.

cheers
john
Moderators: pham
