TOPIC: Final merged result file appears corrupted.

Final merged result file appears corrupted. 8 years 9 months ago #19470

  • j_floyd
I have run a Telemac (v7p1r0) 3D hydrodynamics simulation that completed without any errors; however, when I try to open the result file I get errors. The run log files show that the file merge completed without any declared errors. I am also creating delwaq interface files, which also seem to merge without error.

Fudaa (V1.3) will not open the result file - just a generic error message.

Trying to read the same file with the python utilities also fails. Error report is
".. Cannot read 310600 floats from your binary file
+> Maybe it is the wrong file format ?
"

Some extra information. I did a short 2 day run on Friday morning that was OK - result file readable. The next run was a 25 day simulation (same system - mpi), which returns a 'corrupted' file.

Any suggestions on how to track down what is wrong? I am currently looking at what a python program sees in the file.

Final merged result file appears corrupted. 8 years 9 months ago #19471

  • j_floyd
I have now used the parseSELAFIN utilities to step through the data file.

There is a distinct difference between the 1 day and 25 day runs.

The 1 day run is loadable in fudaa, and the variables look ok.

The 25 day run is NOT loadable by Fudaa, and the data file is definitely corrupted. Python shows that IKLE3 variable has incorrect values in it, with very large numbers (and some -ve numbers - sign of integer overflow).

Very strange that these errors are linked to length of model run. The same MPI layout is used for all runs.

I wrote a small test python script that checks the binary 'linkages' by reading the record size and seeking to the next record. For the 1 day run, the file reads to the end. For the larger 25 day run, it crashes with unrealistic values after the IKLE3 record.
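For readers who want to repeat this kind of check, here is a minimal sketch of a record 'linkage' walker. It is not the script used in this thread. It assumes the layout implied by the compile options quoted below (a 4-byte big-endian length before and after each record), and the function names and the tiny demo files are hypothetical.

```python
import os
import struct
import tempfile


def walk_records(path):
    """Walk records framed as: 4-byte big-endian length, payload, same length.

    Returns (good_record_count, offset_of_first_bad_record or None).
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            start = f.tell()
            head = f.read(4)
            if not head:                      # clean end of file
                return count, None
            if len(head) < 4:                 # truncated leading marker
                return count, start
            (size,) = struct.unpack(">i", head)
            if size < 0:                      # a negative length is not valid
                return count, start
            f.seek(size, 1)                   # skip the payload
            tail = f.read(4)
            if len(tail) < 4 or struct.unpack(">i", tail)[0] != size:
                return count, start           # markers do not match
            count += 1


def make_record(payload):
    """Frame a payload with matching leading and trailing length markers."""
    marker = struct.pack(">i", len(payload))
    return marker + payload + marker


# Hypothetical demo files, not real SELAFIN data.
demo_dir = tempfile.mkdtemp()
good_path = os.path.join(demo_dir, "good.bin")
bad_path = os.path.join(demo_dir, "bad.bin")

with open(good_path, "wb") as f:
    f.write(make_record(b"A" * 80) + make_record(struct.pack(">4i", 1, 2, 3, 4)))

with open(bad_path, "wb") as f:
    # Second record has a trailing marker that disagrees with its leading one.
    f.write(make_record(b"A" * 80)
            + struct.pack(">i", 16) + b"\x00" * 16 + struct.pack(">i", -5))

good_result = walk_records(good_path)
bad_result = walk_records(bad_path)
print(good_result)
print(bad_result)
```

The byte offset returned for the first bad record gives a place to start inspecting the file with a hex viewer.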

Do you have any indication of problems with gfortran (version 5.3.1 64bit) and the gretel processing?

The compile uses the following (default) options
-fconvert=big-endian -frecord-marker=4


Cheers
John

Final merged result file appears corrupted. 8 years 9 months ago #19472

  • c.coulet
  • Moderator
Hi
You're right, this is strange, as the result files for a short simulation and for a long simulation have the same header; only the number of result records changes ...

To analyze your problem more precisely, could you confirm that the 1 day result comes from a parallel run?
If you keep the temporary directory of the computation, could you check the piecewise results (T3DRESXXXXX-XXXXX) and confirm whether you can read them or not?
What is the size of the result file?

regards
Christophe

Final merged result file appears corrupted. 8 years 9 months ago #19473

  • j_floyd
Thanks for the advice - we have Australia Day holiday tomorrow, so I will get back to you on Wednesday.

I am running the 25 days tonight, leaving all the decomposed results untouched. I will merge them separately.

Suggestion: an option to stop the python telemac3d script from deleting the working directory would be useful, e.g. a --keep option.

But yes, very strange. I can't really see how this would happen at the moment.

Final merged result file appears corrupted. 8 years 9 months ago #19475

  • c.coulet
  • Moderator
The option is -t.
See telemac3d.py --help for the full option list.

Regards
Christophe

Final merged result file appears corrupted. 8 years 9 months ago #19522

  • j_floyd
Thanks found that .... cheers

Final merged result file appears corrupted. 8 years 9 months ago #19523

  • j_floyd
Christophe,

Extra info posted separately ..

Confirming that the 1 day run was run over 10 processors.

The record-size check routine runs without error on all the individual T3DRES files, for every record in each file.

As indicated below, the merged RES file finishes at about 4G. The individual RES files are approx 830M each, so I am expecting around 8G. The Delwaq merge has no trouble producing an 11G file, so it's not a filesystem limit.

Cheers

Final merged result file appears corrupted. 8 years 9 months ago #19479

  • maximota
Hello,
I have a similar problem in Telemac 3D v7p1.
The output files were merged without any errors. But when I try to open the 2D result file in Blue Kenue I get an error: Unknown or invalid file format while reading 2D mesh geometry.
In Matlab, telheadr gives an error: inconsistent dimension of IPOBO array
I see that m.IKLE contains numbers larger than the number of points.

This is a model run for 20 days. 460000 nodes.

We suspect that something goes wrong during merging. The output file has always the same size (4.159 Gb). Our output file should be at least 30 Gb large. If we run a simulation with an output file smaller than 4.159 Gb then we don't have problems.
If we keep the temporary folder and merge the output files manually with the older version (v6p3), we get a correct output file.

Regards,
Tatiana

Final merged result file appears corrupted. 8 years 9 months ago #19521

  • j_floyd
Yes I agree - there seems to be some limit at around 4G size.

I have just run the recollection manually, and although the gretel log file continued to report records being written, the RES file size stayed at 4295127012 bytes. It should be around 8G (adding the individual sizes together).
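As a quick arithmetic check on that number, the sketch below compares the stuck file size quoted above with 2**32. It only shows how close the two values are; it does not by itself prove where the limit comes from.

```python
# Size at which the merged RES file stopped growing (bytes), as quoted above.
reported_size = 4295127012

# Number of distinct values a 32-bit integer can hold.
two_pow_32 = 2 ** 32

# How far past 2**32 the file got before it stopped growing.
overshoot = reported_size - two_pow_32

print(two_pow_32)   # 4294967296
print(overshoot)    # 159716
```

The file stalls only about 160 kB past 2**32 bytes, which would be consistent with a 32-bit quantity somewhere in the merge step.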

What fortran compiler/system are you running? I am on linux (>1T byte file size supported in the filesystem I am on).

I am also producing delwaq output and those files are fine (even at 11G).

John

Final merged result file appears corrupted. 8 years 9 months ago #19527

  • j_floyd
There is a filesize problem. I have found that util_selafin uses stream access for the file read/write - I am currently experimenting with enlarging the MY_POS variable to integer*8. The original MY_POS was defined as just INTEGER ... which may be compiler dependent.
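To illustrate why the width of a position variable matters, here is a small Python sketch, assuming a default INTEGER is 4 bytes (the usual gfortran default). It mimics 32-bit and 64-bit storage with ctypes. The helper names and the 5 GiB offset are hypothetical, and what a Fortran compiler actually does on integer overflow is not guaranteed to match this wraparound.

```python
import ctypes


def as_int32(value):
    """Store value in a signed 32-bit integer (wraps silently, no range check)."""
    return ctypes.c_int32(value).value


def as_int64(value):
    """Store value in a signed 64-bit integer."""
    return ctypes.c_int64(value).value


# Hypothetical byte offset 5 GiB into a large result file.
target = 5 * 2 ** 30

pos32 = as_int32(target)   # cannot represent the offset
pos64 = as_int64(target)   # represents it exactly

print(target, pos32, pos64)

# The first offset that no longer fits in a signed 32-bit integer.
first_overflow = as_int32(2 ** 31)
print(first_overflow)
```

With the 32-bit helper the 5 GiB offset comes back as 1 GiB, so a write aimed at the end of the file would land inside data already written. The 64-bit helper keeps the offset intact.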