TOPIC: Benchmark performance of ppUtils vs Selafin vs TelemacFile

Benchmark performance of ppUtils vs Selafin vs TelemacFile 10 months 2 weeks ago #44248

  • Lux
  • Senior Boarder
  • Posts: 96
  • Thank you received: 39
Dear Thomas,

I had a look at github.com/seareport/xarray-selafin/ and see that you tried to define a custom xarray backend for Telemac (Selafin file format).

Do you use pure-Python scripts, with Selafin for reading and pputils for writing Selafin files?
Could you achieve satisfactory performance on Telemac files of a reasonable size?

If you want to use PyTelTools, let me know: we can extract a single Python file (with only 3 classes: Read, Write and Header).

Looking forward to hearing good news about Telemac with xarray,
Luc
The administrator has disabled public write access.

Benchmark performance of ppUtils vs Selafin vs TelemacFile 10 months 1 week ago #44258

  • tomsail
  • Junior Boarder
  • Posts: 43
  • Thank you received: 17
Hi Luc, thanks for all your input.

For the xarray backend implementation, I weighed two criteria when choosing the libraries:
  • Ease of installation: Have a minimal working setup (the least required libraries possible)
  • Performance: Use the fastest tools available

I think the best solution, given the speed tests run last week, is to:
1. use TelemacFile if HERMES is recognised in the environment
2. fall back to the Selafin class if HERMES is not recognised

More info in the issue I created here: github.com/seareport/xarray-selafin/issues/5
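
The fallback order described above could be sketched roughly as follows. This is only a minimal illustration, not the code used in xarray-selafin: the helper `first_available` is hypothetical, and the module names in `PREFERRED_READERS` are assumptions standing in for "the Hermes wrapper" and "a pure-Python reader".

```python
import importlib.util


def first_available(module_names):
    """Return the first importable module name, or None if none is found."""
    for name in module_names:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # Raised when a parent package of a dotted name is missing.
            continue
    return None


# Assumed names, in order of preference: compiled Hermes wrapper first,
# then a pure-Python reader as the fallback.
PREFERRED_READERS = ["_hermes", "pyteltools.slf.Serafin"]
backend = first_available(PREFERRED_READERS)
print("selected reader:", backend)
```

Probing with `find_spec` avoids importing the compiled module just to discover that it is missing.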

Last but not least, I created a git repository for the speed-test benchmark:

github.com/tomsail/selafin_benchmark

I implemented a minimal working setup for PyTelTools (as I only need the Serafin class, out of all the functions available in your framework).

Feel free to fork, correct what I have done and submit a pull request.
The following user(s) said Thank You: Lux

Benchmark performance of ppUtils vs Selafin vs TelemacFile 10 months 1 week ago #44259

  • Lux
  • Senior Boarder
  • Posts: 96
  • Thank you received: 39
Dear Thomas,

Thank you for your message and proposal.
I perfectly understand your point of view (ease of installation AND performance): both are important. And if they are compatible, this is just perfect ;).

I opened a PR with PyTelTools improvements; you should get better performance with it.
A benchmark with times below 0.001 second is not very reliable. Do not hesitate to compare on larger files, where the differences are more meaningful. On my laptop (with an SSD), I can parse an 11 GB Selafin file (read the whole file and build Python objects) in around 7 seconds.
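
One common way to make very short timings less noisy is to repeat the measurement and keep the best run. The sketch below uses the standard `timeit` module; `best_time` and `toy_workload` are hypothetical names, and the workload is a stand-in for a real file-reading call.

```python
import timeit


def best_time(func, repeat=5, number=1):
    """Return the smallest time in seconds per call over several repeats."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def toy_workload():
    # Stand-in for "read a Selafin file"; replace with the real reader call.
    return sum(range(100_000))


elapsed = best_time(toy_workload, repeat=5, number=10)
print(f"best of 5 repeats: {elapsed:.6f} s per call")
```

Taking the minimum rather than the mean reduces the influence of other processes, but for disk reads the operating system cache will still favour repeated runs on the same file.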

Hope it helps,
Luc

Benchmark performance of ppUtils vs Selafin vs TelemacFile 10 months 2 weeks ago #44250

  • borisb
  • Admin
  • Posts: 129
  • Thank you received: 68
TelemacFile is intended for use with Hermes, and is much faster than pure Python solutions in this case, as it relies on Fortran subroutines to work with meshes. As fast as Python is in its latest versions, it can't match the speed of compiled code.

Benchmark performance of ppUtils vs Selafin vs TelemacFile 10 months 2 weeks ago #44252

  • Lux
  • Senior Boarder
  • Posts: 96
  • Thank you received: 39
Dear all,

I just made a major improvement in PyTelTools by using np.frombuffer instead of struct.unpack (see commit 24b97ac).
I measured an improvement in reading performance of at least a factor of 10, which is quite significant.
I would be interested in a benchmark of TelemacFile (Hermes) against this updated version of PyTelTools.
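
To illustrate the difference between the two approaches, here is a small self-contained comparison on synthetic data. It is not the PyTelTools code from the commit: the buffer is built by hand, and big-endian 32-bit floats are assumed, as in a single-precision Selafin file.

```python
import struct

import numpy as np

n = 1000
values = [i * 0.5 for i in range(n)]

# Raw bytes as they could appear inside one record: n big-endian float32 values.
raw = struct.pack(">%if" % n, *values)

# struct.unpack builds a tuple of n Python floats, which is then converted.
via_struct = np.array(struct.unpack(">%if" % n, raw), dtype=np.float32)

# np.frombuffer reinterprets the bytes directly, without per-value Python objects.
via_numpy = np.frombuffer(raw, dtype=">f4")

assert np.array_equal(via_struct, via_numpy)
print(via_numpy[:4])
```

Note that the array returned by `np.frombuffer` on a `bytes` object is read-only and keeps the big-endian dtype; call `.astype(np.float32)` if a writable native-endian copy is needed.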

Thank you in advance,
Luc

Benchmark performance of ppUtils vs Selafin vs TelemacFile 10 months 1 week ago #44257

  • Lux
  • Senior Boarder
  • Posts: 96
  • Thank you received: 39
A benchmark that reads the entire file would also be appreciated:
the header is read only once, whereas in practice many records are read.
This action is similar to `xarray.Dataset.load` (which should not be called in user code).

I implemented every Python reading/writing optimization I could think of in PyTelTools and limited the calls to file.seek (which moves the file object's current position).
It would be great to compare the Fortran implementation (Hermes/TelemacFile) with a pure-Python version. I think np.frombuffer is quite efficient compared with the Fortran (Hermes) reader.

Here is the code for PyTelTools (requires the latest update):
from pyteltools.slf import Serafin  # import path of a full PyTelTools install; adjust for a standalone Serafin.py

with Serafin.Read('r2d.slf', 'en') as resin:
    resin.read_header()  # fills resin.header (reads the mesh)
    # resin.get_time()  # not needed here (only fills `resin.time` with the time series as floats)
    for time, values in resin.iter_on_all_frames():
        pass  # print(time, values.shape)
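
For readers unfamiliar with the file layout, the sketch below shows why sequential reading needs no seek between frames. It is not the PyTelTools implementation: `write_frame` and `iter_frames` are hypothetical helpers, and the layout assumed is a single-precision, big-endian Selafin frame (one time record followed by one record per variable, each wrapped in 4-byte Fortran length markers).

```python
import io
import struct

import numpy as np


def write_frame(buf, time, arrays):
    """Write one frame: a time record, then one record per variable."""
    buf.write(struct.pack(">ifi", 4, time, 4))
    for a in arrays:
        nbytes = 4 * len(a)
        buf.write(struct.pack(">i", nbytes))
        buf.write(np.asarray(a, dtype=">f4").tobytes())
        buf.write(struct.pack(">i", nbytes))


def iter_frames(buf, nb_var, nb_nodes):
    """Yield (time, values) with one read() per frame and no seek()."""
    frame_size = 12 + nb_var * (8 + 4 * nb_nodes)
    while True:
        chunk = buf.read(frame_size)
        if len(chunk) < frame_size:
            return
        time = struct.unpack(">f", chunk[4:8])[0]
        # View the variable records as rows of (marker, values..., marker),
        # then slice the two 4-byte markers off each row.
        body = np.frombuffer(chunk, dtype=">f4", offset=12)
        yield time, body.reshape(nb_var, nb_nodes + 2)[:, 1:-1]


buf = io.BytesIO()
write_frame(buf, 0.0, [[1, 2, 3], [4, 5, 6]])
write_frame(buf, 1.5, [[7, 8, 9], [10, 11, 12]])
buf.seek(0)  # single rewind; the loop itself never seeks
frames = [(t, v.copy()) for t, v in iter_frames(buf, nb_var=2, nb_nodes=3)]
print(frames[1][0], frames[1][1].shape)
```

Because every frame has the same size, the whole frame can be fetched in one `read` and decoded with a single `np.frombuffer` call.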

Best Regards,
Luc

Benchmark performance of ppUtils vs Selafin vs TelemacFile 10 months 1 week ago #44260

  • tomsail
  • Junior Boarder
  • Posts: 43
  • Thank you received: 17
Hi Luc,

I merged your PR, adapted the tests to go through all time steps, and ran them on the big file I shared in my previous post, as you suggested.

The results are in the README: github.com/tomsail/selafin_benchmark

Well, huge congrats: you are even faster than HERMES.

Would you be keen to propose an implementation for the xarray backend?
The following user(s) said Thank You: Lux

Benchmark performance of ppUtils vs Selafin vs TelemacFile 10 months 1 week ago #44262

  • Lux
  • Senior Boarder
  • Posts: 96
  • Thank you received: 39
Hi Thomas,

Thank you for the benchmark on small and large files. It is instructive: it shows that numpy is well optimized and the best choice for building arrays from binary data.

I am ready to help with the xarray implementation, even though I am still discovering this library... What have you already got working with hermes/pputils? Implementing an optimized version with lazy loading and dask is challenging ;).

Best Regards,
Luc

Benchmark performance of ppUtils vs Selafin vs TelemacFile 10 months 1 week ago #44272

  • tomsail
  • Junior Boarder
  • Posts: 43
  • Thank you received: 17
Hi Luc,

I have implemented PyTelTools as a backend of xarray.
Some of the tests pass, others fail because of this error:
if diff_size != 0 and diff_size != 1:
   raise SerafinValidationError(
      "Something wrong with the file size (header and frames). "
      "File is probably corrupted, difference of %i bytes" % diff_size
   )

I don't know whether it is because of the Selafin files or because I made an error when implementing the routine.
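
One way to investigate such an error is to redo the size arithmetic by hand. The sketch below is a plausible reading of the check, not the actual PyTelTools code: `frame_size_bytes` and `split_frames` are hypothetical helpers, and the frame layout (a time record plus one record per variable, each with two 4-byte markers) is an assumption about the Selafin format.

```python
def frame_size_bytes(nb_var, nb_nodes, float_size=4):
    """Size of one frame: time record plus one record per variable."""
    time_record = 8 + float_size                  # two markers + one float
    var_record = 8 + float_size * nb_nodes        # two markers + node values
    return time_record + nb_var * var_record


def split_frames(file_size, header_size, nb_var, nb_nodes, float_size=4):
    """Return (number of complete frames, leftover bytes)."""
    frame = frame_size_bytes(nb_var, nb_nodes, float_size)
    return divmod(file_size - header_size, frame)


# Toy numbers: a 1000-byte header followed by 3 frames of 2 variables on 3 nodes.
nb_frames, leftover = split_frames(1000 + 3 * 52, 1000, nb_var=2, nb_nodes=3)
print(nb_frames, leftover)
```

A large leftover usually points to a wrong header size, variable count, or float precision (pass `float_size=8` for a double-precision file) rather than to a corrupted file.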

I implemented a minimal setup of Serafin.py (which you have already seen in the other git repository). All the main implementation work is in
xarray_selafin_backend/xarray_backend.py

You can follow up with me directly on GitHub:

github.com/seareport/xarray-selafin/pull/7

Benchmark performance of ppUtils vs Selafin vs TelemacFile 10 months 1 week ago #44274

  • sebourban
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Thank you indeed, Thomas, for your benchmark. It shows that when dealing with SERAFIN files, TELEMAC should revert to Python in all cases (thanks to Luc's upgrades). That would also reduce the system's dependencies on several other libraries that we no longer need (except for the MED format), and ease the installation of the system as a whole.

Looking forward to seeing xarray in action.

Sébastien.
Moderators: borisb
