
TOPIC: Write out results file for failed simulation

Write out results file for failed simulation 1 year 10 months ago #41882

  • Renault
  • Senior Boarder
  • Posts: 120
  • Thank you received: 33
Hi all,

I had a quick look at the reference manual and through the forums, and couldn't find anything about this. For context, I'm trying to calibrate a GAIA (Wilcock and Crowe, if that matters) simulation based on observed data. I've been incrementing simulation time as I go, in order to get quicker results before attempting longer runs.

The problem I have is that at some point, the simulation will crash due to a subcritical boundary condition. In my most recent simulation, this was after 37 minutes of simulated time, whereas I was attempting to run it for 60 minutes. (A previous run of 20 minutes was successful, so I upped it to 60 minutes.)

What's more, because I'm using an HPC cluster which writes to scratch directories, all the intermediate files are lost when the task exits. This means I can't go look at the most recent files before the crash. I could also rerun the simulation up until the point where it crashed, but that seems like a waste of computing time. All that said, my question is: Is there a way to tell TELEMAC/GAIA to write a results file when the simulation fails (in the same way that it would when the simulation is complete), regardless of the state of completion? This might help me understand why the simulation crashed, e.g. did the upstream boundary sink by 10 feet?

Thanks so much for any ideas!
André
The administrator has disabled public write access.

Write out results file for failed simulation 1 year 10 months ago #41883

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi
Your problem is not really linked to Telemac but to your cluster and the way it works...
Telemac always writes its results files, even when the simulation crashes. The only step that is not done is the merging of the partitioned results.
In your case, you probably have an extra step in the launching procedure that copies the files from the scratch directory to your launching directory.
Usually, when I run into such problems, I connect directly to the computation node and manually copy the files from the scratch directory to my running directory. All I need to know is the computing node and, in my case, the job ID, since the scratch directory uses this number as the name of the directory where the files are...
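For anyone who would rather automate that copy than log in to the node, here is a minimal sketch. It assumes the job script knows the scratch working directory and a persistent directory; the function name rescue_scratch and the variables SCRATCH_DIR and SUBMIT_DIR are hypothetical and must be adapted to your scheduler. The trap fires whether the solver finishes or crashes, but not if the scheduler hard-kills the job.

```shell
#!/bin/bash
# Hypothetical helper (not part of TELEMAC): copy everything from the scratch
# working directory to a persistent directory, preserving timestamps.
rescue_scratch() {
    local scratch_dir="$1"   # where the solver wrote its (partitioned) files
    local keep_dir="$2"      # persistent location that survives the job
    mkdir -p "$keep_dir" && cp -a "$scratch_dir"/. "$keep_dir"/
}

# In a job script, register the copy BEFORE launching the solver, e.g.:
#   trap 'rescue_scratch "$SCRATCH_DIR" "$SUBMIT_DIR/rescued"' EXIT
#   telemac2d.py --ncsize=12 casefile.cas
# SCRATCH_DIR and SUBMIT_DIR are assumptions: set them to whatever your
# cluster uses (often a path built from the job ID).
```

The partitioned files rescued this way still need to be merged afterwards.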

Hope this helps

PS: remember to update your profile!
Christophe

Write out results file for failed simulation 1 year 10 months ago #41884

  • Renault
  • Senior Boarder
  • Posts: 120
  • Thank you received: 33
Hi Christophe,

Thank you so much for your advice! You're right, the intermediate files are still alive (for now) in the scratch directory. I should have looked before, but I didn't because in the past, the scratch directory has erased files quickly. Since then, I've changed scratch directories and the files last longer.

I wasn't sure how to merge the files manually, but with some trial and error, I copied and adapted the example from the ipy notebooks. From the simulation directory, run:
gretel.py --geo-file T2DGEO --geo-file-format SERAFIND --res-file T2DRES --res-file-format SERAFIND --ncsize X --bnd-file T2DCLI
gretel.py --geo-file GAIGEO --geo-file-format SERAFIND --res-file GAIRES --res-file-format SERAFIND --ncsize X --bnd-file GAICLI
(With X the number of parallel processes used to run the simulation, e.g. 4 or 12. Note that I use double format, so adapt SERAFIND to SERAFIN or MED as needed.)
(Also of note, this can likely be adapted to whatever modules you are using. Just find all the *GEO and *CLI files, and change the *RES file name to match in every gretel.py command.)
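As a rough sketch of that adaptation, the commands can be generated from whatever *CLI files are present instead of being typed by hand. The function names below are hypothetical, and the sketch assumes every module in play has a <PREFIX>CLI file next to its <PREFIX>GEO and <PREFIX>RES files, as T2D and GAI do above. It only prints the commands, using the same flags as above, so they can be checked before running.

```shell
#!/bin/bash
# Print the gretel.py command for one module prefix (T2D, GAI, ...).
# The format defaults to SERAFIND (double precision), as in the post above.
gretel_cmd() {
    local prefix="$1" ncsize="$2" fmt="${3:-SERAFIND}"
    echo "gretel.py --geo-file ${prefix}GEO --geo-file-format ${fmt}" \
         "--res-file ${prefix}RES --res-file-format ${fmt}" \
         "--ncsize ${ncsize} --bnd-file ${prefix}CLI"
}

# Print one command per file whose name ends in CLI in the given directory.
gretel_cmds_for_dir() {
    local dir="$1" ncsize="$2" fmt="${3:-SERAFIND}" cli
    for cli in "$dir"/*CLI; do
        [ -e "$cli" ] || continue     # nothing matched: skip the literal pattern
        cli=${cli##*/}                # strip the directory part
        gretel_cmd "${cli%CLI}" "$ncsize" "$fmt"
    done
}
```

From the simulation directory, `gretel_cmds_for_dir . 12` lists the commands; once they look right, they can be pasted or piped to `bash`.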

The above commands automatically find all the partitioned files and merge them. Then, the results files (*RES) can be moved or copied to whatever location:
cp T2DRES ~/path/to/results/telemac-results.slf
cp GAIRES ~/path/to/results/gaia-results.slf

From there, the selafins should work as normal for postprocessing. In my case, I can see that there is some bed rippling forming at the entry, perhaps this is causing numerical instability... but that's a problem for another thread :silly:

I hope this can be useful for anyone else in my position. Maybe this can be added as an option in the dico in the future, with default value of NO? I can see why it may not be a good idea, but IMO it would be useful for users to be able to inspect the simulation where it fails, without having to go "under the hood" of TELEMAC.
WRITE RESULTS ON FAILED SIMULATION =YES

PS: the profile settings don't show Windows 10 or GAIA as option, I think they need to be updated! ;)

Write out results file for failed simulation 1 year 9 months ago #42017

  • Renault
  • Senior Boarder
  • Posts: 120
  • Thank you received: 33
Hi,

A quick follow up for *nix users (and maybe PowerShell users?). I was tired of typing out these commands, so I wrote them into a shell script. Your mileage may (in fact, will) vary, so study the script carefully before using it and adapt it as necessary. I make no guarantees!

In essence, it calls GRETEL to regroup the partitioned files for each module at play (in my case, Telemac-2D and Gaia), then searches through the case file using grep to find the name of the results file, and finally copies the results files to the directory specified when calling the script. The & tells the shell to run these commands simultaneously.
I personally call this script gre2res.sh and leave it in my home directory, so I can call it using the syntax:
~/gre2res.sh /path/to/output/folder

[code]#!/bin/bash
# input: supply path to results
# make sure you are in the Telemac working directory (usually named after the current .cas file) when you run this script

# change following line to wherever Telemac's shell config file lives, as necessary
source "$HOMETEL/config/pysource.sh"

# calling GRETEL - make sure to add and remove modules as necessary
# if you're not sure what the files are named, run "ls *CLI" and see what happens
# NOTE: make sure you change the number after --ncsize to the number of cores you used, otherwise GRETEL won't be happy
gretel.py --geo-file T2DGEO --geo-file-format SERAFIND --res-file T2DRES --res-file-format SERAFIND --ncsize 120 --bnd-file T2DCLI &
gretel.py --geo-file GAIGEO --geo-file-format SERAFIND --res-file GAIRES --res-file-format SERAFIND --ncsize 120 --bnd-file GAICLI
# wait for the backgrounded GRETEL call to finish before touching the merged files
wait

# searching within the associated .cas files for the results file name
# for me, a results file line might look like this:
# RESULTS FILE                ='../Results/hdyn-cold.slf'
# so I ask grep to find the results file line and perl to strip away everything but hdyn-cold.slf
# again, your mileage may vary, your version might look like this for instance:
# T2DRESULTS=$(grep -e "RESULTS FILE" T2DCAS | grep -v "FORMAT" | perl -p -e "s/'//g;" -e "s/RESULTS FILE *= *//g;")
# have fun with regex :)
# (no & on these lines: a backgrounded assignment runs in a subshell, so the variable would stay empty here)
T2DRESULTS=$(grep -e "RESULTS FILE" T2DCAS | grep -v "FORMAT" | perl -p -e "s/'//g;" -e "s/RESULTS FILE *= *\.\.\/Results\///g;")
GAIRESULTS=$(grep -e "RESULTS FILE" GAICAS | grep -v "FORMAT" | perl -p -e "s/'//g;" -e "s/RESULTS FILE *= *\.\.\/Results\///g;")

# finally copy results files into the specified path ($1 is the first argument given to the script)
cp T2DRES "$1/$T2DRESULTS" &
cp GAIRES "$1/$GAIRESULTS"
wait
[/code]

Again, this script is adapted to my use case, so I can't say it will just plug and play; you will almost certainly have to play with the grep/perl lines or add/remove modules. One could conceivably list out the CAS files present and use these as input for GRETEL with some tweaking:
PRESENTMODULES=$(echo *CAS)
There may be a way to get the results file name more elegantly by using TELAPY or DAMOCLES, but I haven't figured it out (if you do, please share how!). That said, I hope this can be of some use to some other lost soul out there :woohoo:
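In the meantime, a slightly more forgiving take on the grep/perl step, as a sketch only. It does not use TELAPY or DAMOCLES, just plain text matching. It assumes the keyword is written in English at the start of its own line and followed by = or :. The function name results_name is hypothetical. Anchoring the match at the start of the line skips RESULTS FILE FORMAT and any other keyword that merely contains the phrase, and basename drops whatever directory precedes the file name, so the ../Results/ prefix no longer has to be hard-coded.

```shell
#!/bin/bash
# Print the base name of the results file declared in a steering (.cas) file.
# ASSUMPTION: the keyword sits at the start of its own line, e.g.
#   RESULTS FILE                ='../Results/hdyn-cold.slf'
results_name() {
    local cas="$1" line value
    # first line that starts with the exact keyword followed by = or :
    line=$(grep -E "^[[:space:]]*RESULTS FILE[[:space:]]*[=:]" "$cas" | head -n 1)
    [ -n "$line" ] || return 1          # keyword not found
    # keep what follows the first = or :, drop quotes and surrounding blanks
    value=$(printf '%s\n' "${line#*[=:]}" | tr -d "'" \
            | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    basename "$value"
}
```

In the script above, the two assignments could then become `T2DRESULTS=$(results_name T2DCAS)` and `GAIRESULTS=$(results_name GAICAS)`.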

Write out results file for failed simulation 1 year 9 months ago #42019

  • sebourban
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello,

There is an optimisation issue with GRETEL since the addition of HERMES to TELEMAC, but in case it is easier for you, you can also use the --merge option: run the same command you used to launch the simulation, add --merge, and also give the name of the tmp (or fixed) working directory.

Hope this helps,
Sébastien.
The following user(s) said Thank You: Renault, TelemacUser1

Write out results file for failed simulation 1 year 9 months ago #42020

  • Renault
  • Senior Boarder
  • Posts: 120
  • Thank you received: 33
Hello Sébastien,

Wow, this is so much simpler than the script I wrote! I just tried it, and for me this worked (running from the .cas directory):
telemac2d.py --merge casefile.cas -w /path/to/tmp/dir/where/files/are/located

This writes both the T2D and GAI results files to the location specified in the case file, same as my script but much simpler and more foolproof.

I'm certain the documentation team are swamped, but this would be a really nifty addition to the Telemac-2D user manual, as it's the next best thing to a dico entry that automatically does the merging after a failed simulation. I'm almost tempted to request a Gitlab account and add it myself :laugh:

Thanks again for this much simpler solution!
André
The following user(s) said Thank You: TelemacUser1

Write out results file for failed simulation 1 year 8 months ago #42143

  • Renault
  • Senior Boarder
  • Posts: 120
  • Thank you received: 33
Hello, quick update. I should note that this command works better if the number of parallel processes used for the run is specified, in this case 12:
telemac2d.py --ncsize=12 --merge casefile.cas -w /path/to/tmp/dir/where/files/are/located
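If you no longer remember how many processes a run used, one possible trick is to count the partitioned geometry files left in the temporary directory. This is a sketch under a loud assumption: that each process leaves one file named like T2DGEO00011-00000 (prefix, digits, a dash, digits). Check with ls in your own tmp directory before trusting it. The function name guess_ncsize is hypothetical.

```shell
#!/bin/bash
# Guess the process count by counting partitioned geometry files.
# ASSUMPTION: one file per process, named <prefix><digits>-<digits>,
# e.g. T2DGEO00011-00000. Verify this naming in your own tmp directory.
guess_ncsize() {
    local dir="$1" prefix="${2:-T2DGEO}" f n=0
    for f in "$dir/$prefix"[0-9]*-[0-9]*; do
        if [ -e "$f" ]; then          # skip the literal pattern when nothing matches
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

The result can then be fed to the command above, e.g. `telemac2d.py --ncsize="$(guess_ncsize /path/to/tmp/dir/where/files/are/located)" --merge casefile.cas -w /path/to/tmp/dir/where/files/are/located`.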

Hope this helps.
The following user(s) said Thank You: TelemacUser1

Write out results file for failed simulation 1 year 9 months ago #42027

  • sebourban
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
Hello,

Please do register a GitLab account even for these (small) things - we appreciate all the help we can get as we cannot possibly do everything ourselves :cheer:

Best,
Sébastien.

Write out results file for failed simulation 1 year 9 months ago #42045

  • Renault
  • Senior Boarder
  • Posts: 120
  • Thank you received: 33
Hello Sébastien,

How do I register for an account? I tried both the ot-consortium email address on the Contact page and Boris's email in the Developer guide (section 7.1), but both returned mail delivery errors. I've looked elsewhere and only seen "please contact us" with no other definite contact info. Is there a page describing who to contact, with a valid email address :laugh: ?

Thanks,
André

Write out results file for failed simulation 1 year 9 months ago #42047

  • borisb
  • Admin
  • Posts: 128
  • Thank you received: 64
Hello,

The email address in the developer's guide is not correct... That said, we will set up a form on the Telemac website to create a GitLab account in order to avoid spam and I will update the documentation accordingly.

However you won't need to wait for this: I just asked our GitLab administrators to create an account for you. You should receive an email as soon as it's done.
The following user(s) said Thank You: Renault
