I see that you are limiting the RAM; maybe this is the issue? Could you try increasing it?
@yugi: Normally, when a process runs out of memory, we get a relevant error message in stderr. That is not the case here, but it is still worth examining.
@Sardar: You have two options to increase the memory per process, but I am unaware of how much RAM you really anticipate for your specific experiment:
- using thin nodes (188 GB per node), you would lower ppn and increase pmem, so that ppn x pmem <= 188 GB. I leave it to you to experiment
- alternatively, you can use the bigmem nodes (760 GB per node), keep ppn=36, and raise pmem to 21gb. If that is still not enough, you can switch to the superdome machine, which offers up to 50 GB per core/process
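
To make the trade-off concrete, here is a rough sketch of the arithmetic behind the constraint ppn x pmem <= node memory. The helper name `max_pmem_gb` and the thin-node ppn values tried (36, 18, 9) are only illustrative assumptions; the node sizes are the ones quoted above. It also ignores any memory the system itself may hold back on a node, so treat the results as upper bounds.

```python
def max_pmem_gb(node_mem_gb: int, ppn: int) -> int:
    """Largest whole-GB pmem such that ppn * pmem <= node_mem_gb."""
    if ppn < 1:
        raise ValueError("ppn must be at least 1")
    return node_mem_gb // ppn


# Node sizes as quoted in this thread.
THIN_NODE_GB = 188
BIGMEM_NODE_GB = 760

# Illustrative ppn values for a thin node (assumption, not a recommendation).
for ppn in (36, 18, 9):
    print(f"thin node, ppn={ppn}: pmem <= {max_pmem_gb(THIN_NODE_GB, ppn)}gb")

print(f"bigmem node, ppn=36: pmem <= {max_pmem_gb(BIGMEM_NODE_GB, 36)}gb")
```

For the bigmem case this gives the 21gb mentioned above, since 36 x 21 = 756 <= 760.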
Have you ever run similar tasks on other machines/clusters successfully to the end? If so, how much memory did you typically need?